From 72b9188142f1b009ae264d631fe0e52ecf4c02a8 Mon Sep 17 00:00:00 2001
From: Bamboo CI
Date: Thu, 20 Jul 2023 18:31:38 +0000
Subject: [PATCH] Automated build in Bamboo CI

---
 404.html                                                         | 4 ++--
 assets/js/{1a4e3797.6edfb8bf.js => 1a4e3797.1ecd994c.js}         | 4 ++--
 assets/js/1a4e3797.1ecd994c.js.LICENSE.txt                       | 1 +
 assets/js/1a4e3797.6edfb8bf.js.LICENSE.txt                       | 1 -
 assets/js/{61426.4a4dfe81.js => 61426.8bf7a004.js}               | 2 +-
 assets/js/{runtime~main.326b708d.js => runtime~main.3f7ca0ae.js} | 2 +-
 ... (remaining diffstat: every generated index.html page under docs/, docs/next/, and the versioned doc trees docs/v10.0.0 through docs/v15.0.2 — v10.0.0, v10.1.0, v11.0.0, v11.1.0, v12.0.0, v13.0.0, v13.4.0, v14.1.0, v15.0.2 — is listed with "| 4 ++--", i.e. 2 insertions and 2 deletions per page for the updated asset hashes; the listing breaks off at docs/v15.0.2/features/reports/index.html)
docs/v15.0.2/getting-started/index.html | 4 ++-- docs/v15.0.2/glossary/index.html | 4 ++-- docs/v15.0.2/index.html | 4 ++-- docs/v15.0.2/integrator-guide/about-int-guide/index.html | 4 ++-- docs/v15.0.2/integrator-guide/int-common-use-cases/index.html | 4 ++-- .../integrator-guide/workflow-add-new-lambda/index.html | 4 ++-- .../integrator-guide/workflow-ts-failed-step/index.html | 4 ++-- docs/v15.0.2/interfaces/index.html | 4 ++-- docs/v15.0.2/operator-docs/about-operator-docs/index.html | 4 ++-- docs/v15.0.2/operator-docs/bulk-operations/index.html | 4 ++-- docs/v15.0.2/operator-docs/cmr-operations/index.html | 4 ++-- docs/v15.0.2/operator-docs/create-rule-in-cumulus/index.html | 4 ++-- docs/v15.0.2/operator-docs/discovery-filtering/index.html | 4 ++-- docs/v15.0.2/operator-docs/granule-workflows/index.html | 4 ++-- .../operator-docs/kinesis-stream-for-ingest/index.html | 4 ++-- docs/v15.0.2/operator-docs/locating-access-logs/index.html | 4 ++-- docs/v15.0.2/operator-docs/naming-executions/index.html | 4 ++-- docs/v15.0.2/operator-docs/ops-common-use-cases/index.html | 4 ++-- docs/v15.0.2/operator-docs/trigger-workflow/index.html | 4 ++-- docs/v15.0.2/tasks/index.html | 4 ++-- docs/v15.0.2/team/index.html | 4 ++-- docs/v15.0.2/troubleshooting/index.html | 4 ++-- docs/v15.0.2/troubleshooting/reindex-elasticsearch/index.html | 4 ++-- .../troubleshooting/rerunning-workflow-executions/index.html | 4 ++-- .../troubleshooting/troubleshooting-deployment/index.html | 4 ++-- .../upgrade-notes/cumulus_distribution_migration/index.html | 4 ++-- docs/v15.0.2/upgrade-notes/migrate_tea_standalone/index.html | 4 ++-- docs/v15.0.2/upgrade-notes/update-cma-2.0.2/index.html | 4 ++-- .../v15.0.2/upgrade-notes/update-task-file-schemas/index.html | 4 ++-- docs/v15.0.2/upgrade-notes/upgrade-rds/index.html | 4 ++-- .../upgrade-notes/upgrade_tf_version_0.13.6/index.html | 4 ++-- docs/v15.0.2/workflow_tasks/discover_granules/index.html | 4 ++-- docs/v15.0.2/workflow_tasks/files_to_granules/index.html | 4 ++-- docs/v15.0.2/workflow_tasks/lzards_backup/index.html | 4 ++-- docs/v15.0.2/workflow_tasks/move_granules/index.html | 4 ++-- docs/v15.0.2/workflow_tasks/parse_pdr/index.html | 4 ++-- docs/v15.0.2/workflow_tasks/queue_granules/index.html | 4 ++-- docs/v15.0.2/workflows/cumulus-task-message-flow/index.html | 4 ++-- .../workflows/developing-a-cumulus-workflow/index.html | 4 ++-- docs/v15.0.2/workflows/developing-workflow-tasks/index.html | 4 ++-- docs/v15.0.2/workflows/docker/index.html | 4 ++-- docs/v15.0.2/workflows/index.html | 4 ++-- docs/v15.0.2/workflows/input_output/index.html | 4 ++-- docs/v15.0.2/workflows/lambda/index.html | 4 ++-- docs/v15.0.2/workflows/protocol/index.html | 4 ++-- .../workflows/workflow-configuration-how-to/index.html | 4 ++-- docs/v15.0.2/workflows/workflow-triggers/index.html | 4 ++-- docs/v9.0.0/adding-a-task/index.html | 4 ++-- docs/v9.0.0/api/index.html | 4 ++-- docs/v9.0.0/architecture/index.html | 4 ++-- docs/v9.0.0/configuration/cloudwatch-retention/index.html | 4 ++-- .../collection-storage-best-practices/index.html | 4 ++-- docs/v9.0.0/configuration/data-management-types/index.html | 4 ++-- docs/v9.0.0/configuration/lifecycle-policies/index.html | 4 ++-- docs/v9.0.0/configuration/monitoring-readme/index.html | 4 ++-- docs/v9.0.0/configuration/server_access_logging/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/about-cookbooks/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/browse-generation/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/choice-states/index.html | 4 ++-- 
docs/v9.0.0/data-cookbooks/cnm-workflow/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/error-handling/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/hello-world/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/ingest-notifications/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/queue-post-to-cmr/index.html | 4 ++-- .../data-cookbooks/run-tasks-in-lambda-or-docker/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/sips-workflow/index.html | 4 ++-- .../data-cookbooks/throttling-queued-executions/index.html | 4 ++-- docs/v9.0.0/data-cookbooks/tracking-files/index.html | 4 ++-- docs/v9.0.0/deployment/api-gateway-logging/index.html | 4 ++-- docs/v9.0.0/deployment/cloudwatch-logs-delivery/index.html | 4 ++-- docs/v9.0.0/deployment/components/index.html | 4 ++-- docs/v9.0.0/deployment/create_bucket/index.html | 4 ++-- docs/v9.0.0/deployment/index.html | 4 ++-- .../v9.0.0/deployment/postgres_database_deployment/index.html | 4 ++-- docs/v9.0.0/deployment/share-s3-access-logs/index.html | 4 ++-- docs/v9.0.0/deployment/terraform-best-practices/index.html | 4 ++-- docs/v9.0.0/deployment/thin_egress_app/index.html | 4 ++-- docs/v9.0.0/deployment/upgrade-readme/index.html | 4 ++-- docs/v9.0.0/development/forked-pr/index.html | 4 ++-- docs/v9.0.0/development/integration-tests/index.html | 4 ++-- docs/v9.0.0/development/quality-and-coverage/index.html | 4 ++-- docs/v9.0.0/development/release/index.html | 4 ++-- docs/v9.0.0/docs-how-to/index.html | 4 ++-- docs/v9.0.0/external-contributions/index.html | 4 ++-- docs/v9.0.0/faqs/index.html | 4 ++-- docs/v9.0.0/features/ancillary_metadata/index.html | 4 ++-- docs/v9.0.0/features/backup_and_restore/index.html | 4 ++-- docs/v9.0.0/features/data_in_dynamodb/index.html | 4 ++-- docs/v9.0.0/features/dead_letter_queues/index.html | 4 ++-- docs/v9.0.0/features/distribution-metrics/index.html | 4 ++-- docs/v9.0.0/features/ems_reporting/index.html | 4 ++-- docs/v9.0.0/features/execution_payload_retention/index.html | 4 ++-- docs/v9.0.0/features/lambda_versioning/index.html | 4 ++-- docs/v9.0.0/features/logging-esdis-metrics/index.html | 4 ++-- docs/v9.0.0/features/replay-kinesis-messages/index.html | 4 ++-- docs/v9.0.0/features/reports/index.html | 4 ++-- docs/v9.0.0/getting-started/index.html | 4 ++-- docs/v9.0.0/glossary/index.html | 4 ++-- docs/v9.0.0/index.html | 4 ++-- docs/v9.0.0/integrator-guide/about-int-guide/index.html | 4 ++-- docs/v9.0.0/integrator-guide/int-common-use-cases/index.html | 4 ++-- .../integrator-guide/workflow-add-new-lambda/index.html | 4 ++-- .../integrator-guide/workflow-ts-failed-step/index.html | 4 ++-- docs/v9.0.0/interfaces/index.html | 4 ++-- docs/v9.0.0/operator-docs/about-operator-docs/index.html | 4 ++-- docs/v9.0.0/operator-docs/bulk-operations/index.html | 4 ++-- docs/v9.0.0/operator-docs/cmr-operations/index.html | 4 ++-- docs/v9.0.0/operator-docs/create-rule-in-cumulus/index.html | 4 ++-- docs/v9.0.0/operator-docs/discovery-filtering/index.html | 4 ++-- docs/v9.0.0/operator-docs/granule-workflows/index.html | 4 ++-- .../v9.0.0/operator-docs/kinesis-stream-for-ingest/index.html | 4 ++-- docs/v9.0.0/operator-docs/locating-access-logs/index.html | 4 ++-- docs/v9.0.0/operator-docs/naming-executions/index.html | 4 ++-- docs/v9.0.0/operator-docs/ops-common-use-cases/index.html | 4 ++-- docs/v9.0.0/operator-docs/trigger-workflow/index.html | 4 ++-- docs/v9.0.0/tasks/index.html | 4 ++-- docs/v9.0.0/team/index.html | 4 ++-- docs/v9.0.0/troubleshooting/index.html | 4 ++-- docs/v9.0.0/troubleshooting/reindex-elasticsearch/index.html | 4 ++-- 
.../troubleshooting/rerunning-workflow-executions/index.html | 4 ++-- .../troubleshooting/troubleshooting-deployment/index.html | 4 ++-- docs/v9.0.0/upgrade-notes/migrate_tea_standalone/index.html | 4 ++-- docs/v9.0.0/upgrade-notes/upgrade-rds/index.html | 4 ++-- .../v9.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html | 4 ++-- docs/v9.0.0/workflow_tasks/discover_granules/index.html | 4 ++-- docs/v9.0.0/workflow_tasks/files_to_granules/index.html | 4 ++-- docs/v9.0.0/workflow_tasks/move_granules/index.html | 4 ++-- docs/v9.0.0/workflow_tasks/parse_pdr/index.html | 4 ++-- docs/v9.0.0/workflows/cumulus-task-message-flow/index.html | 4 ++-- .../v9.0.0/workflows/developing-a-cumulus-workflow/index.html | 4 ++-- docs/v9.0.0/workflows/developing-workflow-tasks/index.html | 4 ++-- docs/v9.0.0/workflows/docker/index.html | 4 ++-- docs/v9.0.0/workflows/index.html | 4 ++-- docs/v9.0.0/workflows/input_output/index.html | 4 ++-- docs/v9.0.0/workflows/lambda/index.html | 4 ++-- docs/v9.0.0/workflows/protocol/index.html | 4 ++-- .../v9.0.0/workflows/workflow-configuration-how-to/index.html | 4 ++-- docs/v9.0.0/workflows/workflow-triggers/index.html | 4 ++-- docs/v9.9.0/adding-a-task/index.html | 4 ++-- docs/v9.9.0/api/index.html | 4 ++-- docs/v9.9.0/architecture/index.html | 4 ++-- docs/v9.9.0/configuration/cloudwatch-retention/index.html | 4 ++-- .../collection-storage-best-practices/index.html | 4 ++-- docs/v9.9.0/configuration/data-management-types/index.html | 4 ++-- docs/v9.9.0/configuration/lifecycle-policies/index.html | 4 ++-- docs/v9.9.0/configuration/monitoring-readme/index.html | 4 ++-- docs/v9.9.0/configuration/server_access_logging/index.html | 4 ++-- docs/v9.9.0/configuration/task-configuration/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/about-cookbooks/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/browse-generation/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/choice-states/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/cnm-workflow/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/error-handling/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/hello-world/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/ingest-notifications/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/queue-post-to-cmr/index.html | 4 ++-- .../data-cookbooks/run-tasks-in-lambda-or-docker/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/sips-workflow/index.html | 4 ++-- .../data-cookbooks/throttling-queued-executions/index.html | 4 ++-- docs/v9.9.0/data-cookbooks/tracking-files/index.html | 4 ++-- docs/v9.9.0/deployment/api-gateway-logging/index.html | 4 ++-- docs/v9.9.0/deployment/cloudwatch-logs-delivery/index.html | 4 ++-- docs/v9.9.0/deployment/components/index.html | 4 ++-- docs/v9.9.0/deployment/create_bucket/index.html | 4 ++-- docs/v9.9.0/deployment/cumulus_distribution/index.html | 4 ++-- docs/v9.9.0/deployment/index.html | 4 ++-- .../v9.9.0/deployment/postgres_database_deployment/index.html | 4 ++-- docs/v9.9.0/deployment/share-s3-access-logs/index.html | 4 ++-- docs/v9.9.0/deployment/terraform-best-practices/index.html | 4 ++-- docs/v9.9.0/deployment/thin_egress_app/index.html | 4 ++-- docs/v9.9.0/deployment/upgrade-readme/index.html | 4 ++-- docs/v9.9.0/development/forked-pr/index.html | 4 ++-- docs/v9.9.0/development/integration-tests/index.html | 4 ++-- docs/v9.9.0/development/quality-and-coverage/index.html | 4 ++-- docs/v9.9.0/development/release/index.html | 4 ++-- docs/v9.9.0/docs-how-to/index.html | 4 ++-- docs/v9.9.0/external-contributions/index.html | 4 ++-- docs/v9.9.0/faqs/index.html | 4 ++-- 
docs/v9.9.0/features/ancillary_metadata/index.html | 4 ++-- docs/v9.9.0/features/backup_and_restore/index.html | 4 ++-- docs/v9.9.0/features/data_in_dynamodb/index.html | 4 ++-- docs/v9.9.0/features/dead_letter_archive/index.html | 4 ++-- docs/v9.9.0/features/dead_letter_queues/index.html | 4 ++-- docs/v9.9.0/features/distribution-metrics/index.html | 4 ++-- docs/v9.9.0/features/execution_payload_retention/index.html | 4 ++-- docs/v9.9.0/features/logging-esdis-metrics/index.html | 4 ++-- docs/v9.9.0/features/replay-archived-sqs-messages/index.html | 4 ++-- docs/v9.9.0/features/replay-kinesis-messages/index.html | 4 ++-- docs/v9.9.0/features/reports/index.html | 4 ++-- docs/v9.9.0/getting-started/index.html | 4 ++-- docs/v9.9.0/glossary/index.html | 4 ++-- docs/v9.9.0/index.html | 4 ++-- docs/v9.9.0/integrator-guide/about-int-guide/index.html | 4 ++-- docs/v9.9.0/integrator-guide/int-common-use-cases/index.html | 4 ++-- .../integrator-guide/workflow-add-new-lambda/index.html | 4 ++-- .../integrator-guide/workflow-ts-failed-step/index.html | 4 ++-- docs/v9.9.0/interfaces/index.html | 4 ++-- docs/v9.9.0/operator-docs/about-operator-docs/index.html | 4 ++-- docs/v9.9.0/operator-docs/bulk-operations/index.html | 4 ++-- docs/v9.9.0/operator-docs/cmr-operations/index.html | 4 ++-- docs/v9.9.0/operator-docs/create-rule-in-cumulus/index.html | 4 ++-- docs/v9.9.0/operator-docs/discovery-filtering/index.html | 4 ++-- docs/v9.9.0/operator-docs/granule-workflows/index.html | 4 ++-- .../v9.9.0/operator-docs/kinesis-stream-for-ingest/index.html | 4 ++-- docs/v9.9.0/operator-docs/locating-access-logs/index.html | 4 ++-- docs/v9.9.0/operator-docs/naming-executions/index.html | 4 ++-- docs/v9.9.0/operator-docs/ops-common-use-cases/index.html | 4 ++-- docs/v9.9.0/operator-docs/trigger-workflow/index.html | 4 ++-- docs/v9.9.0/tasks/index.html | 4 ++-- docs/v9.9.0/team/index.html | 4 ++-- docs/v9.9.0/troubleshooting/index.html | 4 ++-- docs/v9.9.0/troubleshooting/reindex-elasticsearch/index.html | 4 ++-- .../troubleshooting/rerunning-workflow-executions/index.html | 4 ++-- .../troubleshooting/troubleshooting-deployment/index.html | 4 ++-- .../upgrade-notes/cumulus_distribution_migration/index.html | 4 ++-- docs/v9.9.0/upgrade-notes/migrate_tea_standalone/index.html | 4 ++-- docs/v9.9.0/upgrade-notes/upgrade-rds/index.html | 4 ++-- .../v9.9.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html | 4 ++-- docs/v9.9.0/workflow_tasks/discover_granules/index.html | 4 ++-- docs/v9.9.0/workflow_tasks/files_to_granules/index.html | 4 ++-- docs/v9.9.0/workflow_tasks/move_granules/index.html | 4 ++-- docs/v9.9.0/workflow_tasks/parse_pdr/index.html | 4 ++-- docs/v9.9.0/workflow_tasks/queue_granules/index.html | 4 ++-- docs/v9.9.0/workflows/cumulus-task-message-flow/index.html | 4 ++-- .../v9.9.0/workflows/developing-a-cumulus-workflow/index.html | 4 ++-- docs/v9.9.0/workflows/developing-workflow-tasks/index.html | 4 ++-- docs/v9.9.0/workflows/docker/index.html | 4 ++-- docs/v9.9.0/workflows/index.html | 4 ++-- docs/v9.9.0/workflows/input_output/index.html | 4 ++-- docs/v9.9.0/workflows/lambda/index.html | 4 ++-- docs/v9.9.0/workflows/protocol/index.html | 4 ++-- .../v9.9.0/workflows/workflow-configuration-how-to/index.html | 4 ++-- docs/v9.9.0/workflows/workflow-triggers/index.html | 4 ++-- docs/workflow_tasks/discover_granules/index.html | 4 ++-- docs/workflow_tasks/files_to_granules/index.html | 4 ++-- docs/workflow_tasks/lzards_backup/index.html | 4 ++-- docs/workflow_tasks/move_granules/index.html | 4 ++-- 
docs/workflow_tasks/parse_pdr/index.html | 4 ++-- docs/workflow_tasks/queue_granules/index.html | 4 ++-- docs/workflows/cumulus-task-message-flow/index.html | 4 ++-- docs/workflows/developing-a-cumulus-workflow/index.html | 4 ++-- docs/workflows/developing-workflow-tasks/index.html | 4 ++-- docs/workflows/docker/index.html | 4 ++-- docs/workflows/index.html | 4 ++-- docs/workflows/input_output/index.html | 4 ++-- docs/workflows/lambda/index.html | 4 ++-- docs/workflows/protocol/index.html | 4 ++-- docs/workflows/workflow-configuration-how-to/index.html | 4 ++-- docs/workflows/workflow-triggers/index.html | 4 ++-- index.html | 4 ++-- search/index.html | 4 ++-- versions/index.html | 4 ++-- 1317 files changed, 2629 insertions(+), 2629 deletions(-) rename assets/js/{1a4e3797.6edfb8bf.js => 1a4e3797.1ecd994c.js} (99%) create mode 100644 assets/js/1a4e3797.1ecd994c.js.LICENSE.txt delete mode 100644 assets/js/1a4e3797.6edfb8bf.js.LICENSE.txt rename assets/js/{61426.4a4dfe81.js => 61426.8bf7a004.js} (99%) rename assets/js/{runtime~main.326b708d.js => runtime~main.3f7ca0ae.js} (99%) diff --git a/404.html b/404.html index 7a7f466889b..2c7d30a6c75 100644 --- a/404.html +++ b/404.html @@ -5,13 +5,13 @@ Page Not Found | Cumulus Documentation - +
Skip to main content

Page Not Found

We could not find what you were looking for.

Please contact the owner of the site that linked you to the original URL and let them know their link is broken.

- + \ No newline at end of file diff --git a/assets/js/1a4e3797.6edfb8bf.js b/assets/js/1a4e3797.1ecd994c.js similarity index 99% rename from assets/js/1a4e3797.6edfb8bf.js rename to assets/js/1a4e3797.1ecd994c.js index a5e2b1dbbf7..460918df07d 100644 --- a/assets/js/1a4e3797.6edfb8bf.js +++ b/assets/js/1a4e3797.1ecd994c.js @@ -1,2 +1,2 @@ -/*! For license information please see 1a4e3797.6edfb8bf.js.LICENSE.txt */ -(self.webpackChunk_cumulus_website=self.webpackChunk_cumulus_website||[]).push([[97920],{17331:e=>{function t(){this._events=this._events||{},this._maxListeners=this._maxListeners||void 0}function r(e){return"function"==typeof e}function n(e){return"object"==typeof e&&null!==e}function i(e){return void 0===e}e.exports=t,t.prototype._events=void 0,t.prototype._maxListeners=void 0,t.defaultMaxListeners=10,t.prototype.setMaxListeners=function(e){if("number"!=typeof e||e<0||isNaN(e))throw TypeError("n must be a positive number");return this._maxListeners=e,this},t.prototype.emit=function(e){var t,a,s,c,u,o;if(this._events||(this._events={}),"error"===e&&(!this._events.error||n(this._events.error)&&!this._events.error.length)){if((t=arguments[1])instanceof Error)throw t;var h=new Error('Uncaught, unspecified "error" event. ('+t+")");throw h.context=t,h}if(i(a=this._events[e]))return!1;if(r(a))switch(arguments.length){case 1:a.call(this);break;case 2:a.call(this,arguments[1]);break;case 3:a.call(this,arguments[1],arguments[2]);break;default:c=Array.prototype.slice.call(arguments,1),a.apply(this,c)}else if(n(a))for(c=Array.prototype.slice.call(arguments,1),s=(o=a.slice()).length,u=0;u0&&this._events[e].length>s&&(this._events[e].warned=!0,console.error("(node) warning: possible EventEmitter memory leak detected. %d listeners added. Use emitter.setMaxListeners() to increase limit.",this._events[e].length),"function"==typeof console.trace&&console.trace()),this},t.prototype.on=t.prototype.addListener,t.prototype.once=function(e,t){if(!r(t))throw TypeError("listener must be a function");var n=!1;function i(){this.removeListener(e,i),n||(n=!0,t.apply(this,arguments))}return i.listener=t,this.on(e,i),this},t.prototype.removeListener=function(e,t){var i,a,s,c;if(!r(t))throw TypeError("listener must be a function");if(!this._events||!this._events[e])return this;if(s=(i=this._events[e]).length,a=-1,i===t||r(i.listener)&&i.listener===t)delete this._events[e],this._events.removeListener&&this.emit("removeListener",e,t);else if(n(i)){for(c=s;c-- >0;)if(i[c]===t||i[c].listener&&i[c].listener===t){a=c;break}if(a<0)return this;1===i.length?(i.length=0,delete this._events[e]):i.splice(a,1),this._events.removeListener&&this.emit("removeListener",e,t)}return this},t.prototype.removeAllListeners=function(e){var t,n;if(!this._events)return this;if(!this._events.removeListener)return 0===arguments.length?this._events={}:this._events[e]&&delete this._events[e],this;if(0===arguments.length){for(t in this._events)"removeListener"!==t&&this.removeAllListeners(t);return this.removeAllListeners("removeListener"),this._events={},this}if(r(n=this._events[e]))this.removeListener(e,n);else if(n)for(;n.length;)this.removeListener(e,n[n.length-1]);return delete this._events[e],this},t.prototype.listeners=function(e){return this._events&&this._events[e]?r(this._events[e])?[this._events[e]]:this._events[e].slice():[]},t.prototype.listenerCount=function(e){if(this._events){var t=this._events[e];if(r(t))return 1;if(t)return t.length}return 0},t.listenerCount=function(e,t){return e.listenerCount(t)}},8131:(e,t,r)=>{"use 
strict";var n=r(49374),i=r(17775),a=r(23076);function s(e,t,r){return new n(e,t,r)}s.version=r(24336),s.AlgoliaSearchHelper=n,s.SearchParameters=i,s.SearchResults=a,e.exports=s},68078:(e,t,r)=>{"use strict";var n=r(17331);function i(e,t){this.main=e,this.fn=t,this.lastResults=null}r(14853)(i,n),i.prototype.detach=function(){this.removeAllListeners(),this.main.detachDerivedHelper(this)},i.prototype.getModifiedState=function(e){return this.fn(e)},e.exports=i},82437:(e,t,r)=>{"use strict";var n=r(52344),i=r(49803),a=r(90116),s={addRefinement:function(e,t,r){if(s.isRefined(e,t,r))return e;var i=""+r,a=e[t]?e[t].concat(i):[i],c={};return c[t]=a,n({},c,e)},removeRefinement:function(e,t,r){if(void 0===r)return s.clearRefinement(e,(function(e,r){return t===r}));var n=""+r;return s.clearRefinement(e,(function(e,r){return t===r&&n===e}))},toggleRefinement:function(e,t,r){if(void 0===r)throw new Error("toggleRefinement should be used with a value");return s.isRefined(e,t,r)?s.removeRefinement(e,t,r):s.addRefinement(e,t,r)},clearRefinement:function(e,t,r){if(void 0===t)return a(e)?{}:e;if("string"==typeof t)return i(e,[t]);if("function"==typeof t){var n=!1,s=Object.keys(e).reduce((function(i,a){var s=e[a]||[],c=s.filter((function(e){return!t(e,a,r)}));return c.length!==s.length&&(n=!0),i[a]=c,i}),{});return n?s:e}},isRefined:function(e,t,r){var n=Boolean(e[t])&&e[t].length>0;if(void 0===r||!n)return n;var i=""+r;return-1!==e[t].indexOf(i)}};e.exports=s},17775:(e,t,r)=>{"use strict";var n=r(60185),i=r(52344),a=r(22686),s=r(7888),c=r(28023),u=r(49803),o=r(90116),h=r(46801),f=r(82437);function l(e,t){return Array.isArray(e)&&Array.isArray(t)?e.length===t.length&&e.every((function(e,r){return l(t[r],e)})):e===t}function m(e){var t=e?m._parseNumbers(e):{};void 0===t.userToken||h(t.userToken)||console.warn("[algoliasearch-helper] The `userToken` parameter is invalid. 
This can lead to wrong analytics.\n - Format: [a-zA-Z0-9_-]{1,64}"),this.facets=t.facets||[],this.disjunctiveFacets=t.disjunctiveFacets||[],this.hierarchicalFacets=t.hierarchicalFacets||[],this.facetsRefinements=t.facetsRefinements||{},this.facetsExcludes=t.facetsExcludes||{},this.disjunctiveFacetsRefinements=t.disjunctiveFacetsRefinements||{},this.numericRefinements=t.numericRefinements||{},this.tagRefinements=t.tagRefinements||[],this.hierarchicalFacetsRefinements=t.hierarchicalFacetsRefinements||{};var r=this;Object.keys(t).forEach((function(e){var n=-1!==m.PARAMETERS.indexOf(e),i=void 0!==t[e];!n&&i&&(r[e]=t[e])}))}m.PARAMETERS=Object.keys(new m),m._parseNumbers=function(e){if(e instanceof m)return e;var t={};if(["aroundPrecision","aroundRadius","getRankingInfo","minWordSizefor2Typos","minWordSizefor1Typo","page","maxValuesPerFacet","distinct","minimumAroundRadius","hitsPerPage","minProximity"].forEach((function(r){var n=e[r];if("string"==typeof n){var i=parseFloat(n);t[r]=isNaN(i)?n:i}})),Array.isArray(e.insideBoundingBox)&&(t.insideBoundingBox=e.insideBoundingBox.map((function(e){return Array.isArray(e)?e.map((function(e){return parseFloat(e)})):e}))),e.numericRefinements){var r={};Object.keys(e.numericRefinements).forEach((function(t){var n=e.numericRefinements[t]||{};r[t]={},Object.keys(n).forEach((function(e){var i=n[e].map((function(e){return Array.isArray(e)?e.map((function(e){return"string"==typeof e?parseFloat(e):e})):"string"==typeof e?parseFloat(e):e}));r[t][e]=i}))})),t.numericRefinements=r}return n({},e,t)},m.make=function(e){var t=new m(e);return(e.hierarchicalFacets||[]).forEach((function(e){if(e.rootPath){var r=t.getHierarchicalRefinement(e.name);r.length>0&&0!==r[0].indexOf(e.rootPath)&&(t=t.clearRefinements(e.name)),0===(r=t.getHierarchicalRefinement(e.name)).length&&(t=t.toggleHierarchicalFacetRefinement(e.name,e.rootPath))}})),t},m.validate=function(e,t){var r=t||{};return e.tagFilters&&r.tagRefinements&&r.tagRefinements.length>0?new Error("[Tags] Cannot switch from the managed tag API to the advanced API. It is probably an error, if it is really what you want, you should first clear the tags with clearTags method."):e.tagRefinements.length>0&&r.tagFilters?new Error("[Tags] Cannot switch from the advanced tag API to the managed API. It is probably an error, if it is not, you should first clear the tags with clearTags method."):e.numericFilters&&r.numericRefinements&&o(r.numericRefinements)?new Error("[Numeric filters] Can't switch from the advanced to the managed API. It is probably an error, if this is really what you want, you have to first clear the numeric filters."):o(e.numericRefinements)&&r.numericFilters?new Error("[Numeric filters] Can't switch from the managed API to the advanced. 
It is probably an error, if this is really what you want, you have to first clear the numeric filters."):null},m.prototype={constructor:m,clearRefinements:function(e){var t={numericRefinements:this._clearNumericRefinements(e),facetsRefinements:f.clearRefinement(this.facetsRefinements,e,"conjunctiveFacet"),facetsExcludes:f.clearRefinement(this.facetsExcludes,e,"exclude"),disjunctiveFacetsRefinements:f.clearRefinement(this.disjunctiveFacetsRefinements,e,"disjunctiveFacet"),hierarchicalFacetsRefinements:f.clearRefinement(this.hierarchicalFacetsRefinements,e,"hierarchicalFacet")};return t.numericRefinements===this.numericRefinements&&t.facetsRefinements===this.facetsRefinements&&t.facetsExcludes===this.facetsExcludes&&t.disjunctiveFacetsRefinements===this.disjunctiveFacetsRefinements&&t.hierarchicalFacetsRefinements===this.hierarchicalFacetsRefinements?this:this.setQueryParameters(t)},clearTags:function(){return void 0===this.tagFilters&&0===this.tagRefinements.length?this:this.setQueryParameters({tagFilters:void 0,tagRefinements:[]})},setIndex:function(e){return e===this.index?this:this.setQueryParameters({index:e})},setQuery:function(e){return e===this.query?this:this.setQueryParameters({query:e})},setPage:function(e){return e===this.page?this:this.setQueryParameters({page:e})},setFacets:function(e){return this.setQueryParameters({facets:e})},setDisjunctiveFacets:function(e){return this.setQueryParameters({disjunctiveFacets:e})},setHitsPerPage:function(e){return this.hitsPerPage===e?this:this.setQueryParameters({hitsPerPage:e})},setTypoTolerance:function(e){return this.typoTolerance===e?this:this.setQueryParameters({typoTolerance:e})},addNumericRefinement:function(e,t,r){var i=c(r);if(this.isNumericRefined(e,t,i))return this;var a=n({},this.numericRefinements);return a[e]=n({},a[e]),a[e][t]?(a[e][t]=a[e][t].slice(),a[e][t].push(i)):a[e][t]=[i],this.setQueryParameters({numericRefinements:a})},getConjunctiveRefinements:function(e){return this.isConjunctiveFacet(e)&&this.facetsRefinements[e]||[]},getDisjunctiveRefinements:function(e){return this.isDisjunctiveFacet(e)&&this.disjunctiveFacetsRefinements[e]||[]},getHierarchicalRefinement:function(e){return this.hierarchicalFacetsRefinements[e]||[]},getExcludeRefinements:function(e){return this.isConjunctiveFacet(e)&&this.facetsExcludes[e]||[]},removeNumericRefinement:function(e,t,r){var n=r;return void 0!==n?this.isNumericRefined(e,t,n)?this.setQueryParameters({numericRefinements:this._clearNumericRefinements((function(r,i){return i===e&&r.op===t&&l(r.val,c(n))}))}):this:void 0!==t?this.isNumericRefined(e,t)?this.setQueryParameters({numericRefinements:this._clearNumericRefinements((function(r,n){return n===e&&r.op===t}))}):this:this.isNumericRefined(e)?this.setQueryParameters({numericRefinements:this._clearNumericRefinements((function(t,r){return r===e}))}):this},getNumericRefinements:function(e){return this.numericRefinements[e]||{}},getNumericRefinement:function(e,t){return this.numericRefinements[e]&&this.numericRefinements[e][t]},_clearNumericRefinements:function(e){if(void 0===e)return o(this.numericRefinements)?{}:this.numericRefinements;if("string"==typeof e)return u(this.numericRefinements,[e]);if("function"==typeof e){var t=!1,r=this.numericRefinements,n=Object.keys(r).reduce((function(n,i){var a=r[i],s={};return a=a||{},Object.keys(a).forEach((function(r){var n=a[r]||[],c=[];n.forEach((function(t){e({val:t,op:r},i,"numeric")||c.push(t)})),c.length!==n.length&&(t=!0),s[r]=c})),n[i]=s,n}),{});return 
t?n:this.numericRefinements}},addFacet:function(e){return this.isConjunctiveFacet(e)?this:this.setQueryParameters({facets:this.facets.concat([e])})},addDisjunctiveFacet:function(e){return this.isDisjunctiveFacet(e)?this:this.setQueryParameters({disjunctiveFacets:this.disjunctiveFacets.concat([e])})},addHierarchicalFacet:function(e){if(this.isHierarchicalFacet(e.name))throw new Error("Cannot declare two hierarchical facets with the same name: `"+e.name+"`");return this.setQueryParameters({hierarchicalFacets:this.hierarchicalFacets.concat([e])})},addFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsRefinements,e,t)?this:this.setQueryParameters({facetsRefinements:f.addRefinement(this.facetsRefinements,e,t)})},addExcludeRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsExcludes,e,t)?this:this.setQueryParameters({facetsExcludes:f.addRefinement(this.facetsExcludes,e,t)})},addDisjunctiveFacetRefinement:function(e,t){if(!this.isDisjunctiveFacet(e))throw new Error(e+" is not defined in the disjunctiveFacets attribute of the helper configuration");return f.isRefined(this.disjunctiveFacetsRefinements,e,t)?this:this.setQueryParameters({disjunctiveFacetsRefinements:f.addRefinement(this.disjunctiveFacetsRefinements,e,t)})},addTagRefinement:function(e){if(this.isTagRefined(e))return this;var t={tagRefinements:this.tagRefinements.concat(e)};return this.setQueryParameters(t)},removeFacet:function(e){return this.isConjunctiveFacet(e)?this.clearRefinements(e).setQueryParameters({facets:this.facets.filter((function(t){return t!==e}))}):this},removeDisjunctiveFacet:function(e){return this.isDisjunctiveFacet(e)?this.clearRefinements(e).setQueryParameters({disjunctiveFacets:this.disjunctiveFacets.filter((function(t){return t!==e}))}):this},removeHierarchicalFacet:function(e){return this.isHierarchicalFacet(e)?this.clearRefinements(e).setQueryParameters({hierarchicalFacets:this.hierarchicalFacets.filter((function(t){return t.name!==e}))}):this},removeFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsRefinements,e,t)?this.setQueryParameters({facetsRefinements:f.removeRefinement(this.facetsRefinements,e,t)}):this},removeExcludeRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsExcludes,e,t)?this.setQueryParameters({facetsExcludes:f.removeRefinement(this.facetsExcludes,e,t)}):this},removeDisjunctiveFacetRefinement:function(e,t){if(!this.isDisjunctiveFacet(e))throw new Error(e+" is not defined in the disjunctiveFacets attribute of the helper configuration");return f.isRefined(this.disjunctiveFacetsRefinements,e,t)?this.setQueryParameters({disjunctiveFacetsRefinements:f.removeRefinement(this.disjunctiveFacetsRefinements,e,t)}):this},removeTagRefinement:function(e){if(!this.isTagRefined(e))return this;var t={tagRefinements:this.tagRefinements.filter((function(t){return t!==e}))};return this.setQueryParameters(t)},toggleRefinement:function(e,t){return this.toggleFacetRefinement(e,t)},toggleFacetRefinement:function(e,t){if(this.isHierarchicalFacet(e))return 
this.toggleHierarchicalFacetRefinement(e,t);if(this.isConjunctiveFacet(e))return this.toggleConjunctiveFacetRefinement(e,t);if(this.isDisjunctiveFacet(e))return this.toggleDisjunctiveFacetRefinement(e,t);throw new Error("Cannot refine the undeclared facet "+e+"; it should be added to the helper options facets, disjunctiveFacets or hierarchicalFacets")},toggleConjunctiveFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return this.setQueryParameters({facetsRefinements:f.toggleRefinement(this.facetsRefinements,e,t)})},toggleExcludeFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return this.setQueryParameters({facetsExcludes:f.toggleRefinement(this.facetsExcludes,e,t)})},toggleDisjunctiveFacetRefinement:function(e,t){if(!this.isDisjunctiveFacet(e))throw new Error(e+" is not defined in the disjunctiveFacets attribute of the helper configuration");return this.setQueryParameters({disjunctiveFacetsRefinements:f.toggleRefinement(this.disjunctiveFacetsRefinements,e,t)})},toggleHierarchicalFacetRefinement:function(e,t){if(!this.isHierarchicalFacet(e))throw new Error(e+" is not defined in the hierarchicalFacets attribute of the helper configuration");var r=this._getHierarchicalFacetSeparator(this.getHierarchicalFacetByName(e)),n={};return void 0!==this.hierarchicalFacetsRefinements[e]&&this.hierarchicalFacetsRefinements[e].length>0&&(this.hierarchicalFacetsRefinements[e][0]===t||0===this.hierarchicalFacetsRefinements[e][0].indexOf(t+r))?-1===t.indexOf(r)?n[e]=[]:n[e]=[t.slice(0,t.lastIndexOf(r))]:n[e]=[t],this.setQueryParameters({hierarchicalFacetsRefinements:i({},n,this.hierarchicalFacetsRefinements)})},addHierarchicalFacetRefinement:function(e,t){if(this.isHierarchicalFacetRefined(e))throw new Error(e+" is already refined.");if(!this.isHierarchicalFacet(e))throw new Error(e+" is not defined in the hierarchicalFacets attribute of the helper configuration.");var r={};return r[e]=[t],this.setQueryParameters({hierarchicalFacetsRefinements:i({},r,this.hierarchicalFacetsRefinements)})},removeHierarchicalFacetRefinement:function(e){if(!this.isHierarchicalFacetRefined(e))return this;var t={};return t[e]=[],this.setQueryParameters({hierarchicalFacetsRefinements:i({},t,this.hierarchicalFacetsRefinements)})},toggleTagRefinement:function(e){return this.isTagRefined(e)?this.removeTagRefinement(e):this.addTagRefinement(e)},isDisjunctiveFacet:function(e){return this.disjunctiveFacets.indexOf(e)>-1},isHierarchicalFacet:function(e){return void 0!==this.getHierarchicalFacetByName(e)},isConjunctiveFacet:function(e){return this.facets.indexOf(e)>-1},isFacetRefined:function(e,t){return!!this.isConjunctiveFacet(e)&&f.isRefined(this.facetsRefinements,e,t)},isExcludeRefined:function(e,t){return!!this.isConjunctiveFacet(e)&&f.isRefined(this.facetsExcludes,e,t)},isDisjunctiveFacetRefined:function(e,t){return!!this.isDisjunctiveFacet(e)&&f.isRefined(this.disjunctiveFacetsRefinements,e,t)},isHierarchicalFacetRefined:function(e,t){if(!this.isHierarchicalFacet(e))return!1;var r=this.getHierarchicalRefinement(e);return t?-1!==r.indexOf(t):r.length>0},isNumericRefined:function(e,t,r){if(void 0===r&&void 0===t)return Boolean(this.numericRefinements[e]);var n=this.numericRefinements[e]&&void 0!==this.numericRefinements[e][t];if(void 0===r||!n)return n;var i,a,u=c(r),o=void 
0!==(i=this.numericRefinements[e][t],a=u,s(i,(function(e){return l(e,a)})));return n&&o},isTagRefined:function(e){return-1!==this.tagRefinements.indexOf(e)},getRefinedDisjunctiveFacets:function(){var e=this,t=a(Object.keys(this.numericRefinements).filter((function(t){return Object.keys(e.numericRefinements[t]).length>0})),this.disjunctiveFacets);return Object.keys(this.disjunctiveFacetsRefinements).filter((function(t){return e.disjunctiveFacetsRefinements[t].length>0})).concat(t).concat(this.getRefinedHierarchicalFacets())},getRefinedHierarchicalFacets:function(){var e=this;return a(this.hierarchicalFacets.map((function(e){return e.name})),Object.keys(this.hierarchicalFacetsRefinements).filter((function(t){return e.hierarchicalFacetsRefinements[t].length>0})))},getUnrefinedDisjunctiveFacets:function(){var e=this.getRefinedDisjunctiveFacets();return this.disjunctiveFacets.filter((function(t){return-1===e.indexOf(t)}))},managedParameters:["index","facets","disjunctiveFacets","facetsRefinements","hierarchicalFacets","facetsExcludes","disjunctiveFacetsRefinements","numericRefinements","tagRefinements","hierarchicalFacetsRefinements"],getQueryParams:function(){var e=this.managedParameters,t={},r=this;return Object.keys(this).forEach((function(n){var i=r[n];-1===e.indexOf(n)&&void 0!==i&&(t[n]=i)})),t},setQueryParameter:function(e,t){if(this[e]===t)return this;var r={};return r[e]=t,this.setQueryParameters(r)},setQueryParameters:function(e){if(!e)return this;var t=m.validate(this,e);if(t)throw t;var r=this,n=m._parseNumbers(e),i=Object.keys(this).reduce((function(e,t){return e[t]=r[t],e}),{}),a=Object.keys(n).reduce((function(e,t){var r=void 0!==e[t],i=void 0!==n[t];return r&&!i?u(e,[t]):(i&&(e[t]=n[t]),e)}),i);return new this.constructor(a)},resetPage:function(){return void 0===this.page?this:this.setPage(0)},_getHierarchicalFacetSortBy:function(e){return e.sortBy||["isRefined:desc","name:asc"]},_getHierarchicalFacetSeparator:function(e){return e.separator||" > "},_getHierarchicalRootPath:function(e){return e.rootPath||null},_getHierarchicalShowParentLevel:function(e){return"boolean"!=typeof e.showParentLevel||e.showParentLevel},getHierarchicalFacetByName:function(e){return s(this.hierarchicalFacets,(function(t){return t.name===e}))},getHierarchicalFacetBreadcrumb:function(e){if(!this.isHierarchicalFacet(e))return[];var t=this.getHierarchicalRefinement(e)[0];if(!t)return[];var r=this._getHierarchicalFacetSeparator(this.getHierarchicalFacetByName(e));return t.split(r).map((function(e){return e.trim()}))},toString:function(){return JSON.stringify(this,null,2)}},e.exports=m},10210:(e,t,r)=>{"use strict";e.exports=function(e){return function(t,r){var s=e.hierarchicalFacets[r],o=e.hierarchicalFacetsRefinements[s.name]&&e.hierarchicalFacetsRefinements[s.name][0]||"",h=e._getHierarchicalFacetSeparator(s),f=e._getHierarchicalRootPath(s),l=e._getHierarchicalShowParentLevel(s),m=a(e._getHierarchicalFacetSortBy(s)),d=t.every((function(e){return e.exhaustive})),p=function(e,t,r,a,s){return function(o,h,f){var l=o;if(f>0){var m=0;for(l=o;m{"use strict";var n=r(60185),i=r(52344),a=r(42148),s=r(74587),c=r(7888),u=r(69725),o=r(82293),h=r(94039),f=h.escapeFacetValue,l=h.unescapeFacetValue,m=r(10210);function d(e){var t={};return e.forEach((function(e,r){t[e]=r})),t}function p(e,t,r){t&&t[r]&&(e.stats=t[r])}function v(e,t,r){var a=t[0];this._rawResults=t;var 
o=this;Object.keys(a).forEach((function(e){o[e]=a[e]})),Object.keys(r||{}).forEach((function(e){o[e]=r[e]})),this.processingTimeMS=t.reduce((function(e,t){return void 0===t.processingTimeMS?e:e+t.processingTimeMS}),0),this.disjunctiveFacets=[],this.hierarchicalFacets=e.hierarchicalFacets.map((function(){return[]})),this.facets=[];var h=e.getRefinedDisjunctiveFacets(),f=d(e.facets),v=d(e.disjunctiveFacets),g=1,y=a.facets||{};Object.keys(y).forEach((function(t){var r,n,i=y[t],s=(r=e.hierarchicalFacets,n=t,c(r,(function(e){return(e.attributes||[]).indexOf(n)>-1})));if(s){var h=s.attributes.indexOf(t),l=u(e.hierarchicalFacets,(function(e){return e.name===s.name}));o.hierarchicalFacets[l][h]={attribute:t,data:i,exhaustive:a.exhaustiveFacetsCount}}else{var m,d=-1!==e.disjunctiveFacets.indexOf(t),g=-1!==e.facets.indexOf(t);d&&(m=v[t],o.disjunctiveFacets[m]={name:t,data:i,exhaustive:a.exhaustiveFacetsCount},p(o.disjunctiveFacets[m],a.facets_stats,t)),g&&(m=f[t],o.facets[m]={name:t,data:i,exhaustive:a.exhaustiveFacetsCount},p(o.facets[m],a.facets_stats,t))}})),this.hierarchicalFacets=s(this.hierarchicalFacets),h.forEach((function(r){var s=t[g],c=s&&s.facets?s.facets:{},h=e.getHierarchicalFacetByName(r);Object.keys(c).forEach((function(t){var r,f=c[t];if(h){r=u(e.hierarchicalFacets,(function(e){return e.name===h.name}));var m=u(o.hierarchicalFacets[r],(function(e){return e.attribute===t}));if(-1===m)return;o.hierarchicalFacets[r][m].data=n({},o.hierarchicalFacets[r][m].data,f)}else{r=v[t];var d=a.facets&&a.facets[t]||{};o.disjunctiveFacets[r]={name:t,data:i({},f,d),exhaustive:s.exhaustiveFacetsCount},p(o.disjunctiveFacets[r],s.facets_stats,t),e.disjunctiveFacetsRefinements[t]&&e.disjunctiveFacetsRefinements[t].forEach((function(n){!o.disjunctiveFacets[r].data[n]&&e.disjunctiveFacetsRefinements[t].indexOf(l(n))>-1&&(o.disjunctiveFacets[r].data[n]=0)}))}})),g++})),e.getRefinedHierarchicalFacets().forEach((function(r){var n=e.getHierarchicalFacetByName(r),a=e._getHierarchicalFacetSeparator(n),s=e.getHierarchicalRefinement(r);0===s.length||s[0].split(a).length<2||t.slice(g).forEach((function(t){var r=t&&t.facets?t.facets:{};Object.keys(r).forEach((function(t){var c=r[t],h=u(e.hierarchicalFacets,(function(e){return e.name===n.name})),f=u(o.hierarchicalFacets[h],(function(e){return e.attribute===t}));if(-1!==f){var l={};if(s.length>0){var m=s[0].split(a)[0];l[m]=o.hierarchicalFacets[h][f].data[m]}o.hierarchicalFacets[h][f].data=i(l,c,o.hierarchicalFacets[h][f].data)}})),g++}))})),Object.keys(e.facetsExcludes).forEach((function(t){var r=e.facetsExcludes[t],n=f[t];o.facets[n]={name:t,data:y[t],exhaustive:a.exhaustiveFacetsCount},r.forEach((function(e){o.facets[n]=o.facets[n]||{name:t},o.facets[n].data=o.facets[n].data||{},o.facets[n].data[e]=0}))})),this.hierarchicalFacets=this.hierarchicalFacets.map(m(e)),this.facets=s(this.facets),this.disjunctiveFacets=s(this.disjunctiveFacets),this._state=e}function g(e,t){function r(e){return e.name===t}if(e._state.isConjunctiveFacet(t)){var n=c(e.facets,r);return n?Object.keys(n.data).map((function(r){var i=f(r);return{name:r,escapedValue:i,count:n.data[r],isRefined:e._state.isFacetRefined(t,i),isExcluded:e._state.isExcludeRefined(t,r)}})):[]}if(e._state.isDisjunctiveFacet(t)){var i=c(e.disjunctiveFacets,r);return i?Object.keys(i.data).map((function(r){var n=f(r);return{name:r,escapedValue:n,count:i.data[r],isRefined:e._state.isDisjunctiveFacetRefined(t,n)}})):[]}if(e._state.isHierarchicalFacet(t)){var a=c(e.hierarchicalFacets,r);if(!a)return a;var 
s=e._state.getHierarchicalFacetByName(t),u=e._state._getHierarchicalFacetSeparator(s),o=l(e._state.getHierarchicalRefinement(t)[0]||"");0===o.indexOf(s.rootPath)&&(o=o.replace(s.rootPath+u,""));var h=o.split(u);return h.unshift(t),y(a,h,0),a}}function y(e,t,r){e.isRefined=e.name===t[r],e.data&&e.data.forEach((function(e){y(e,t,r+1)}))}function R(e,t,r,n){if(n=n||0,Array.isArray(t))return e(t,r[n]);if(!t.data||0===t.data.length)return t;var a=t.data.map((function(t){return R(e,t,r,n+1)})),s=e(a,r[n]);return i({data:s},t)}function F(e,t){var r=c(e,(function(e){return e.name===t}));return r&&r.stats}function b(e,t,r,n,i){var a=c(i,(function(e){return e.name===r})),s=a&&a.data&&a.data[n]?a.data[n]:0,u=a&&a.exhaustive||!1;return{type:t,attributeName:r,name:n,count:s,exhaustive:u}}v.prototype.getFacetByName=function(e){function t(t){return t.name===e}return c(this.facets,t)||c(this.disjunctiveFacets,t)||c(this.hierarchicalFacets,t)},v.DEFAULT_SORT=["isRefined:desc","count:desc","name:asc"],v.prototype.getFacetValues=function(e,t){var r=g(this,e);if(r){var n,s=i({},t,{sortBy:v.DEFAULT_SORT,facetOrdering:!(t&&t.sortBy)}),c=this;if(Array.isArray(r))n=[e];else n=c._state.getHierarchicalFacetByName(r.name).attributes;return R((function(e,t){if(s.facetOrdering){var r=function(e,t){return e.renderingContent&&e.renderingContent.facetOrdering&&e.renderingContent.facetOrdering.values&&e.renderingContent.facetOrdering.values[t]}(c,t);if(r)return function(e,t){var r=[],n=[],i=(t.order||[]).reduce((function(e,t,r){return e[t]=r,e}),{});e.forEach((function(e){var t=e.path||e.name;void 0!==i[t]?r[i[t]]=e:n.push(e)})),r=r.filter((function(e){return e}));var s,c=t.sortRemainingBy;return"hidden"===c?r:(s="alpha"===c?[["path","name"],["asc","asc"]]:[["count"],["desc"]],r.concat(a(n,s[0],s[1])))}(e,r)}if(Array.isArray(s.sortBy)){var n=o(s.sortBy,v.DEFAULT_SORT);return a(e,n[0],n[1])}if("function"==typeof s.sortBy)return function(e,t){return t.sort(e)}(s.sortBy,e);throw new Error("options.sortBy is optional but if defined it must be either an array of string (predicates) or a sorting function")}),r,n)}},v.prototype.getFacetStats=function(e){return this._state.isConjunctiveFacet(e)?F(this.facets,e):this._state.isDisjunctiveFacet(e)?F(this.disjunctiveFacets,e):void 0},v.prototype.getRefinements=function(){var e=this._state,t=this,r=[];return Object.keys(e.facetsRefinements).forEach((function(n){e.facetsRefinements[n].forEach((function(i){r.push(b(e,"facet",n,i,t.facets))}))})),Object.keys(e.facetsExcludes).forEach((function(n){e.facetsExcludes[n].forEach((function(i){r.push(b(e,"exclude",n,i,t.facets))}))})),Object.keys(e.disjunctiveFacetsRefinements).forEach((function(n){e.disjunctiveFacetsRefinements[n].forEach((function(i){r.push(b(e,"disjunctive",n,i,t.disjunctiveFacets))}))})),Object.keys(e.hierarchicalFacetsRefinements).forEach((function(n){e.hierarchicalFacetsRefinements[n].forEach((function(i){r.push(function(e,t,r,n){var i=e.getHierarchicalFacetByName(t),a=e._getHierarchicalFacetSeparator(i),s=r.split(a),u=c(n,(function(e){return e.name===t})),o=s.reduce((function(e,t){var r=e&&c(e.data,(function(e){return e.name===t}));return void 0!==r?r:e}),u),h=o&&o.count||0,f=o&&o.exhaustive||!1,l=o&&o.path||"";return{type:"hierarchical",attributeName:t,name:l,count:h,exhaustive:f}}(e,n,i,t.hierarchicalFacets))}))})),Object.keys(e.numericRefinements).forEach((function(t){var 
n=e.numericRefinements[t];Object.keys(n).forEach((function(e){n[e].forEach((function(n){r.push({type:"numeric",attributeName:t,name:n,numericValue:n,operator:e})}))}))})),e.tagRefinements.forEach((function(e){r.push({type:"tag",attributeName:"_tags",name:e})})),r},e.exports=v},49374:(e,t,r)=>{"use strict";var n=r(17775),i=r(23076),a=r(68078),s=r(96394),c=r(17331),u=r(14853),o=r(90116),h=r(49803),f=r(60185),l=r(24336),m=r(94039).escapeFacetValue;function d(e,t,r){"function"==typeof e.addAlgoliaAgent&&e.addAlgoliaAgent("JS Helper ("+l+")"),this.setClient(e);var i=r||{};i.index=t,this.state=n.make(i),this.lastResults=null,this._queryId=0,this._lastQueryIdReceived=-1,this.derivedHelpers=[],this._currentNbQueries=0}function p(e){if(e<0)throw new Error("Page requested below 0.");return this._change({state:this.state.setPage(e),isPageReset:!1}),this}function v(){return this.state.page}u(d,c),d.prototype.search=function(){return this._search({onlyWithDerivedHelpers:!1}),this},d.prototype.searchOnlyWithDerivedHelpers=function(){return this._search({onlyWithDerivedHelpers:!0}),this},d.prototype.getQuery=function(){var e=this.state;return s._getHitsSearchParams(e)},d.prototype.searchOnce=function(e,t){var r=e?this.state.setQueryParameters(e):this.state,n=s._getQueries(r.index,r),a=this;if(this._currentNbQueries++,this.emit("searchOnce",{state:r}),!t)return this.client.search(n).then((function(e){return a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),{content:new i(r,e.results),state:r,_originalResponse:e}}),(function(e){throw a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),e}));this.client.search(n).then((function(e){a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),t(null,new i(r,e.results),r)})).catch((function(e){a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),t(e,null,r)}))},d.prototype.findAnswers=function(e){console.warn("[algoliasearch-helper] answers is no longer supported");var t=this.state,r=this.derivedHelpers[0];if(!r)return Promise.resolve([]);var n=r.getModifiedState(t),i=f({attributesForPrediction:e.attributesForPrediction,nbHits:e.nbHits},{params:h(s._getHitsSearchParams(n),["attributesToSnippet","hitsPerPage","restrictSearchableAttributes","snippetEllipsisText"])}),a="search for answers was called, but this client does not have a function client.initIndex(index).findAnswers";if("function"!=typeof this.client.initIndex)throw new Error(a);var c=this.client.initIndex(n.index);if("function"!=typeof c.findAnswers)throw new Error(a);return c.findAnswers(n.query,e.queryLanguages,i)},d.prototype.searchForFacetValues=function(e,t,r,n){var i="function"==typeof this.client.searchForFacetValues,a="function"==typeof this.client.initIndex;if(!i&&!a&&"function"!=typeof this.client.search)throw new Error("search for facet values (searchable) was called, but this client does not have a function client.searchForFacetValues or client.initIndex(index).searchForFacetValues");var c=this.state.setQueryParameters(n||{}),u=c.isDisjunctiveFacet(e),o=s.getSearchForFacetQuery(e,t,r,c);this._currentNbQueries++;var h,f=this;return i?h=this.client.searchForFacetValues([{indexName:c.index,params:o}]):a?h=this.client.initIndex(c.index).searchForFacetValues(o):(delete o.facetName,h=this.client.search([{type:"facet",facet:e,indexName:c.index,params:o}]).then((function(e){return e.results[0]}))),this.emit("searchForFacetValues",{state:c,facet:e,query:t}),h.then((function(t){return 
f._currentNbQueries--,0===f._currentNbQueries&&f.emit("searchQueueEmpty"),(t=Array.isArray(t)?t[0]:t).facetHits.forEach((function(t){t.escapedValue=m(t.value),t.isRefined=u?c.isDisjunctiveFacetRefined(e,t.escapedValue):c.isFacetRefined(e,t.escapedValue)})),t}),(function(e){throw f._currentNbQueries--,0===f._currentNbQueries&&f.emit("searchQueueEmpty"),e}))},d.prototype.setQuery=function(e){return this._change({state:this.state.resetPage().setQuery(e),isPageReset:!0}),this},d.prototype.clearRefinements=function(e){return this._change({state:this.state.resetPage().clearRefinements(e),isPageReset:!0}),this},d.prototype.clearTags=function(){return this._change({state:this.state.resetPage().clearTags(),isPageReset:!0}),this},d.prototype.addDisjunctiveFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().addDisjunctiveFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.addDisjunctiveRefine=function(){return this.addDisjunctiveFacetRefinement.apply(this,arguments)},d.prototype.addHierarchicalFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().addHierarchicalFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.addNumericRefinement=function(e,t,r){return this._change({state:this.state.resetPage().addNumericRefinement(e,t,r),isPageReset:!0}),this},d.prototype.addFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().addFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.addRefine=function(){return this.addFacetRefinement.apply(this,arguments)},d.prototype.addFacetExclusion=function(e,t){return this._change({state:this.state.resetPage().addExcludeRefinement(e,t),isPageReset:!0}),this},d.prototype.addExclude=function(){return this.addFacetExclusion.apply(this,arguments)},d.prototype.addTag=function(e){return this._change({state:this.state.resetPage().addTagRefinement(e),isPageReset:!0}),this},d.prototype.removeNumericRefinement=function(e,t,r){return this._change({state:this.state.resetPage().removeNumericRefinement(e,t,r),isPageReset:!0}),this},d.prototype.removeDisjunctiveFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().removeDisjunctiveFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.removeDisjunctiveRefine=function(){return this.removeDisjunctiveFacetRefinement.apply(this,arguments)},d.prototype.removeHierarchicalFacetRefinement=function(e){return this._change({state:this.state.resetPage().removeHierarchicalFacetRefinement(e),isPageReset:!0}),this},d.prototype.removeFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().removeFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.removeRefine=function(){return this.removeFacetRefinement.apply(this,arguments)},d.prototype.removeFacetExclusion=function(e,t){return this._change({state:this.state.resetPage().removeExcludeRefinement(e,t),isPageReset:!0}),this},d.prototype.removeExclude=function(){return this.removeFacetExclusion.apply(this,arguments)},d.prototype.removeTag=function(e){return this._change({state:this.state.resetPage().removeTagRefinement(e),isPageReset:!0}),this},d.prototype.toggleFacetExclusion=function(e,t){return this._change({state:this.state.resetPage().toggleExcludeFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.toggleExclude=function(){return this.toggleFacetExclusion.apply(this,arguments)},d.prototype.toggleRefinement=function(e,t){return this.toggleFacetRefinement(e,t)},d.prototype.toggleFacetRefinement=function(e,t){return 
this._change({state:this.state.resetPage().toggleFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.toggleRefine=function(){return this.toggleFacetRefinement.apply(this,arguments)},d.prototype.toggleTag=function(e){return this._change({state:this.state.resetPage().toggleTagRefinement(e),isPageReset:!0}),this},d.prototype.nextPage=function(){var e=this.state.page||0;return this.setPage(e+1)},d.prototype.previousPage=function(){var e=this.state.page||0;return this.setPage(e-1)},d.prototype.setCurrentPage=p,d.prototype.setPage=p,d.prototype.setIndex=function(e){return this._change({state:this.state.resetPage().setIndex(e),isPageReset:!0}),this},d.prototype.setQueryParameter=function(e,t){return this._change({state:this.state.resetPage().setQueryParameter(e,t),isPageReset:!0}),this},d.prototype.setState=function(e){return this._change({state:n.make(e),isPageReset:!1}),this},d.prototype.overrideStateWithoutTriggeringChangeEvent=function(e){return this.state=new n(e),this},d.prototype.hasRefinements=function(e){return!!o(this.state.getNumericRefinements(e))||(this.state.isConjunctiveFacet(e)?this.state.isFacetRefined(e):this.state.isDisjunctiveFacet(e)?this.state.isDisjunctiveFacetRefined(e):!!this.state.isHierarchicalFacet(e)&&this.state.isHierarchicalFacetRefined(e))},d.prototype.isExcluded=function(e,t){return this.state.isExcludeRefined(e,t)},d.prototype.isDisjunctiveRefined=function(e,t){return this.state.isDisjunctiveFacetRefined(e,t)},d.prototype.hasTag=function(e){return this.state.isTagRefined(e)},d.prototype.isTagRefined=function(){return this.hasTagRefinements.apply(this,arguments)},d.prototype.getIndex=function(){return this.state.index},d.prototype.getCurrentPage=v,d.prototype.getPage=v,d.prototype.getTags=function(){return this.state.tagRefinements},d.prototype.getRefinements=function(e){var t=[];if(this.state.isConjunctiveFacet(e))this.state.getConjunctiveRefinements(e).forEach((function(e){t.push({value:e,type:"conjunctive"})})),this.state.getExcludeRefinements(e).forEach((function(e){t.push({value:e,type:"exclude"})}));else if(this.state.isDisjunctiveFacet(e)){this.state.getDisjunctiveRefinements(e).forEach((function(e){t.push({value:e,type:"disjunctive"})}))}var r=this.state.getNumericRefinements(e);return Object.keys(r).forEach((function(e){var n=r[e];t.push({value:n,operator:e,type:"numeric"})})),t},d.prototype.getNumericRefinement=function(e,t){return this.state.getNumericRefinement(e,t)},d.prototype.getHierarchicalFacetBreadcrumb=function(e){return this.state.getHierarchicalFacetBreadcrumb(e)},d.prototype._search=function(e){var t=this.state,r=[],n=[];e.onlyWithDerivedHelpers||(n=s._getQueries(t.index,t),r.push({state:t,queriesCount:n.length,helper:this}),this.emit("search",{state:t,results:this.lastResults}));var i=this.derivedHelpers.map((function(e){var n=e.getModifiedState(t),i=n.index?s._getQueries(n.index,n):[];return r.push({state:n,queriesCount:i.length,helper:e}),e.emit("search",{state:n,results:e.lastResults}),i})),a=Array.prototype.concat.apply(n,i),c=this._queryId++;if(this._currentNbQueries++,!a.length)return Promise.resolve({results:[]}).then(this._dispatchAlgoliaResponse.bind(this,r,c));try{this.client.search(a).then(this._dispatchAlgoliaResponse.bind(this,r,c)).catch(this._dispatchAlgoliaError.bind(this,c))}catch(u){this.emit("error",{error:u})}},d.prototype._dispatchAlgoliaResponse=function(e,t,r){if(!(t0},d.prototype._change=function(e){var 
t=e.state,r=e.isPageReset;t!==this.state&&(this.state=t,this.emit("change",{state:this.state,results:this.lastResults,isPageReset:r}))},d.prototype.clearCache=function(){return this.client.clearCache&&this.client.clearCache(),this},d.prototype.setClient=function(e){return this.client===e||("function"==typeof e.addAlgoliaAgent&&e.addAlgoliaAgent("JS Helper ("+l+")"),this.client=e),this},d.prototype.getClient=function(){return this.client},d.prototype.derive=function(e){var t=new a(this,e);return this.derivedHelpers.push(t),t},d.prototype.detachDerivedHelper=function(e){var t=this.derivedHelpers.indexOf(e);if(-1===t)throw new Error("Derived helper already detached");this.derivedHelpers.splice(t,1)},d.prototype.hasPendingRequests=function(){return this._currentNbQueries>0},e.exports=d},74587:e=>{"use strict";e.exports=function(e){return Array.isArray(e)?e.filter(Boolean):[]}},52344:e=>{"use strict";e.exports=function(){return Array.prototype.slice.call(arguments).reduceRight((function(e,t){return Object.keys(Object(t)).forEach((function(r){void 0!==t[r]&&(void 0!==e[r]&&delete e[r],e[r]=t[r])})),e}),{})}},94039:e=>{"use strict";e.exports={escapeFacetValue:function(e){return"string"!=typeof e?e:String(e).replace(/^-/,"\\-")},unescapeFacetValue:function(e){return"string"!=typeof e?e:e.replace(/^\\-/,"-")}}},7888:e=>{"use strict";e.exports=function(e,t){if(Array.isArray(e))for(var r=0;r{"use strict";e.exports=function(e,t){if(!Array.isArray(e))return-1;for(var r=0;r{"use strict";var n=r(7888);e.exports=function(e,t){var r=(t||[]).map((function(e){return e.split(":")}));return e.reduce((function(e,t){var i=t.split(":"),a=n(r,(function(e){return e[0]===i[0]}));return i.length>1||!a?(e[0].push(i[0]),e[1].push(i[1]),e):(e[0].push(a[0]),e[1].push(a[1]),e)}),[[],[]])}},14853:e=>{"use strict";e.exports=function(e,t){e.prototype=Object.create(t.prototype,{constructor:{value:e,enumerable:!1,writable:!0,configurable:!0}})}},22686:e=>{"use strict";e.exports=function(e,t){return e.filter((function(r,n){return t.indexOf(r)>-1&&e.indexOf(r)===n}))}},60185:e=>{"use strict";function t(e){return"function"==typeof e||Array.isArray(e)||"[object Object]"===Object.prototype.toString.call(e)}function r(e,n){if(e===n)return e;for(var i in n)if(Object.prototype.hasOwnProperty.call(n,i)&&"__proto__"!==i&&"constructor"!==i){var a=n[i],s=e[i];void 0!==s&&void 0===a||(t(s)&&t(a)?e[i]=r(s,a):e[i]="object"==typeof(c=a)&&null!==c?r(Array.isArray(c)?[]:{},c):c)}var c;return e}e.exports=function(e){t(e)||(e={});for(var n=1,i=arguments.length;n{"use strict";e.exports=function(e){return e&&Object.keys(e).length>0}},49803:e=>{"use strict";e.exports=function(e,t){if(null===e)return{};var r,n,i={},a=Object.keys(e);for(n=0;n=0||(i[r]=e[r]);return i}},42148:e=>{"use strict";function t(e,t){if(e!==t){var r=void 0!==e,n=null===e,i=void 0!==t,a=null===t;if(!a&&e>t||n&&i||!r)return 1;if(!n&&e=n.length?a:"desc"===n[i]?-a:a}return e.index-r.index})),i.map((function(e){return e.value}))}},28023:e=>{"use strict";e.exports=function e(t){if("number"==typeof t)return t;if("string"==typeof t)return parseFloat(t);if(Array.isArray(t))return t.map(e);throw new Error("The value should be a number, a parsable string or an array of those.")}},96394:(e,t,r)=>{"use strict";var n=r(60185);function i(e){return Object.keys(e).sort((function(e,t){return e.localeCompare(t)})).reduce((function(t,r){return t[r]=e[r],t}),{})}var a={_getQueries:function(e,t){var r=[];return 
r.push({indexName:e,params:a._getHitsSearchParams(t)}),t.getRefinedDisjunctiveFacets().forEach((function(n){r.push({indexName:e,params:a._getDisjunctiveFacetSearchParams(t,n)})})),t.getRefinedHierarchicalFacets().forEach((function(n){var i=t.getHierarchicalFacetByName(n),s=t.getHierarchicalRefinement(n),c=t._getHierarchicalFacetSeparator(i);if(s.length>0&&s[0].split(c).length>1){var u=s[0].split(c).slice(0,-1).reduce((function(e,t,r){return e.concat({attribute:i.attributes[r],value:0===r?t:[e[e.length-1].value,t].join(c)})}),[]);u.forEach((function(n,s){var c=a._getDisjunctiveFacetSearchParams(t,n.attribute,0===s);function o(e){return i.attributes.some((function(t){return t===e.split(":")[0]}))}var h=(c.facetFilters||[]).reduce((function(e,t){if(Array.isArray(t)){var r=t.filter((function(e){return!o(e)}));r.length>0&&e.push(r)}return"string"!=typeof t||o(t)||e.push(t),e}),[]),f=u[s-1];c.facetFilters=s>0?h.concat(f.attribute+":"+f.value):h.length>0?h:void 0,r.push({indexName:e,params:c})}))}})),r},_getHitsSearchParams:function(e){var t=e.facets.concat(e.disjunctiveFacets).concat(a._getHitsHierarchicalFacetsAttributes(e)),r=a._getFacetFilters(e),s=a._getNumericFilters(e),c=a._getTagFilters(e),u={facets:t.indexOf("*")>-1?["*"]:t,tagFilters:c};return r.length>0&&(u.facetFilters=r),s.length>0&&(u.numericFilters=s),i(n({},e.getQueryParams(),u))},_getDisjunctiveFacetSearchParams:function(e,t,r){var s=a._getFacetFilters(e,t,r),c=a._getNumericFilters(e,t),u=a._getTagFilters(e),o={hitsPerPage:0,page:0,analytics:!1,clickAnalytics:!1};u.length>0&&(o.tagFilters=u);var h=e.getHierarchicalFacetByName(t);return o.facets=h?a._getDisjunctiveHierarchicalFacetAttribute(e,h,r):t,c.length>0&&(o.numericFilters=c),s.length>0&&(o.facetFilters=s),i(n({},e.getQueryParams(),o))},_getNumericFilters:function(e,t){if(e.numericFilters)return e.numericFilters;var r=[];return Object.keys(e.numericRefinements).forEach((function(n){var i=e.numericRefinements[n]||{};Object.keys(i).forEach((function(e){var a=i[e]||[];t!==n&&a.forEach((function(t){if(Array.isArray(t)){var i=t.map((function(t){return n+e+t}));r.push(i)}else r.push(n+e+t)}))}))})),r},_getTagFilters:function(e){return e.tagFilters?e.tagFilters:e.tagRefinements.join(",")},_getFacetFilters:function(e,t,r){var n=[],i=e.facetsRefinements||{};Object.keys(i).forEach((function(e){(i[e]||[]).forEach((function(t){n.push(e+":"+t)}))}));var a=e.facetsExcludes||{};Object.keys(a).forEach((function(e){(a[e]||[]).forEach((function(t){n.push(e+":-"+t)}))}));var s=e.disjunctiveFacetsRefinements||{};Object.keys(s).forEach((function(e){var r=s[e]||[];if(e!==t&&r&&0!==r.length){var i=[];r.forEach((function(t){i.push(e+":"+t)})),n.push(i)}}));var c=e.hierarchicalFacetsRefinements||{};return Object.keys(c).forEach((function(i){var a=(c[i]||[])[0];if(void 0!==a){var s,u,o=e.getHierarchicalFacetByName(i),h=e._getHierarchicalFacetSeparator(o),f=e._getHierarchicalRootPath(o);if(t===i){if(-1===a.indexOf(h)||!f&&!0===r||f&&f.split(h).length===a.split(h).length)return;f?(u=f.split(h).length-1,a=f):(u=a.split(h).length-2,a=a.slice(0,a.lastIndexOf(h))),s=o.attributes[u]}else u=a.split(h).length-1,s=o.attributes[u];s&&n.push([s+":"+a])}})),n},_getHitsHierarchicalFacetsAttributes:function(e){return e.hierarchicalFacets.reduce((function(t,r){var n=e.getHierarchicalRefinement(r.name)[0];if(!n)return t.push(r.attributes[0]),t;var i=e._getHierarchicalFacetSeparator(r),a=n.split(i).length,s=r.attributes.slice(0,a+1);return 
t.concat(s)}),[])},_getDisjunctiveHierarchicalFacetAttribute:function(e,t,r){var n=e._getHierarchicalFacetSeparator(t);if(!0===r){var i=e._getHierarchicalRootPath(t),a=0;return i&&(a=i.split(n).length),[t.attributes[a]]}var s=(e.getHierarchicalRefinement(t.name)[0]||"").split(n).length-1;return t.attributes.slice(0,s+1)},getSearchForFacetQuery:function(e,t,r,s){var c=s.isDisjunctiveFacet(e)?s.clearRefinements(e):s,u={facetQuery:t,facetName:e};return"number"==typeof r&&(u.maxFacetHits=r),i(n({},a._getHitsSearchParams(c),u))}};e.exports=a},46801:e=>{"use strict";e.exports=function(e){return null!==e&&/^[a-zA-Z0-9_-]{1,64}$/.test(e)}},24336:e=>{"use strict";e.exports="3.13.3"},70290:function(e){e.exports=function(){"use strict";function e(e,t,r){return t in e?Object.defineProperty(e,t,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[t]=r,e}function t(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function r(r){for(var n=1;n=0||(i[r]=e[r]);return i}(e,t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(i[r]=e[r])}return i}function i(e,t){return function(e){if(Array.isArray(e))return e}(e)||function(e,t){if(Symbol.iterator in Object(e)||"[object Arguments]"===Object.prototype.toString.call(e)){var r=[],n=!0,i=!1,a=void 0;try{for(var s,c=e[Symbol.iterator]();!(n=(s=c.next()).done)&&(r.push(s.value),!t||r.length!==t);n=!0);}catch(e){i=!0,a=e}finally{try{n||null==c.return||c.return()}finally{if(i)throw a}}return r}}(e,t)||function(){throw new TypeError("Invalid attempt to destructure non-iterable instance")}()}function a(e){return function(e){if(Array.isArray(e)){for(var t=0,r=new Array(e.length);t2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}};return Promise.resolve().then((function(){c();var t=JSON.stringify(e);return a()[t]})).then((function(e){return Promise.all([e?e.value:t(),void 0!==e])})).then((function(e){var t=i(e,2),n=t[0],a=t[1];return Promise.all([n,a||r.miss(n)])})).then((function(e){return i(e,1)[0]}))},set:function(e,t){return Promise.resolve().then((function(){var i=a();return i[JSON.stringify(e)]={timestamp:(new Date).getTime(),value:t},n().setItem(r,JSON.stringify(i)),t}))},delete:function(e){return Promise.resolve().then((function(){var t=a();delete t[JSON.stringify(e)],n().setItem(r,JSON.stringify(t))}))},clear:function(){return Promise.resolve().then((function(){n().removeItem(r)}))}}}function c(e){var t=a(e.caches),r=t.shift();return void 0===r?{get:function(e,t){var r=arguments.length>2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}};return t().then((function(e){return Promise.all([e,r.miss(e)])})).then((function(e){return i(e,1)[0]}))},set:function(e,t){return Promise.resolve(t)},delete:function(e){return Promise.resolve()},clear:function(){return Promise.resolve()}}:{get:function(e,n){var i=arguments.length>2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}};return r.get(e,n,i).catch((function(){return c({caches:t}).get(e,n,i)}))},set:function(e,n){return r.set(e,n).catch((function(){return c({caches:t}).set(e,n)}))},delete:function(e){return r.delete(e).catch((function(){return c({caches:t}).delete(e)}))},clear:function(){return r.clear().catch((function(){return c({caches:t}).clear()}))}}}function u(){var 
e=arguments.length>0&&void 0!==arguments[0]?arguments[0]:{serializable:!0},t={};return{get:function(r,n){var i=arguments.length>2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}},a=JSON.stringify(r);if(a in t)return Promise.resolve(e.serializable?JSON.parse(t[a]):t[a]);var s=n(),c=i&&i.miss||function(){return Promise.resolve()};return s.then((function(e){return c(e)})).then((function(){return s}))},set:function(r,n){return t[JSON.stringify(r)]=e.serializable?JSON.stringify(n):n,Promise.resolve(n)},delete:function(e){return delete t[JSON.stringify(e)],Promise.resolve()},clear:function(){return t={},Promise.resolve()}}}function o(e){for(var t=e.length-1;t>0;t--){var r=Math.floor(Math.random()*(t+1)),n=e[t];e[t]=e[r],e[r]=n}return e}function h(e,t){return t?(Object.keys(t).forEach((function(r){e[r]=t[r](e)})),e):e}function f(e){for(var t=arguments.length,r=new Array(t>1?t-1:0),n=1;n0?n:void 0,timeout:r.timeout||t,headers:r.headers||{},queryParameters:r.queryParameters||{},cacheable:r.cacheable}}var d={Read:1,Write:2,Any:3},p=1,v=2,g=3;function y(e){var t=arguments.length>1&&void 0!==arguments[1]?arguments[1]:p;return r(r({},e),{},{status:t,lastUpdate:Date.now()})}function R(e){return"string"==typeof e?{protocol:"https",url:e,accept:d.Any}:{protocol:e.protocol||"https",url:e.url,accept:e.accept||d.Any}}var F="GET",b="POST";function P(e,t){return Promise.all(t.map((function(t){return e.get(t,(function(){return Promise.resolve(y(t))}))}))).then((function(e){var r=e.filter((function(e){return function(e){return e.status===p||Date.now()-e.lastUpdate>12e4}(e)})),n=e.filter((function(e){return function(e){return e.status===g&&Date.now()-e.lastUpdate<=12e4}(e)})),i=[].concat(a(r),a(n));return{getTimeout:function(e,t){return(0===n.length&&0===e?1:n.length+3+e)*t},statelessHosts:i.length>0?i.map((function(e){return R(e)})):t}}))}function j(e,t,n,i){var s=[],c=function(e,t){if(e.method!==F&&(void 0!==e.data||void 0!==t.data)){var n=Array.isArray(e.data)?e.data:r(r({},e.data),t.data);return JSON.stringify(n)}}(n,i),u=function(e,t){var n=r(r({},e.headers),t.headers),i={};return Object.keys(n).forEach((function(e){var t=n[e];i[e.toLowerCase()]=t})),i}(e,i),o=n.method,h=n.method!==F?{}:r(r({},n.data),i.data),f=r(r(r({"x-algolia-agent":e.userAgent.value},e.queryParameters),h),i.queryParameters),l=0,m=function t(r,a){var h=r.pop();if(void 0===h)throw{name:"RetryError",message:"Unreachable hosts - your application id may be incorrect. 
If the error persists, contact support@algolia.com.",transporterStackTrace:O(s)};var m={data:c,headers:u,method:o,url:E(h,n.path,f),connectTimeout:a(l,e.timeouts.connect),responseTimeout:a(l,i.timeout)},d=function(e){var t={request:m,response:e,host:h,triesLeft:r.length};return s.push(t),t},p={onSuccess:function(e){return function(e){try{return JSON.parse(e.content)}catch(t){throw function(e,t){return{name:"DeserializationError",message:e,response:t}}(t.message,e)}}(e)},onRetry:function(n){var i=d(n);return n.isTimedOut&&l++,Promise.all([e.logger.info("Retryable failure",w(i)),e.hostsCache.set(h,y(h,n.isTimedOut?g:v))]).then((function(){return t(r,a)}))},onFail:function(e){throw d(e),function(e,t){var r=e.content,n=e.status,i=r;try{i=JSON.parse(r).message}catch(e){}return function(e,t,r){return{name:"ApiError",message:e,status:t,transporterStackTrace:r}}(i,n,t)}(e,O(s))}};return e.requester.send(m).then((function(e){return function(e,t){return function(e){var t=e.status;return e.isTimedOut||function(e){var t=e.isTimedOut,r=e.status;return!t&&0==~~r}(e)||2!=~~(t/100)&&4!=~~(t/100)}(e)?t.onRetry(e):2==~~(e.status/100)?t.onSuccess(e):t.onFail(e)}(e,p)}))};return P(e.hostsCache,t).then((function(e){return m(a(e.statelessHosts).reverse(),e.getTimeout)}))}function _(e){var t={value:"Algolia for JavaScript (".concat(e,")"),add:function(e){var r="; ".concat(e.segment).concat(void 0!==e.version?" (".concat(e.version,")"):"");return-1===t.value.indexOf(r)&&(t.value="".concat(t.value).concat(r)),t}};return t}function E(e,t,r){var n=x(r),i="".concat(e.protocol,"://").concat(e.url,"/").concat("/"===t.charAt(0)?t.substr(1):t);return n.length&&(i+="?".concat(n)),i}function x(e){return Object.keys(e).map((function(t){return f("%s=%s",t,(r=e[t],"[object Object]"===Object.prototype.toString.call(r)||"[object Array]"===Object.prototype.toString.call(r)?JSON.stringify(e[t]):e[t]));var r})).join("&")}function O(e){return e.map((function(e){return w(e)}))}function w(e){var t=e.request.headers["x-algolia-api-key"]?{"x-algolia-api-key":"*****"}:{};return r(r({},e),{},{request:r(r({},e.request),{},{headers:r(r({},e.request.headers),t)})})}var N=function(e){var t=e.appId,n=function(e,t,r){var n={"x-algolia-api-key":r,"x-algolia-application-id":t};return{headers:function(){return e===l.WithinHeaders?n:{}},queryParameters:function(){return e===l.WithinQueryParameters?n:{}}}}(void 0!==e.authMode?e.authMode:l.WithinHeaders,t,e.apiKey),a=function(e){var t=e.hostsCache,r=e.logger,n=e.requester,a=e.requestsCache,s=e.responsesCache,c=e.timeouts,u=e.userAgent,o=e.hosts,h=e.queryParameters,f={hostsCache:t,logger:r,requester:n,requestsCache:a,responsesCache:s,timeouts:c,userAgent:u,headers:e.headers,queryParameters:h,hosts:o.map((function(e){return R(e)})),read:function(e,t){var r=m(t,f.timeouts.read),n=function(){return j(f,f.hosts.filter((function(e){return 0!=(e.accept&d.Read)})),e,r)};if(!0!==(void 0!==r.cacheable?r.cacheable:e.cacheable))return n();var a={request:e,mappedRequestOptions:r,transporter:{queryParameters:f.queryParameters,headers:f.headers}};return f.responsesCache.get(a,(function(){return f.requestsCache.get(a,(function(){return f.requestsCache.set(a,n()).then((function(e){return Promise.all([f.requestsCache.delete(a),e])}),(function(e){return Promise.all([f.requestsCache.delete(a),Promise.reject(e)])})).then((function(e){var t=i(e,2);return t[0],t[1]}))}))}),{miss:function(e){return f.responsesCache.set(a,e)}})},write:function(e,t){return j(f,f.hosts.filter((function(e){return 
0!=(e.accept&d.Write)})),e,m(t,f.timeouts.write))}};return f}(r(r({hosts:[{url:"".concat(t,"-dsn.algolia.net"),accept:d.Read},{url:"".concat(t,".algolia.net"),accept:d.Write}].concat(o([{url:"".concat(t,"-1.algolianet.com")},{url:"".concat(t,"-2.algolianet.com")},{url:"".concat(t,"-3.algolianet.com")}]))},e),{},{headers:r(r(r({},n.headers()),{"content-type":"application/x-www-form-urlencoded"}),e.headers),queryParameters:r(r({},n.queryParameters()),e.queryParameters)}));return h({transporter:a,appId:t,addAlgoliaAgent:function(e,t){a.userAgent.add({segment:e,version:t})},clearCache:function(){return Promise.all([a.requestsCache.clear(),a.responsesCache.clear()]).then((function(){}))}},e.methods)},A=function(e){return function(t,r){return t.method===F?e.transporter.read(t,r):e.transporter.write(t,r)}},H=function(e){return function(t){var r=arguments.length>1&&void 0!==arguments[1]?arguments[1]:{};return h({transporter:e.transporter,appId:e.appId,indexName:t},r.methods)}},S=function(e){return function(t,n){var i=t.map((function(e){return r(r({},e),{},{params:x(e.params||{})})}));return e.transporter.read({method:b,path:"1/indexes/*/queries",data:{requests:i},cacheable:!0},n)}},T=function(e){return function(t,i){return Promise.all(t.map((function(t){var a=t.params,s=a.facetName,c=a.facetQuery,u=n(a,["facetName","facetQuery"]);return H(e)(t.indexName,{methods:{searchForFacetValues:I}}).searchForFacetValues(s,c,r(r({},i),u))})))}},Q=function(e){return function(t,r,n){return e.transporter.read({method:b,path:f("1/answers/%s/prediction",e.indexName),data:{query:t,queryLanguages:r},cacheable:!0},n)}},C=function(e){return function(t,r){return e.transporter.read({method:b,path:f("1/indexes/%s/query",e.indexName),data:{query:t},cacheable:!0},r)}},I=function(e){return function(t,r,n){return e.transporter.read({method:b,path:f("1/indexes/%s/facets/%s/query",e.indexName,t),data:{facetQuery:r},cacheable:!0},n)}},k=1,D=2,q=3;function V(e,t,n){var i,a={appId:e,apiKey:t,timeouts:{connect:1,read:2,write:30},requester:{send:function(e){return new Promise((function(t){var r=new XMLHttpRequest;r.open(e.method,e.url,!0),Object.keys(e.headers).forEach((function(t){return r.setRequestHeader(t,e.headers[t])}));var n,i=function(e,n){return setTimeout((function(){r.abort(),t({status:0,content:n,isTimedOut:!0})}),1e3*e)},a=i(e.connectTimeout,"Connection timeout");r.onreadystatechange=function(){r.readyState>r.OPENED&&void 0===n&&(clearTimeout(a),n=i(e.responseTimeout,"Socket timeout"))},r.onerror=function(){0===r.status&&(clearTimeout(a),clearTimeout(n),t({content:r.responseText||"Network request failed",status:r.status,isTimedOut:!1}))},r.onload=function(){clearTimeout(a),clearTimeout(n),t({content:r.responseText,status:r.status,isTimedOut:!1})},r.send(e.data)}))}},logger:(i=q,{debug:function(e,t){return k>=i&&console.debug(e,t),Promise.resolve()},info:function(e,t){return D>=i&&console.info(e,t),Promise.resolve()},error:function(e,t){return console.error(e,t),Promise.resolve()}}),responsesCache:u(),requestsCache:u({serializable:!1}),hostsCache:c({caches:[s({key:"".concat("4.19.0","-").concat(e)}),u()]}),userAgent:_("4.19.0").add({segment:"Browser",version:"lite"}),authMode:l.WithinQueryParameters};return N(r(r(r({},a),n),{},{methods:{search:S,searchForFacetValues:T,multipleQueries:S,multipleSearchForFacetValues:T,customRequest:A,initIndex:function(e){return function(t){return H(e)(t,{methods:{search:C,searchForFacetValues:I,findAnswers:Q}})}}}}))}return V.version="4.19.0",V}()},56675:(e,t,r)=>{"use 
strict";r.r(t),r.d(t,{default:()=>A});var n=r(67294),i=r(86010),a=r(8131),s=r.n(a),c=r(70290),u=r.n(c),o=r(10412),h=r(35742),f=r(39960),l=r(80143),m=r(52263),d=["zero","one","two","few","many","other"];function p(e){return d.filter((function(t){return e.includes(t)}))}var v={locale:"en",pluralForms:p(["one","other"]),select:function(e){return 1===e?"one":"other"}};function g(){var e=(0,m.Z)().i18n.currentLocale;return(0,n.useMemo)((function(){try{return t=e,r=new Intl.PluralRules(t),{locale:t,pluralForms:p(r.resolvedOptions().pluralCategories),select:function(e){return r.select(e)}}}catch(n){return console.error('Failed to use Intl.PluralRules for locale "'+e+'".\nDocusaurus will fallback to the default (English) implementation.\nError: '+n.message+"\n"),v}var t,r}),[e])}function y(){var e=g();return{selectMessage:function(t,r){return function(e,t,r){var n=e.split("|");if(1===n.length)return n[0];n.length>r.pluralForms.length&&console.error("For locale="+r.locale+", a maximum of "+r.pluralForms.length+" plural forms are expected ("+r.pluralForms.join(",")+"), but the message contains "+n.length+": "+e);var i=r.select(t),a=r.pluralForms.indexOf(i);return n[Math.min(a,n.length-1)]}(r,t,e)}}}var R=r(66177),F=r(69688),b=r(10833),P=r(82128),j=r(95999),_=r(6278),E=r(239),x=r(7452);const O={searchQueryInput:"searchQueryInput_u2C7",searchVersionInput:"searchVersionInput_m0Ui",searchResultsColumn:"searchResultsColumn_JPFH",algoliaLogo:"algoliaLogo_rT1R",algoliaLogoPathFill:"algoliaLogoPathFill_WdUC",searchResultItem:"searchResultItem_Tv2o",searchResultItemHeading:"searchResultItemHeading_KbCB",searchResultItemPath:"searchResultItemPath_lhe1",searchResultItemSummary:"searchResultItemSummary_AEaO",searchQueryColumn:"searchQueryColumn_RTkw",searchVersionColumn:"searchVersionColumn_ypXd",searchLogoColumn:"searchLogoColumn_rJIA",loadingSpinner:"loadingSpinner_XVxU","loading-spin":"loading-spin_vzvp",loader:"loader_vvXV"};function w(e){var t=e.docsSearchVersionsHelpers,r=Object.entries(t.allDocsData).filter((function(e){return e[1].versions.length>1}));return n.createElement("div",{className:(0,i.Z)("col","col--3","padding-left--none",O.searchVersionColumn)},r.map((function(e){var i=e[0],a=e[1],s=r.length>1?i+": ":"";return n.createElement("select",{key:i,onChange:function(e){return t.setSearchVersion(i,e.target.value)},defaultValue:t.searchVersions[i],className:O.searchVersionInput},a.versions.map((function(e,t){return n.createElement("option",{key:t,label:""+s+e.label,value:e.name})})))})))}function N(){var e,t,r,a,c,d,p=(0,m.Z)().i18n.currentLocale,v=(0,_.L)().algolia,g=v.appId,b=v.apiKey,N=v.indexName,A=(0,E.l)(),H=(e=y().selectMessage,function(t){return e(t,(0,j.I)({id:"theme.SearchPage.documentsFound.plurals",description:'Pluralized label for "{count} documents found". 
Use as much plural forms (separated by "|") as your language support (see https://www.unicode.org/cldr/cldr-aux/charts/34/supplemental/language_plural_rules.html)',message:"One document found|{count} documents found"},{count:t}))}),S=(t=(0,l._r)(),r=(0,n.useState)((function(){return Object.entries(t).reduce((function(e,t){var r,n=t[0],i=t[1];return Object.assign({},e,((r={})[n]=i.versions[0].name,r))}),{})})),a=r[0],c=r[1],d=Object.values(t).some((function(e){return e.versions.length>1})),{allDocsData:t,versioningEnabled:d,searchVersions:a,setSearchVersion:function(e,t){return c((function(r){var n;return Object.assign({},r,((n={})[e]=t,n))}))}}),T=(0,R.K)(),Q=T[0],C=T[1],I={items:[],query:null,totalResults:null,totalPages:null,lastPage:null,hasMore:null,loading:null},k=(0,n.useReducer)((function(e,t){switch(t.type){case"reset":return I;case"loading":return Object.assign({},e,{loading:!0});case"update":return Q!==t.value.query?e:Object.assign({},t.value,{items:0===t.value.lastPage?t.value.items:e.items.concat(t.value.items)});case"advance":var r=e.totalPages>e.lastPage+1;return Object.assign({},e,{lastPage:r?e.lastPage+1:e.lastPage,hasMore:r});default:return e}}),I),D=k[0],q=k[1],V=u()(g,b),L=s()(V,N,{hitsPerPage:15,advancedSyntax:!0,disjunctiveFacets:["language","docusaurus_tag"]});L.on("result",(function(e){var t=e.results,r=t.query,n=t.hits,i=t.page,a=t.nbHits,s=t.nbPages;if(""!==r&&Array.isArray(n)){var c=function(e){return e.replace(/algolia-docsearch-suggestion--highlight/g,"search-result-match")},u=n.map((function(e){var t=e.url,r=e._highlightResult.hierarchy,n=e._snippetResult,i=void 0===n?{}:n,a=Object.keys(r).map((function(e){return c(r[e].value)}));return{title:a.pop(),url:A(t),summary:i.content?c(i.content.value)+"...":"",breadcrumbs:a}}));q({type:"update",value:{items:u,query:r,totalResults:a,totalPages:s,lastPage:i,hasMore:s>i+1,loading:!1}})}else q({type:"reset"})}));var B=(0,n.useState)(null),z=B[0],M=B[1],J=(0,n.useRef)(0),W=(0,n.useRef)(o.Z.canUseIntersectionObserver&&new IntersectionObserver((function(e){var t=e[0],r=t.isIntersecting,n=t.boundingClientRect.y;r&&J.current>n&&q({type:"advance"}),J.current=n}),{threshold:1})),U=function(){return Q?(0,j.I)({id:"theme.SearchPage.existingResultsTitle",message:'Search results for "{query}"',description:"The search page title for non-empty query"},{query:Q}):(0,j.I)({id:"theme.SearchPage.emptyResultsTitle",message:"Search the documentation",description:"The search page title for empty query"})},Z=(0,F.zX)((function(e){void 0===e&&(e=0),L.addDisjunctiveFacetRefinement("docusaurus_tag","default"),L.addDisjunctiveFacetRefinement("language",p),Object.entries(S.searchVersions).forEach((function(e){var t=e[0],r=e[1];L.addDisjunctiveFacetRefinement("docusaurus_tag","docs-"+t+"-"+r)})),L.setQuery(Q).setPage(e).search()}));return(0,n.useEffect)((function(){if(z){var e=W.current;return e?(e.observe(z),function(){return e.unobserve(z)}):function(){return!0}}}),[z]),(0,n.useEffect)((function(){q({type:"reset"}),Q&&(q({type:"loading"}),setTimeout((function(){Z()}),300))}),[Q,S.searchVersions,Z]),(0,n.useEffect)((function(){D.lastPage&&0!==D.lastPage&&Z(D.lastPage)}),[Z,D.lastPage]),n.createElement(x.Z,null,n.createElement(h.Z,null,n.createElement("title",null,(0,P.p)(U())),n.createElement("meta",{property:"robots",content:"noindex, follow"})),n.createElement("div",{className:"container margin-vert--lg"},n.createElement("h1",null,U()),n.createElement("form",{className:"row",onSubmit:function(e){return 
e.preventDefault()}},n.createElement("div",{className:(0,i.Z)("col",O.searchQueryColumn,{"col--9":S.versioningEnabled,"col--12":!S.versioningEnabled})},n.createElement("input",{type:"search",name:"q",className:O.searchQueryInput,placeholder:(0,j.I)({id:"theme.SearchPage.inputPlaceholder",message:"Type your search here",description:"The placeholder for search page input"}),"aria-label":(0,j.I)({id:"theme.SearchPage.inputLabel",message:"Search",description:"The ARIA label for search page input"}),onChange:function(e){return C(e.target.value)},value:Q,autoComplete:"off",autoFocus:!0})),S.versioningEnabled&&n.createElement(w,{docsSearchVersionsHelpers:S})),n.createElement("div",{className:"row"},n.createElement("div",{className:(0,i.Z)("col","col--8",O.searchResultsColumn)},!!D.totalResults&&H(D.totalResults)),n.createElement("div",{className:(0,i.Z)("col","col--4","text--right",O.searchLogoColumn)},n.createElement("a",{target:"_blank",rel:"noopener noreferrer",href:"https://www.algolia.com/","aria-label":(0,j.I)({id:"theme.SearchPage.algoliaLabel",message:"Search by Algolia",description:"The ARIA label for Algolia mention"})},n.createElement("svg",{viewBox:"0 0 168 24",className:O.algoliaLogo},n.createElement("g",{fill:"none"},n.createElement("path",{className:O.algoliaLogoPathFill,d:"M120.925 18.804c-4.386.02-4.386-3.54-4.386-4.106l-.007-13.336 2.675-.424v13.254c0 .322 0 2.358 1.718 2.364v2.248zm-10.846-2.18c.821 0 1.43-.047 1.855-.129v-2.719a6.334 6.334 0 0 0-1.574-.199 5.7 5.7 0 0 0-.897.069 2.699 2.699 0 0 0-.814.24c-.24.116-.439.28-.582.491-.15.212-.219.335-.219.656 0 .628.219.991.616 1.23s.938.362 1.615.362zm-.233-9.7c.883 0 1.629.109 2.231.328.602.218 1.088.525 1.444.915.363.396.609.922.76 1.483.157.56.232 1.175.232 1.85v6.874a32.5 32.5 0 0 1-1.868.314c-.834.123-1.772.185-2.813.185-.69 0-1.327-.069-1.895-.198a4.001 4.001 0 0 1-1.471-.636 3.085 3.085 0 0 1-.951-1.134c-.226-.465-.343-1.12-.343-1.803 0-.656.13-1.073.384-1.525a3.24 3.24 0 0 1 1.047-1.106c.445-.287.95-.492 1.532-.615a8.8 8.8 0 0 1 1.82-.185 8.404 8.404 0 0 1 1.972.24v-.438c0-.307-.035-.6-.11-.874a1.88 1.88 0 0 0-.384-.73 1.784 1.784 0 0 0-.724-.493 3.164 3.164 0 0 0-1.143-.205c-.616 0-1.177.075-1.69.164a7.735 7.735 0 0 0-1.26.307l-.321-2.192c.335-.117.834-.233 1.478-.349a10.98 10.98 0 0 1 2.073-.178zm52.842 9.626c.822 0 1.43-.048 1.854-.13V13.7a6.347 6.347 0 0 0-1.574-.199c-.294 0-.595.021-.896.069a2.7 2.7 0 0 0-.814.24 1.46 1.46 0 0 0-.582.491c-.15.212-.218.335-.218.656 0 .628.218.991.615 1.23.404.245.938.362 1.615.362zm-.226-9.694c.883 0 1.629.108 2.231.327.602.219 1.088.526 1.444.915.355.39.609.923.759 1.483a6.8 6.8 0 0 1 .233 1.852v6.873c-.41.088-1.034.19-1.868.314-.834.123-1.772.184-2.813.184-.69 0-1.327-.068-1.895-.198a4.001 4.001 0 0 1-1.471-.635 3.085 3.085 0 0 1-.951-1.134c-.226-.465-.343-1.12-.343-1.804 0-.656.13-1.073.384-1.524.26-.45.608-.82 1.047-1.107.445-.286.95-.491 1.532-.614a8.803 8.803 0 0 1 2.751-.13c.329.034.671.096 1.04.185v-.437a3.3 3.3 0 0 0-.109-.875 1.873 1.873 0 0 0-.384-.731 1.784 1.784 0 0 0-.724-.492 3.165 3.165 0 0 0-1.143-.205c-.616 0-1.177.075-1.69.164a7.75 7.75 0 0 0-1.26.307l-.321-2.193c.335-.116.834-.232 1.478-.348a11.633 11.633 0 0 1 2.073-.177zm-8.034-1.271a1.626 1.626 0 0 1-1.628-1.62c0-.895.725-1.62 1.628-1.62.904 0 1.63.725 1.63 1.62 0 .895-.733 1.62-1.63 1.62zm1.348 13.22h-2.689V7.27l2.69-.423v11.956zm-4.714 0c-4.386.02-4.386-3.54-4.386-4.107l-.008-13.336 2.676-.424v13.254c0 .322 0 2.358 1.718 
2.364v2.248zm-8.698-5.903c0-1.156-.253-2.119-.746-2.788-.493-.677-1.183-1.01-2.067-1.01-.882 0-1.574.333-2.065 1.01-.493.676-.733 1.632-.733 2.788 0 1.168.246 1.953.74 2.63.492.683 1.183 1.018 2.066 1.018.882 0 1.574-.342 2.067-1.019.492-.683.738-1.46.738-2.63zm2.737-.007c0 .902-.13 1.584-.397 2.33a5.52 5.52 0 0 1-1.128 1.906 4.986 4.986 0 0 1-1.752 1.223c-.685.286-1.739.45-2.265.45-.528-.006-1.574-.157-2.252-.45a5.096 5.096 0 0 1-1.744-1.223c-.487-.527-.863-1.162-1.137-1.906a6.345 6.345 0 0 1-.41-2.33c0-.902.123-1.77.397-2.508a5.554 5.554 0 0 1 1.15-1.892 5.133 5.133 0 0 1 1.75-1.216c.679-.287 1.425-.423 2.232-.423.808 0 1.553.142 2.237.423a4.88 4.88 0 0 1 1.753 1.216 5.644 5.644 0 0 1 1.135 1.892c.287.738.431 1.606.431 2.508zm-20.138 0c0 1.12.246 2.363.738 2.882.493.52 1.13.78 1.91.78.424 0 .828-.062 1.204-.178.377-.116.677-.253.917-.417V9.33a10.476 10.476 0 0 0-1.766-.226c-.971-.028-1.71.37-2.23 1.004-.513.636-.773 1.75-.773 2.788zm7.438 5.274c0 1.824-.466 3.156-1.404 4.004-.936.846-2.367 1.27-4.296 1.27-.705 0-2.17-.137-3.34-.396l.431-2.118c.98.205 2.272.26 2.95.26 1.074 0 1.84-.219 2.299-.656.459-.437.684-1.086.684-1.948v-.437a8.07 8.07 0 0 1-1.047.397c-.43.13-.93.198-1.492.198-.739 0-1.41-.116-2.018-.349a4.206 4.206 0 0 1-1.567-1.025c-.431-.45-.774-1.017-1.013-1.694-.24-.677-.363-1.885-.363-2.773 0-.834.13-1.88.384-2.577.26-.696.629-1.298 1.129-1.796.493-.498 1.095-.881 1.8-1.162a6.605 6.605 0 0 1 2.428-.457c.87 0 1.67.109 2.45.24.78.129 1.444.265 1.985.415V18.17zM6.972 6.677v1.627c-.712-.446-1.52-.67-2.425-.67-.585 0-1.045.13-1.38.391a1.24 1.24 0 0 0-.502 1.03c0 .425.164.765.494 1.02.33.256.835.532 1.516.83.447.192.795.356 1.045.495.25.138.537.332.862.582.324.25.563.548.718.894.154.345.23.741.23 1.188 0 .947-.334 1.691-1.004 2.234-.67.542-1.537.814-2.601.814-1.18 0-2.16-.229-2.936-.686v-1.708c.84.628 1.814.942 2.92.942.585 0 1.048-.136 1.388-.407.34-.271.51-.646.51-1.125 0-.287-.1-.55-.302-.79-.203-.24-.42-.42-.655-.542-.234-.123-.585-.29-1.053-.503a61.27 61.27 0 0 1-.582-.271 13.67 13.67 0 0 1-.55-.287 4.275 4.275 0 0 1-.567-.351 6.92 6.92 0 0 1-.455-.4c-.18-.17-.31-.34-.39-.51-.08-.17-.155-.37-.224-.598a2.553 2.553 0 0 1-.104-.742c0-.915.333-1.638.998-2.17.664-.532 1.523-.798 2.576-.798.968 0 1.793.17 2.473.51zm7.468 5.696v-.287c-.022-.607-.187-1.088-.495-1.444-.309-.357-.75-.535-1.324-.535-.532 0-.99.194-1.373.583-.382.388-.622.949-.717 1.683h3.909zm1.005 2.792v1.404c-.596.34-1.383.51-2.362.51-1.255 0-2.255-.377-3-1.132-.744-.755-1.116-1.744-1.116-2.968 0-1.297.34-2.316 1.021-3.055.68-.74 1.548-1.11 2.6-1.11 1.033 0 1.852.323 2.458.966.606.644.91 1.572.91 2.784 0 .33-.033.676-.096 1.038h-5.314c.107.702.405 1.239.894 1.611.49.372 1.106.558 1.85.558.862 0 1.58-.202 2.155-.606zm6.605-1.77h-1.212c-.596 0-1.045.116-1.349.35-.303.234-.454.532-.454.894 0 .372.117.664.35.877.235.213.575.32 1.022.32.51 0 .912-.142 1.204-.424.293-.281.44-.651.44-1.108v-.91zm-4.068-2.554V9.325c.627-.361 1.457-.542 2.489-.542 2.116 0 3.175 1.026 3.175 3.08V17h-1.548v-.957c-.415.68-1.143 1.02-2.186 1.02-.766 0-1.38-.22-1.843-.661-.462-.442-.694-1.003-.694-1.684 0-.776.293-1.38.878-1.81.585-.431 1.404-.647 2.457-.647h1.34V11.8c0-.554-.133-.971-.399-1.253-.266-.282-.707-.423-1.324-.423a4.07 4.07 0 0 0-2.345.718zm9.333-1.93v1.42c.394-1 1.101-1.5 2.123-1.5.148 0 .313.016.494.048v1.531a1.885 1.885 0 0 0-.75-.143c-.542 0-.989.24-1.34.718-.351.479-.527 1.048-.527 1.707V17h-1.563V8.91h1.563zm5.01 4.084c.022.82.272 1.492.75 2.019.479.526 1.15.79 2.01.79.639 0 1.235-.176 
1.788-.527v1.404c-.521.319-1.186.479-1.995.479-1.265 0-2.276-.4-3.031-1.197-.755-.798-1.133-1.792-1.133-2.984 0-1.16.38-2.151 1.14-2.975.761-.825 1.79-1.237 3.088-1.237.702 0 1.346.149 1.93.447v1.436a3.242 3.242 0 0 0-1.77-.495c-.84 0-1.513.266-2.019.798-.505.532-.758 1.213-.758 2.042zM40.24 5.72v4.579c.458-1 1.293-1.5 2.505-1.5.787 0 1.42.245 1.899.734.479.49.718 1.17.718 2.042V17h-1.564v-5.106c0-.553-.14-.98-.422-1.284-.282-.303-.652-.455-1.11-.455-.531 0-1.002.202-1.411.606-.41.405-.615 1.022-.615 1.851V17h-1.563V5.72h1.563zm14.966 10.02c.596 0 1.096-.253 1.5-.758.404-.506.606-1.157.606-1.955 0-.915-.202-1.62-.606-2.114-.404-.495-.92-.742-1.548-.742-.553 0-1.05.224-1.491.67-.442.447-.662 1.133-.662 2.058 0 .958.212 1.67.638 2.138.425.469.946.703 1.563.703zM53.004 5.72v4.42c.574-.894 1.388-1.341 2.44-1.341 1.022 0 1.857.383 2.506 1.149.649.766.973 1.781.973 3.047 0 1.138-.309 2.109-.925 2.912-.617.803-1.463 1.205-2.537 1.205-1.075 0-1.894-.447-2.457-1.34V17h-1.58V5.72h1.58zm9.908 11.104l-3.223-7.913h1.739l1.005 2.632 1.26 3.415c.096-.32.48-1.458 1.15-3.415l.909-2.632h1.66l-2.92 7.866c-.777 2.074-1.963 3.11-3.559 3.11a2.92 2.92 0 0 1-.734-.079v-1.34c.17.042.351.064.543.064 1.032 0 1.755-.57 2.17-1.708z"}),n.createElement("path",{fill:"#5468FF",d:"M78.988.938h16.594a2.968 2.968 0 0 1 2.966 2.966V20.5a2.967 2.967 0 0 1-2.966 2.964H78.988a2.967 2.967 0 0 1-2.966-2.964V3.897A2.961 2.961 0 0 1 78.988.938z"}),n.createElement("path",{fill:"white",d:"M89.632 5.967v-.772a.978.978 0 0 0-.978-.977h-2.28a.978.978 0 0 0-.978.977v.793c0 .088.082.15.171.13a7.127 7.127 0 0 1 1.984-.28c.65 0 1.295.088 1.917.259.082.02.164-.04.164-.13m-6.248 1.01l-.39-.389a.977.977 0 0 0-1.382 0l-.465.465a.973.973 0 0 0 0 1.38l.383.383c.062.061.15.047.205-.014.226-.307.472-.601.746-.874.281-.28.568-.526.883-.751.068-.042.075-.137.02-.2m4.16 2.453v3.341c0 .096.104.165.192.117l2.97-1.537c.068-.034.089-.117.055-.184a3.695 3.695 0 0 0-3.08-1.866c-.068 0-.136.054-.136.13m0 8.048a4.489 4.489 0 0 1-4.49-4.482 4.488 4.488 0 0 1 4.49-4.482 4.488 4.488 0 0 1 4.489 4.482 4.484 4.484 0 0 1-4.49 4.482m0-10.85a6.363 6.363 0 1 0 0 12.729 6.37 6.37 0 0 0 6.372-6.368 6.358 6.358 0 0 0-6.371-6.36"})))))),D.items.length>0?n.createElement("main",null,D.items.map((function(e,t){var r=e.title,a=e.url,s=e.summary,c=e.breadcrumbs;return n.createElement("article",{key:t,className:O.searchResultItem},n.createElement("h2",{className:O.searchResultItemHeading},n.createElement(f.Z,{to:a,dangerouslySetInnerHTML:{__html:r}})),c.length>0&&n.createElement("nav",{"aria-label":"breadcrumbs"},n.createElement("ul",{className:(0,i.Z)("breadcrumbs",O.searchResultItemPath)},c.map((function(e,t){return n.createElement("li",{key:t,className:"breadcrumbs__item",dangerouslySetInnerHTML:{__html:e}})})))),s&&n.createElement("p",{className:O.searchResultItemSummary,dangerouslySetInnerHTML:{__html:s}}))}))):[Q&&!D.loading&&n.createElement("p",{key:"no-results"},n.createElement(j.Z,{id:"theme.SearchPage.noResultsText",description:"The paragraph for empty search result"},"No results were found")),!!D.loading&&n.createElement("div",{key:"spinner",className:O.loadingSpinner})],D.hasMore&&n.createElement("div",{className:O.loader,ref:M},n.createElement(j.Z,{id:"theme.SearchPage.fetchingNewResults",description:"The paragraph for fetching new search results"},"Fetching new results..."))))}function A(){return n.createElement(b.FG,{className:"search-page-wrapper"},n.createElement(N,null))}}}]); \ No newline at end of file +/*! 
For license information please see 1a4e3797.1ecd994c.js.LICENSE.txt */ +(self.webpackChunk_cumulus_website=self.webpackChunk_cumulus_website||[]).push([[97920],{17331:e=>{function t(){this._events=this._events||{},this._maxListeners=this._maxListeners||void 0}function r(e){return"function"==typeof e}function n(e){return"object"==typeof e&&null!==e}function i(e){return void 0===e}e.exports=t,t.prototype._events=void 0,t.prototype._maxListeners=void 0,t.defaultMaxListeners=10,t.prototype.setMaxListeners=function(e){if("number"!=typeof e||e<0||isNaN(e))throw TypeError("n must be a positive number");return this._maxListeners=e,this},t.prototype.emit=function(e){var t,a,s,c,u,o;if(this._events||(this._events={}),"error"===e&&(!this._events.error||n(this._events.error)&&!this._events.error.length)){if((t=arguments[1])instanceof Error)throw t;var h=new Error('Uncaught, unspecified "error" event. ('+t+")");throw h.context=t,h}if(i(a=this._events[e]))return!1;if(r(a))switch(arguments.length){case 1:a.call(this);break;case 2:a.call(this,arguments[1]);break;case 3:a.call(this,arguments[1],arguments[2]);break;default:c=Array.prototype.slice.call(arguments,1),a.apply(this,c)}else if(n(a))for(c=Array.prototype.slice.call(arguments,1),s=(o=a.slice()).length,u=0;u0&&this._events[e].length>s&&(this._events[e].warned=!0,console.error("(node) warning: possible EventEmitter memory leak detected. %d listeners added. Use emitter.setMaxListeners() to increase limit.",this._events[e].length),"function"==typeof console.trace&&console.trace()),this},t.prototype.on=t.prototype.addListener,t.prototype.once=function(e,t){if(!r(t))throw TypeError("listener must be a function");var n=!1;function i(){this.removeListener(e,i),n||(n=!0,t.apply(this,arguments))}return i.listener=t,this.on(e,i),this},t.prototype.removeListener=function(e,t){var i,a,s,c;if(!r(t))throw TypeError("listener must be a function");if(!this._events||!this._events[e])return this;if(s=(i=this._events[e]).length,a=-1,i===t||r(i.listener)&&i.listener===t)delete this._events[e],this._events.removeListener&&this.emit("removeListener",e,t);else if(n(i)){for(c=s;c-- >0;)if(i[c]===t||i[c].listener&&i[c].listener===t){a=c;break}if(a<0)return this;1===i.length?(i.length=0,delete this._events[e]):i.splice(a,1),this._events.removeListener&&this.emit("removeListener",e,t)}return this},t.prototype.removeAllListeners=function(e){var t,n;if(!this._events)return this;if(!this._events.removeListener)return 0===arguments.length?this._events={}:this._events[e]&&delete this._events[e],this;if(0===arguments.length){for(t in this._events)"removeListener"!==t&&this.removeAllListeners(t);return this.removeAllListeners("removeListener"),this._events={},this}if(r(n=this._events[e]))this.removeListener(e,n);else if(n)for(;n.length;)this.removeListener(e,n[n.length-1]);return delete this._events[e],this},t.prototype.listeners=function(e){return this._events&&this._events[e]?r(this._events[e])?[this._events[e]]:this._events[e].slice():[]},t.prototype.listenerCount=function(e){if(this._events){var t=this._events[e];if(r(t))return 1;if(t)return t.length}return 0},t.listenerCount=function(e,t){return e.listenerCount(t)}},8131:(e,t,r)=>{"use strict";var n=r(49374),i=r(17775),a=r(23076);function s(e,t,r){return new n(e,t,r)}s.version=r(24336),s.AlgoliaSearchHelper=n,s.SearchParameters=i,s.SearchResults=a,e.exports=s},68078:(e,t,r)=>{"use strict";var n=r(17331);function 
i(e,t){this.main=e,this.fn=t,this.lastResults=null}r(14853)(i,n),i.prototype.detach=function(){this.removeAllListeners(),this.main.detachDerivedHelper(this)},i.prototype.getModifiedState=function(e){return this.fn(e)},e.exports=i},82437:(e,t,r)=>{"use strict";var n=r(52344),i=r(49803),a=r(90116),s={addRefinement:function(e,t,r){if(s.isRefined(e,t,r))return e;var i=""+r,a=e[t]?e[t].concat(i):[i],c={};return c[t]=a,n({},c,e)},removeRefinement:function(e,t,r){if(void 0===r)return s.clearRefinement(e,(function(e,r){return t===r}));var n=""+r;return s.clearRefinement(e,(function(e,r){return t===r&&n===e}))},toggleRefinement:function(e,t,r){if(void 0===r)throw new Error("toggleRefinement should be used with a value");return s.isRefined(e,t,r)?s.removeRefinement(e,t,r):s.addRefinement(e,t,r)},clearRefinement:function(e,t,r){if(void 0===t)return a(e)?{}:e;if("string"==typeof t)return i(e,[t]);if("function"==typeof t){var n=!1,s=Object.keys(e).reduce((function(i,a){var s=e[a]||[],c=s.filter((function(e){return!t(e,a,r)}));return c.length!==s.length&&(n=!0),i[a]=c,i}),{});return n?s:e}},isRefined:function(e,t,r){var n=Boolean(e[t])&&e[t].length>0;if(void 0===r||!n)return n;var i=""+r;return-1!==e[t].indexOf(i)}};e.exports=s},17775:(e,t,r)=>{"use strict";var n=r(60185),i=r(52344),a=r(22686),s=r(7888),c=r(28023),u=r(49803),o=r(90116),h=r(46801),f=r(82437);function l(e,t){return Array.isArray(e)&&Array.isArray(t)?e.length===t.length&&e.every((function(e,r){return l(t[r],e)})):e===t}function m(e){var t=e?m._parseNumbers(e):{};void 0===t.userToken||h(t.userToken)||console.warn("[algoliasearch-helper] The `userToken` parameter is invalid. This can lead to wrong analytics.\n - Format: [a-zA-Z0-9_-]{1,64}"),this.facets=t.facets||[],this.disjunctiveFacets=t.disjunctiveFacets||[],this.hierarchicalFacets=t.hierarchicalFacets||[],this.facetsRefinements=t.facetsRefinements||{},this.facetsExcludes=t.facetsExcludes||{},this.disjunctiveFacetsRefinements=t.disjunctiveFacetsRefinements||{},this.numericRefinements=t.numericRefinements||{},this.tagRefinements=t.tagRefinements||[],this.hierarchicalFacetsRefinements=t.hierarchicalFacetsRefinements||{};var r=this;Object.keys(t).forEach((function(e){var n=-1!==m.PARAMETERS.indexOf(e),i=void 0!==t[e];!n&&i&&(r[e]=t[e])}))}m.PARAMETERS=Object.keys(new m),m._parseNumbers=function(e){if(e instanceof m)return e;var t={};if(["aroundPrecision","aroundRadius","getRankingInfo","minWordSizefor2Typos","minWordSizefor1Typo","page","maxValuesPerFacet","distinct","minimumAroundRadius","hitsPerPage","minProximity"].forEach((function(r){var n=e[r];if("string"==typeof n){var i=parseFloat(n);t[r]=isNaN(i)?n:i}})),Array.isArray(e.insideBoundingBox)&&(t.insideBoundingBox=e.insideBoundingBox.map((function(e){return Array.isArray(e)?e.map((function(e){return parseFloat(e)})):e}))),e.numericRefinements){var r={};Object.keys(e.numericRefinements).forEach((function(t){var n=e.numericRefinements[t]||{};r[t]={},Object.keys(n).forEach((function(e){var i=n[e].map((function(e){return Array.isArray(e)?e.map((function(e){return"string"==typeof e?parseFloat(e):e})):"string"==typeof e?parseFloat(e):e}));r[t][e]=i}))})),t.numericRefinements=r}return n({},e,t)},m.make=function(e){var t=new m(e);return(e.hierarchicalFacets||[]).forEach((function(e){if(e.rootPath){var 
r=t.getHierarchicalRefinement(e.name);r.length>0&&0!==r[0].indexOf(e.rootPath)&&(t=t.clearRefinements(e.name)),0===(r=t.getHierarchicalRefinement(e.name)).length&&(t=t.toggleHierarchicalFacetRefinement(e.name,e.rootPath))}})),t},m.validate=function(e,t){var r=t||{};return e.tagFilters&&r.tagRefinements&&r.tagRefinements.length>0?new Error("[Tags] Cannot switch from the managed tag API to the advanced API. It is probably an error, if it is really what you want, you should first clear the tags with clearTags method."):e.tagRefinements.length>0&&r.tagFilters?new Error("[Tags] Cannot switch from the advanced tag API to the managed API. It is probably an error, if it is not, you should first clear the tags with clearTags method."):e.numericFilters&&r.numericRefinements&&o(r.numericRefinements)?new Error("[Numeric filters] Can't switch from the advanced to the managed API. It is probably an error, if this is really what you want, you have to first clear the numeric filters."):o(e.numericRefinements)&&r.numericFilters?new Error("[Numeric filters] Can't switch from the managed API to the advanced. It is probably an error, if this is really what you want, you have to first clear the numeric filters."):null},m.prototype={constructor:m,clearRefinements:function(e){var t={numericRefinements:this._clearNumericRefinements(e),facetsRefinements:f.clearRefinement(this.facetsRefinements,e,"conjunctiveFacet"),facetsExcludes:f.clearRefinement(this.facetsExcludes,e,"exclude"),disjunctiveFacetsRefinements:f.clearRefinement(this.disjunctiveFacetsRefinements,e,"disjunctiveFacet"),hierarchicalFacetsRefinements:f.clearRefinement(this.hierarchicalFacetsRefinements,e,"hierarchicalFacet")};return t.numericRefinements===this.numericRefinements&&t.facetsRefinements===this.facetsRefinements&&t.facetsExcludes===this.facetsExcludes&&t.disjunctiveFacetsRefinements===this.disjunctiveFacetsRefinements&&t.hierarchicalFacetsRefinements===this.hierarchicalFacetsRefinements?this:this.setQueryParameters(t)},clearTags:function(){return void 0===this.tagFilters&&0===this.tagRefinements.length?this:this.setQueryParameters({tagFilters:void 0,tagRefinements:[]})},setIndex:function(e){return e===this.index?this:this.setQueryParameters({index:e})},setQuery:function(e){return e===this.query?this:this.setQueryParameters({query:e})},setPage:function(e){return e===this.page?this:this.setQueryParameters({page:e})},setFacets:function(e){return this.setQueryParameters({facets:e})},setDisjunctiveFacets:function(e){return this.setQueryParameters({disjunctiveFacets:e})},setHitsPerPage:function(e){return this.hitsPerPage===e?this:this.setQueryParameters({hitsPerPage:e})},setTypoTolerance:function(e){return this.typoTolerance===e?this:this.setQueryParameters({typoTolerance:e})},addNumericRefinement:function(e,t,r){var i=c(r);if(this.isNumericRefined(e,t,i))return this;var a=n({},this.numericRefinements);return a[e]=n({},a[e]),a[e][t]?(a[e][t]=a[e][t].slice(),a[e][t].push(i)):a[e][t]=[i],this.setQueryParameters({numericRefinements:a})},getConjunctiveRefinements:function(e){return this.isConjunctiveFacet(e)&&this.facetsRefinements[e]||[]},getDisjunctiveRefinements:function(e){return this.isDisjunctiveFacet(e)&&this.disjunctiveFacetsRefinements[e]||[]},getHierarchicalRefinement:function(e){return this.hierarchicalFacetsRefinements[e]||[]},getExcludeRefinements:function(e){return this.isConjunctiveFacet(e)&&this.facetsExcludes[e]||[]},removeNumericRefinement:function(e,t,r){var n=r;return void 
0!==n?this.isNumericRefined(e,t,n)?this.setQueryParameters({numericRefinements:this._clearNumericRefinements((function(r,i){return i===e&&r.op===t&&l(r.val,c(n))}))}):this:void 0!==t?this.isNumericRefined(e,t)?this.setQueryParameters({numericRefinements:this._clearNumericRefinements((function(r,n){return n===e&&r.op===t}))}):this:this.isNumericRefined(e)?this.setQueryParameters({numericRefinements:this._clearNumericRefinements((function(t,r){return r===e}))}):this},getNumericRefinements:function(e){return this.numericRefinements[e]||{}},getNumericRefinement:function(e,t){return this.numericRefinements[e]&&this.numericRefinements[e][t]},_clearNumericRefinements:function(e){if(void 0===e)return o(this.numericRefinements)?{}:this.numericRefinements;if("string"==typeof e)return u(this.numericRefinements,[e]);if("function"==typeof e){var t=!1,r=this.numericRefinements,n=Object.keys(r).reduce((function(n,i){var a=r[i],s={};return a=a||{},Object.keys(a).forEach((function(r){var n=a[r]||[],c=[];n.forEach((function(t){e({val:t,op:r},i,"numeric")||c.push(t)})),c.length!==n.length&&(t=!0),s[r]=c})),n[i]=s,n}),{});return t?n:this.numericRefinements}},addFacet:function(e){return this.isConjunctiveFacet(e)?this:this.setQueryParameters({facets:this.facets.concat([e])})},addDisjunctiveFacet:function(e){return this.isDisjunctiveFacet(e)?this:this.setQueryParameters({disjunctiveFacets:this.disjunctiveFacets.concat([e])})},addHierarchicalFacet:function(e){if(this.isHierarchicalFacet(e.name))throw new Error("Cannot declare two hierarchical facets with the same name: `"+e.name+"`");return this.setQueryParameters({hierarchicalFacets:this.hierarchicalFacets.concat([e])})},addFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsRefinements,e,t)?this:this.setQueryParameters({facetsRefinements:f.addRefinement(this.facetsRefinements,e,t)})},addExcludeRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsExcludes,e,t)?this:this.setQueryParameters({facetsExcludes:f.addRefinement(this.facetsExcludes,e,t)})},addDisjunctiveFacetRefinement:function(e,t){if(!this.isDisjunctiveFacet(e))throw new Error(e+" is not defined in the disjunctiveFacets attribute of the helper configuration");return f.isRefined(this.disjunctiveFacetsRefinements,e,t)?this:this.setQueryParameters({disjunctiveFacetsRefinements:f.addRefinement(this.disjunctiveFacetsRefinements,e,t)})},addTagRefinement:function(e){if(this.isTagRefined(e))return this;var t={tagRefinements:this.tagRefinements.concat(e)};return this.setQueryParameters(t)},removeFacet:function(e){return this.isConjunctiveFacet(e)?this.clearRefinements(e).setQueryParameters({facets:this.facets.filter((function(t){return t!==e}))}):this},removeDisjunctiveFacet:function(e){return this.isDisjunctiveFacet(e)?this.clearRefinements(e).setQueryParameters({disjunctiveFacets:this.disjunctiveFacets.filter((function(t){return t!==e}))}):this},removeHierarchicalFacet:function(e){return this.isHierarchicalFacet(e)?this.clearRefinements(e).setQueryParameters({hierarchicalFacets:this.hierarchicalFacets.filter((function(t){return t.name!==e}))}):this},removeFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return 
f.isRefined(this.facetsRefinements,e,t)?this.setQueryParameters({facetsRefinements:f.removeRefinement(this.facetsRefinements,e,t)}):this},removeExcludeRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return f.isRefined(this.facetsExcludes,e,t)?this.setQueryParameters({facetsExcludes:f.removeRefinement(this.facetsExcludes,e,t)}):this},removeDisjunctiveFacetRefinement:function(e,t){if(!this.isDisjunctiveFacet(e))throw new Error(e+" is not defined in the disjunctiveFacets attribute of the helper configuration");return f.isRefined(this.disjunctiveFacetsRefinements,e,t)?this.setQueryParameters({disjunctiveFacetsRefinements:f.removeRefinement(this.disjunctiveFacetsRefinements,e,t)}):this},removeTagRefinement:function(e){if(!this.isTagRefined(e))return this;var t={tagRefinements:this.tagRefinements.filter((function(t){return t!==e}))};return this.setQueryParameters(t)},toggleRefinement:function(e,t){return this.toggleFacetRefinement(e,t)},toggleFacetRefinement:function(e,t){if(this.isHierarchicalFacet(e))return this.toggleHierarchicalFacetRefinement(e,t);if(this.isConjunctiveFacet(e))return this.toggleConjunctiveFacetRefinement(e,t);if(this.isDisjunctiveFacet(e))return this.toggleDisjunctiveFacetRefinement(e,t);throw new Error("Cannot refine the undeclared facet "+e+"; it should be added to the helper options facets, disjunctiveFacets or hierarchicalFacets")},toggleConjunctiveFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return this.setQueryParameters({facetsRefinements:f.toggleRefinement(this.facetsRefinements,e,t)})},toggleExcludeFacetRefinement:function(e,t){if(!this.isConjunctiveFacet(e))throw new Error(e+" is not defined in the facets attribute of the helper configuration");return this.setQueryParameters({facetsExcludes:f.toggleRefinement(this.facetsExcludes,e,t)})},toggleDisjunctiveFacetRefinement:function(e,t){if(!this.isDisjunctiveFacet(e))throw new Error(e+" is not defined in the disjunctiveFacets attribute of the helper configuration");return this.setQueryParameters({disjunctiveFacetsRefinements:f.toggleRefinement(this.disjunctiveFacetsRefinements,e,t)})},toggleHierarchicalFacetRefinement:function(e,t){if(!this.isHierarchicalFacet(e))throw new Error(e+" is not defined in the hierarchicalFacets attribute of the helper configuration");var r=this._getHierarchicalFacetSeparator(this.getHierarchicalFacetByName(e)),n={};return void 0!==this.hierarchicalFacetsRefinements[e]&&this.hierarchicalFacetsRefinements[e].length>0&&(this.hierarchicalFacetsRefinements[e][0]===t||0===this.hierarchicalFacetsRefinements[e][0].indexOf(t+r))?-1===t.indexOf(r)?n[e]=[]:n[e]=[t.slice(0,t.lastIndexOf(r))]:n[e]=[t],this.setQueryParameters({hierarchicalFacetsRefinements:i({},n,this.hierarchicalFacetsRefinements)})},addHierarchicalFacetRefinement:function(e,t){if(this.isHierarchicalFacetRefined(e))throw new Error(e+" is already refined.");if(!this.isHierarchicalFacet(e))throw new Error(e+" is not defined in the hierarchicalFacets attribute of the helper configuration.");var r={};return r[e]=[t],this.setQueryParameters({hierarchicalFacetsRefinements:i({},r,this.hierarchicalFacetsRefinements)})},removeHierarchicalFacetRefinement:function(e){if(!this.isHierarchicalFacetRefined(e))return this;var t={};return 
t[e]=[],this.setQueryParameters({hierarchicalFacetsRefinements:i({},t,this.hierarchicalFacetsRefinements)})},toggleTagRefinement:function(e){return this.isTagRefined(e)?this.removeTagRefinement(e):this.addTagRefinement(e)},isDisjunctiveFacet:function(e){return this.disjunctiveFacets.indexOf(e)>-1},isHierarchicalFacet:function(e){return void 0!==this.getHierarchicalFacetByName(e)},isConjunctiveFacet:function(e){return this.facets.indexOf(e)>-1},isFacetRefined:function(e,t){return!!this.isConjunctiveFacet(e)&&f.isRefined(this.facetsRefinements,e,t)},isExcludeRefined:function(e,t){return!!this.isConjunctiveFacet(e)&&f.isRefined(this.facetsExcludes,e,t)},isDisjunctiveFacetRefined:function(e,t){return!!this.isDisjunctiveFacet(e)&&f.isRefined(this.disjunctiveFacetsRefinements,e,t)},isHierarchicalFacetRefined:function(e,t){if(!this.isHierarchicalFacet(e))return!1;var r=this.getHierarchicalRefinement(e);return t?-1!==r.indexOf(t):r.length>0},isNumericRefined:function(e,t,r){if(void 0===r&&void 0===t)return Boolean(this.numericRefinements[e]);var n=this.numericRefinements[e]&&void 0!==this.numericRefinements[e][t];if(void 0===r||!n)return n;var i,a,u=c(r),o=void 0!==(i=this.numericRefinements[e][t],a=u,s(i,(function(e){return l(e,a)})));return n&&o},isTagRefined:function(e){return-1!==this.tagRefinements.indexOf(e)},getRefinedDisjunctiveFacets:function(){var e=this,t=a(Object.keys(this.numericRefinements).filter((function(t){return Object.keys(e.numericRefinements[t]).length>0})),this.disjunctiveFacets);return Object.keys(this.disjunctiveFacetsRefinements).filter((function(t){return e.disjunctiveFacetsRefinements[t].length>0})).concat(t).concat(this.getRefinedHierarchicalFacets())},getRefinedHierarchicalFacets:function(){var e=this;return a(this.hierarchicalFacets.map((function(e){return e.name})),Object.keys(this.hierarchicalFacetsRefinements).filter((function(t){return e.hierarchicalFacetsRefinements[t].length>0})))},getUnrefinedDisjunctiveFacets:function(){var e=this.getRefinedDisjunctiveFacets();return this.disjunctiveFacets.filter((function(t){return-1===e.indexOf(t)}))},managedParameters:["index","facets","disjunctiveFacets","facetsRefinements","hierarchicalFacets","facetsExcludes","disjunctiveFacetsRefinements","numericRefinements","tagRefinements","hierarchicalFacetsRefinements"],getQueryParams:function(){var e=this.managedParameters,t={},r=this;return Object.keys(this).forEach((function(n){var i=r[n];-1===e.indexOf(n)&&void 0!==i&&(t[n]=i)})),t},setQueryParameter:function(e,t){if(this[e]===t)return this;var r={};return r[e]=t,this.setQueryParameters(r)},setQueryParameters:function(e){if(!e)return this;var t=m.validate(this,e);if(t)throw t;var r=this,n=m._parseNumbers(e),i=Object.keys(this).reduce((function(e,t){return e[t]=r[t],e}),{}),a=Object.keys(n).reduce((function(e,t){var r=void 0!==e[t],i=void 0!==n[t];return r&&!i?u(e,[t]):(i&&(e[t]=n[t]),e)}),i);return new this.constructor(a)},resetPage:function(){return void 0===this.page?this:this.setPage(0)},_getHierarchicalFacetSortBy:function(e){return e.sortBy||["isRefined:desc","name:asc"]},_getHierarchicalFacetSeparator:function(e){return e.separator||" > "},_getHierarchicalRootPath:function(e){return e.rootPath||null},_getHierarchicalShowParentLevel:function(e){return"boolean"!=typeof e.showParentLevel||e.showParentLevel},getHierarchicalFacetByName:function(e){return s(this.hierarchicalFacets,(function(t){return t.name===e}))},getHierarchicalFacetBreadcrumb:function(e){if(!this.isHierarchicalFacet(e))return[];var 
t=this.getHierarchicalRefinement(e)[0];if(!t)return[];var r=this._getHierarchicalFacetSeparator(this.getHierarchicalFacetByName(e));return t.split(r).map((function(e){return e.trim()}))},toString:function(){return JSON.stringify(this,null,2)}},e.exports=m},10210:(e,t,r)=>{"use strict";e.exports=function(e){return function(t,r){var s=e.hierarchicalFacets[r],o=e.hierarchicalFacetsRefinements[s.name]&&e.hierarchicalFacetsRefinements[s.name][0]||"",h=e._getHierarchicalFacetSeparator(s),f=e._getHierarchicalRootPath(s),l=e._getHierarchicalShowParentLevel(s),m=a(e._getHierarchicalFacetSortBy(s)),d=t.every((function(e){return e.exhaustive})),p=function(e,t,r,a,s){return function(o,h,f){var l=o;if(f>0){var m=0;for(l=o;m{"use strict";var n=r(60185),i=r(52344),a=r(42148),s=r(74587),c=r(7888),u=r(69725),o=r(82293),h=r(94039),f=h.escapeFacetValue,l=h.unescapeFacetValue,m=r(10210);function d(e){var t={};return e.forEach((function(e,r){t[e]=r})),t}function p(e,t,r){t&&t[r]&&(e.stats=t[r])}function v(e,t,r){var a=t[0];this._rawResults=t;var o=this;Object.keys(a).forEach((function(e){o[e]=a[e]})),Object.keys(r||{}).forEach((function(e){o[e]=r[e]})),this.processingTimeMS=t.reduce((function(e,t){return void 0===t.processingTimeMS?e:e+t.processingTimeMS}),0),this.disjunctiveFacets=[],this.hierarchicalFacets=e.hierarchicalFacets.map((function(){return[]})),this.facets=[];var h=e.getRefinedDisjunctiveFacets(),f=d(e.facets),v=d(e.disjunctiveFacets),g=1,y=a.facets||{};Object.keys(y).forEach((function(t){var r,n,i=y[t],s=(r=e.hierarchicalFacets,n=t,c(r,(function(e){return(e.attributes||[]).indexOf(n)>-1})));if(s){var h=s.attributes.indexOf(t),l=u(e.hierarchicalFacets,(function(e){return e.name===s.name}));o.hierarchicalFacets[l][h]={attribute:t,data:i,exhaustive:a.exhaustiveFacetsCount}}else{var m,d=-1!==e.disjunctiveFacets.indexOf(t),g=-1!==e.facets.indexOf(t);d&&(m=v[t],o.disjunctiveFacets[m]={name:t,data:i,exhaustive:a.exhaustiveFacetsCount},p(o.disjunctiveFacets[m],a.facets_stats,t)),g&&(m=f[t],o.facets[m]={name:t,data:i,exhaustive:a.exhaustiveFacetsCount},p(o.facets[m],a.facets_stats,t))}})),this.hierarchicalFacets=s(this.hierarchicalFacets),h.forEach((function(r){var s=t[g],c=s&&s.facets?s.facets:{},h=e.getHierarchicalFacetByName(r);Object.keys(c).forEach((function(t){var r,f=c[t];if(h){r=u(e.hierarchicalFacets,(function(e){return e.name===h.name}));var m=u(o.hierarchicalFacets[r],(function(e){return e.attribute===t}));if(-1===m)return;o.hierarchicalFacets[r][m].data=n({},o.hierarchicalFacets[r][m].data,f)}else{r=v[t];var d=a.facets&&a.facets[t]||{};o.disjunctiveFacets[r]={name:t,data:i({},f,d),exhaustive:s.exhaustiveFacetsCount},p(o.disjunctiveFacets[r],s.facets_stats,t),e.disjunctiveFacetsRefinements[t]&&e.disjunctiveFacetsRefinements[t].forEach((function(n){!o.disjunctiveFacets[r].data[n]&&e.disjunctiveFacetsRefinements[t].indexOf(l(n))>-1&&(o.disjunctiveFacets[r].data[n]=0)}))}})),g++})),e.getRefinedHierarchicalFacets().forEach((function(r){var n=e.getHierarchicalFacetByName(r),a=e._getHierarchicalFacetSeparator(n),s=e.getHierarchicalRefinement(r);0===s.length||s[0].split(a).length<2||t.slice(g).forEach((function(t){var r=t&&t.facets?t.facets:{};Object.keys(r).forEach((function(t){var c=r[t],h=u(e.hierarchicalFacets,(function(e){return e.name===n.name})),f=u(o.hierarchicalFacets[h],(function(e){return e.attribute===t}));if(-1!==f){var l={};if(s.length>0){var 
m=s[0].split(a)[0];l[m]=o.hierarchicalFacets[h][f].data[m]}o.hierarchicalFacets[h][f].data=i(l,c,o.hierarchicalFacets[h][f].data)}})),g++}))})),Object.keys(e.facetsExcludes).forEach((function(t){var r=e.facetsExcludes[t],n=f[t];o.facets[n]={name:t,data:y[t],exhaustive:a.exhaustiveFacetsCount},r.forEach((function(e){o.facets[n]=o.facets[n]||{name:t},o.facets[n].data=o.facets[n].data||{},o.facets[n].data[e]=0}))})),this.hierarchicalFacets=this.hierarchicalFacets.map(m(e)),this.facets=s(this.facets),this.disjunctiveFacets=s(this.disjunctiveFacets),this._state=e}function g(e,t){function r(e){return e.name===t}if(e._state.isConjunctiveFacet(t)){var n=c(e.facets,r);return n?Object.keys(n.data).map((function(r){var i=f(r);return{name:r,escapedValue:i,count:n.data[r],isRefined:e._state.isFacetRefined(t,i),isExcluded:e._state.isExcludeRefined(t,r)}})):[]}if(e._state.isDisjunctiveFacet(t)){var i=c(e.disjunctiveFacets,r);return i?Object.keys(i.data).map((function(r){var n=f(r);return{name:r,escapedValue:n,count:i.data[r],isRefined:e._state.isDisjunctiveFacetRefined(t,n)}})):[]}if(e._state.isHierarchicalFacet(t)){var a=c(e.hierarchicalFacets,r);if(!a)return a;var s=e._state.getHierarchicalFacetByName(t),u=e._state._getHierarchicalFacetSeparator(s),o=l(e._state.getHierarchicalRefinement(t)[0]||"");0===o.indexOf(s.rootPath)&&(o=o.replace(s.rootPath+u,""));var h=o.split(u);return h.unshift(t),y(a,h,0),a}}function y(e,t,r){e.isRefined=e.name===t[r],e.data&&e.data.forEach((function(e){y(e,t,r+1)}))}function R(e,t,r,n){if(n=n||0,Array.isArray(t))return e(t,r[n]);if(!t.data||0===t.data.length)return t;var a=t.data.map((function(t){return R(e,t,r,n+1)})),s=e(a,r[n]);return i({data:s},t)}function F(e,t){var r=c(e,(function(e){return e.name===t}));return r&&r.stats}function b(e,t,r,n,i){var a=c(i,(function(e){return e.name===r})),s=a&&a.data&&a.data[n]?a.data[n]:0,u=a&&a.exhaustive||!1;return{type:t,attributeName:r,name:n,count:s,exhaustive:u}}v.prototype.getFacetByName=function(e){function t(t){return t.name===e}return c(this.facets,t)||c(this.disjunctiveFacets,t)||c(this.hierarchicalFacets,t)},v.DEFAULT_SORT=["isRefined:desc","count:desc","name:asc"],v.prototype.getFacetValues=function(e,t){var r=g(this,e);if(r){var n,s=i({},t,{sortBy:v.DEFAULT_SORT,facetOrdering:!(t&&t.sortBy)}),c=this;if(Array.isArray(r))n=[e];else n=c._state.getHierarchicalFacetByName(r.name).attributes;return R((function(e,t){if(s.facetOrdering){var r=function(e,t){return e.renderingContent&&e.renderingContent.facetOrdering&&e.renderingContent.facetOrdering.values&&e.renderingContent.facetOrdering.values[t]}(c,t);if(r)return function(e,t){var r=[],n=[],i=(t.order||[]).reduce((function(e,t,r){return e[t]=r,e}),{});e.forEach((function(e){var t=e.path||e.name;void 0!==i[t]?r[i[t]]=e:n.push(e)})),r=r.filter((function(e){return e}));var s,c=t.sortRemainingBy;return"hidden"===c?r:(s="alpha"===c?[["path","name"],["asc","asc"]]:[["count"],["desc"]],r.concat(a(n,s[0],s[1])))}(e,r)}if(Array.isArray(s.sortBy)){var n=o(s.sortBy,v.DEFAULT_SORT);return a(e,n[0],n[1])}if("function"==typeof s.sortBy)return function(e,t){return t.sort(e)}(s.sortBy,e);throw new Error("options.sortBy is optional but if defined it must be either an array of string (predicates) or a sorting function")}),r,n)}},v.prototype.getFacetStats=function(e){return this._state.isConjunctiveFacet(e)?F(this.facets,e):this._state.isDisjunctiveFacet(e)?F(this.disjunctiveFacets,e):void 0},v.prototype.getRefinements=function(){var e=this._state,t=this,r=[];return 
Object.keys(e.facetsRefinements).forEach((function(n){e.facetsRefinements[n].forEach((function(i){r.push(b(e,"facet",n,i,t.facets))}))})),Object.keys(e.facetsExcludes).forEach((function(n){e.facetsExcludes[n].forEach((function(i){r.push(b(e,"exclude",n,i,t.facets))}))})),Object.keys(e.disjunctiveFacetsRefinements).forEach((function(n){e.disjunctiveFacetsRefinements[n].forEach((function(i){r.push(b(e,"disjunctive",n,i,t.disjunctiveFacets))}))})),Object.keys(e.hierarchicalFacetsRefinements).forEach((function(n){e.hierarchicalFacetsRefinements[n].forEach((function(i){r.push(function(e,t,r,n){var i=e.getHierarchicalFacetByName(t),a=e._getHierarchicalFacetSeparator(i),s=r.split(a),u=c(n,(function(e){return e.name===t})),o=s.reduce((function(e,t){var r=e&&c(e.data,(function(e){return e.name===t}));return void 0!==r?r:e}),u),h=o&&o.count||0,f=o&&o.exhaustive||!1,l=o&&o.path||"";return{type:"hierarchical",attributeName:t,name:l,count:h,exhaustive:f}}(e,n,i,t.hierarchicalFacets))}))})),Object.keys(e.numericRefinements).forEach((function(t){var n=e.numericRefinements[t];Object.keys(n).forEach((function(e){n[e].forEach((function(n){r.push({type:"numeric",attributeName:t,name:n,numericValue:n,operator:e})}))}))})),e.tagRefinements.forEach((function(e){r.push({type:"tag",attributeName:"_tags",name:e})})),r},e.exports=v},49374:(e,t,r)=>{"use strict";var n=r(17775),i=r(23076),a=r(68078),s=r(96394),c=r(17331),u=r(14853),o=r(90116),h=r(49803),f=r(60185),l=r(24336),m=r(94039).escapeFacetValue;function d(e,t,r){"function"==typeof e.addAlgoliaAgent&&e.addAlgoliaAgent("JS Helper ("+l+")"),this.setClient(e);var i=r||{};i.index=t,this.state=n.make(i),this.lastResults=null,this._queryId=0,this._lastQueryIdReceived=-1,this.derivedHelpers=[],this._currentNbQueries=0}function p(e){if(e<0)throw new Error("Page requested below 0.");return this._change({state:this.state.setPage(e),isPageReset:!1}),this}function v(){return this.state.page}u(d,c),d.prototype.search=function(){return this._search({onlyWithDerivedHelpers:!1}),this},d.prototype.searchOnlyWithDerivedHelpers=function(){return this._search({onlyWithDerivedHelpers:!0}),this},d.prototype.getQuery=function(){var e=this.state;return s._getHitsSearchParams(e)},d.prototype.searchOnce=function(e,t){var r=e?this.state.setQueryParameters(e):this.state,n=s._getQueries(r.index,r),a=this;if(this._currentNbQueries++,this.emit("searchOnce",{state:r}),!t)return this.client.search(n).then((function(e){return a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),{content:new i(r,e.results),state:r,_originalResponse:e}}),(function(e){throw a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),e}));this.client.search(n).then((function(e){a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),t(null,new i(r,e.results),r)})).catch((function(e){a._currentNbQueries--,0===a._currentNbQueries&&a.emit("searchQueueEmpty"),t(e,null,r)}))},d.prototype.findAnswers=function(e){console.warn("[algoliasearch-helper] answers is no longer supported");var t=this.state,r=this.derivedHelpers[0];if(!r)return Promise.resolve([]);var n=r.getModifiedState(t),i=f({attributesForPrediction:e.attributesForPrediction,nbHits:e.nbHits},{params:h(s._getHitsSearchParams(n),["attributesToSnippet","hitsPerPage","restrictSearchableAttributes","snippetEllipsisText"])}),a="search for answers was called, but this client does not have a function client.initIndex(index).findAnswers";if("function"!=typeof this.client.initIndex)throw new Error(a);var 
c=this.client.initIndex(n.index);if("function"!=typeof c.findAnswers)throw new Error(a);return c.findAnswers(n.query,e.queryLanguages,i)},d.prototype.searchForFacetValues=function(e,t,r,n){var i="function"==typeof this.client.searchForFacetValues,a="function"==typeof this.client.initIndex;if(!i&&!a&&"function"!=typeof this.client.search)throw new Error("search for facet values (searchable) was called, but this client does not have a function client.searchForFacetValues or client.initIndex(index).searchForFacetValues");var c=this.state.setQueryParameters(n||{}),u=c.isDisjunctiveFacet(e),o=s.getSearchForFacetQuery(e,t,r,c);this._currentNbQueries++;var h,f=this;return i?h=this.client.searchForFacetValues([{indexName:c.index,params:o}]):a?h=this.client.initIndex(c.index).searchForFacetValues(o):(delete o.facetName,h=this.client.search([{type:"facet",facet:e,indexName:c.index,params:o}]).then((function(e){return e.results[0]}))),this.emit("searchForFacetValues",{state:c,facet:e,query:t}),h.then((function(t){return f._currentNbQueries--,0===f._currentNbQueries&&f.emit("searchQueueEmpty"),(t=Array.isArray(t)?t[0]:t).facetHits.forEach((function(t){t.escapedValue=m(t.value),t.isRefined=u?c.isDisjunctiveFacetRefined(e,t.escapedValue):c.isFacetRefined(e,t.escapedValue)})),t}),(function(e){throw f._currentNbQueries--,0===f._currentNbQueries&&f.emit("searchQueueEmpty"),e}))},d.prototype.setQuery=function(e){return this._change({state:this.state.resetPage().setQuery(e),isPageReset:!0}),this},d.prototype.clearRefinements=function(e){return this._change({state:this.state.resetPage().clearRefinements(e),isPageReset:!0}),this},d.prototype.clearTags=function(){return this._change({state:this.state.resetPage().clearTags(),isPageReset:!0}),this},d.prototype.addDisjunctiveFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().addDisjunctiveFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.addDisjunctiveRefine=function(){return this.addDisjunctiveFacetRefinement.apply(this,arguments)},d.prototype.addHierarchicalFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().addHierarchicalFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.addNumericRefinement=function(e,t,r){return this._change({state:this.state.resetPage().addNumericRefinement(e,t,r),isPageReset:!0}),this},d.prototype.addFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().addFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.addRefine=function(){return this.addFacetRefinement.apply(this,arguments)},d.prototype.addFacetExclusion=function(e,t){return this._change({state:this.state.resetPage().addExcludeRefinement(e,t),isPageReset:!0}),this},d.prototype.addExclude=function(){return this.addFacetExclusion.apply(this,arguments)},d.prototype.addTag=function(e){return this._change({state:this.state.resetPage().addTagRefinement(e),isPageReset:!0}),this},d.prototype.removeNumericRefinement=function(e,t,r){return this._change({state:this.state.resetPage().removeNumericRefinement(e,t,r),isPageReset:!0}),this},d.prototype.removeDisjunctiveFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().removeDisjunctiveFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.removeDisjunctiveRefine=function(){return this.removeDisjunctiveFacetRefinement.apply(this,arguments)},d.prototype.removeHierarchicalFacetRefinement=function(e){return 
this._change({state:this.state.resetPage().removeHierarchicalFacetRefinement(e),isPageReset:!0}),this},d.prototype.removeFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().removeFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.removeRefine=function(){return this.removeFacetRefinement.apply(this,arguments)},d.prototype.removeFacetExclusion=function(e,t){return this._change({state:this.state.resetPage().removeExcludeRefinement(e,t),isPageReset:!0}),this},d.prototype.removeExclude=function(){return this.removeFacetExclusion.apply(this,arguments)},d.prototype.removeTag=function(e){return this._change({state:this.state.resetPage().removeTagRefinement(e),isPageReset:!0}),this},d.prototype.toggleFacetExclusion=function(e,t){return this._change({state:this.state.resetPage().toggleExcludeFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.toggleExclude=function(){return this.toggleFacetExclusion.apply(this,arguments)},d.prototype.toggleRefinement=function(e,t){return this.toggleFacetRefinement(e,t)},d.prototype.toggleFacetRefinement=function(e,t){return this._change({state:this.state.resetPage().toggleFacetRefinement(e,t),isPageReset:!0}),this},d.prototype.toggleRefine=function(){return this.toggleFacetRefinement.apply(this,arguments)},d.prototype.toggleTag=function(e){return this._change({state:this.state.resetPage().toggleTagRefinement(e),isPageReset:!0}),this},d.prototype.nextPage=function(){var e=this.state.page||0;return this.setPage(e+1)},d.prototype.previousPage=function(){var e=this.state.page||0;return this.setPage(e-1)},d.prototype.setCurrentPage=p,d.prototype.setPage=p,d.prototype.setIndex=function(e){return this._change({state:this.state.resetPage().setIndex(e),isPageReset:!0}),this},d.prototype.setQueryParameter=function(e,t){return this._change({state:this.state.resetPage().setQueryParameter(e,t),isPageReset:!0}),this},d.prototype.setState=function(e){return this._change({state:n.make(e),isPageReset:!1}),this},d.prototype.overrideStateWithoutTriggeringChangeEvent=function(e){return this.state=new n(e),this},d.prototype.hasRefinements=function(e){return!!o(this.state.getNumericRefinements(e))||(this.state.isConjunctiveFacet(e)?this.state.isFacetRefined(e):this.state.isDisjunctiveFacet(e)?this.state.isDisjunctiveFacetRefined(e):!!this.state.isHierarchicalFacet(e)&&this.state.isHierarchicalFacetRefined(e))},d.prototype.isExcluded=function(e,t){return this.state.isExcludeRefined(e,t)},d.prototype.isDisjunctiveRefined=function(e,t){return this.state.isDisjunctiveFacetRefined(e,t)},d.prototype.hasTag=function(e){return this.state.isTagRefined(e)},d.prototype.isTagRefined=function(){return this.hasTagRefinements.apply(this,arguments)},d.prototype.getIndex=function(){return this.state.index},d.prototype.getCurrentPage=v,d.prototype.getPage=v,d.prototype.getTags=function(){return this.state.tagRefinements},d.prototype.getRefinements=function(e){var t=[];if(this.state.isConjunctiveFacet(e))this.state.getConjunctiveRefinements(e).forEach((function(e){t.push({value:e,type:"conjunctive"})})),this.state.getExcludeRefinements(e).forEach((function(e){t.push({value:e,type:"exclude"})}));else if(this.state.isDisjunctiveFacet(e)){this.state.getDisjunctiveRefinements(e).forEach((function(e){t.push({value:e,type:"disjunctive"})}))}var r=this.state.getNumericRefinements(e);return Object.keys(r).forEach((function(e){var n=r[e];t.push({value:n,operator:e,type:"numeric"})})),t},d.prototype.getNumericRefinement=function(e,t){return 
this.state.getNumericRefinement(e,t)},d.prototype.getHierarchicalFacetBreadcrumb=function(e){return this.state.getHierarchicalFacetBreadcrumb(e)},d.prototype._search=function(e){var t=this.state,r=[],n=[];e.onlyWithDerivedHelpers||(n=s._getQueries(t.index,t),r.push({state:t,queriesCount:n.length,helper:this}),this.emit("search",{state:t,results:this.lastResults}));var i=this.derivedHelpers.map((function(e){var n=e.getModifiedState(t),i=n.index?s._getQueries(n.index,n):[];return r.push({state:n,queriesCount:i.length,helper:e}),e.emit("search",{state:n,results:e.lastResults}),i})),a=Array.prototype.concat.apply(n,i),c=this._queryId++;if(this._currentNbQueries++,!a.length)return Promise.resolve({results:[]}).then(this._dispatchAlgoliaResponse.bind(this,r,c));try{this.client.search(a).then(this._dispatchAlgoliaResponse.bind(this,r,c)).catch(this._dispatchAlgoliaError.bind(this,c))}catch(u){this.emit("error",{error:u})}},d.prototype._dispatchAlgoliaResponse=function(e,t,r){if(!(t0},d.prototype._change=function(e){var t=e.state,r=e.isPageReset;t!==this.state&&(this.state=t,this.emit("change",{state:this.state,results:this.lastResults,isPageReset:r}))},d.prototype.clearCache=function(){return this.client.clearCache&&this.client.clearCache(),this},d.prototype.setClient=function(e){return this.client===e||("function"==typeof e.addAlgoliaAgent&&e.addAlgoliaAgent("JS Helper ("+l+")"),this.client=e),this},d.prototype.getClient=function(){return this.client},d.prototype.derive=function(e){var t=new a(this,e);return this.derivedHelpers.push(t),t},d.prototype.detachDerivedHelper=function(e){var t=this.derivedHelpers.indexOf(e);if(-1===t)throw new Error("Derived helper already detached");this.derivedHelpers.splice(t,1)},d.prototype.hasPendingRequests=function(){return this._currentNbQueries>0},e.exports=d},74587:e=>{"use strict";e.exports=function(e){return Array.isArray(e)?e.filter(Boolean):[]}},52344:e=>{"use strict";e.exports=function(){return Array.prototype.slice.call(arguments).reduceRight((function(e,t){return Object.keys(Object(t)).forEach((function(r){void 0!==t[r]&&(void 0!==e[r]&&delete e[r],e[r]=t[r])})),e}),{})}},94039:e=>{"use strict";e.exports={escapeFacetValue:function(e){return"string"!=typeof e?e:String(e).replace(/^-/,"\\-")},unescapeFacetValue:function(e){return"string"!=typeof e?e:e.replace(/^\\-/,"-")}}},7888:e=>{"use strict";e.exports=function(e,t){if(Array.isArray(e))for(var r=0;r{"use strict";e.exports=function(e,t){if(!Array.isArray(e))return-1;for(var r=0;r{"use strict";var n=r(7888);e.exports=function(e,t){var r=(t||[]).map((function(e){return e.split(":")}));return e.reduce((function(e,t){var i=t.split(":"),a=n(r,(function(e){return e[0]===i[0]}));return i.length>1||!a?(e[0].push(i[0]),e[1].push(i[1]),e):(e[0].push(a[0]),e[1].push(a[1]),e)}),[[],[]])}},14853:e=>{"use strict";e.exports=function(e,t){e.prototype=Object.create(t.prototype,{constructor:{value:e,enumerable:!1,writable:!0,configurable:!0}})}},22686:e=>{"use strict";e.exports=function(e,t){return e.filter((function(r,n){return t.indexOf(r)>-1&&e.indexOf(r)===n}))}},60185:e=>{"use strict";function t(e){return"function"==typeof e||Array.isArray(e)||"[object Object]"===Object.prototype.toString.call(e)}function r(e,n){if(e===n)return e;for(var i in n)if(Object.prototype.hasOwnProperty.call(n,i)&&"__proto__"!==i&&"constructor"!==i){var a=n[i],s=e[i];void 0!==s&&void 0===a||(t(s)&&t(a)?e[i]=r(s,a):e[i]="object"==typeof(c=a)&&null!==c?r(Array.isArray(c)?[]:{},c):c)}var c;return e}e.exports=function(e){t(e)||(e={});for(var 
n=1,i=arguments.length;n{"use strict";e.exports=function(e){return e&&Object.keys(e).length>0}},49803:e=>{"use strict";e.exports=function(e,t){if(null===e)return{};var r,n,i={},a=Object.keys(e);for(n=0;n=0||(i[r]=e[r]);return i}},42148:e=>{"use strict";function t(e,t){if(e!==t){var r=void 0!==e,n=null===e,i=void 0!==t,a=null===t;if(!a&&e>t||n&&i||!r)return 1;if(!n&&e=n.length?a:"desc"===n[i]?-a:a}return e.index-r.index})),i.map((function(e){return e.value}))}},28023:e=>{"use strict";e.exports=function e(t){if("number"==typeof t)return t;if("string"==typeof t)return parseFloat(t);if(Array.isArray(t))return t.map(e);throw new Error("The value should be a number, a parsable string or an array of those.")}},96394:(e,t,r)=>{"use strict";var n=r(60185);function i(e){return Object.keys(e).sort((function(e,t){return e.localeCompare(t)})).reduce((function(t,r){return t[r]=e[r],t}),{})}var a={_getQueries:function(e,t){var r=[];return r.push({indexName:e,params:a._getHitsSearchParams(t)}),t.getRefinedDisjunctiveFacets().forEach((function(n){r.push({indexName:e,params:a._getDisjunctiveFacetSearchParams(t,n)})})),t.getRefinedHierarchicalFacets().forEach((function(n){var i=t.getHierarchicalFacetByName(n),s=t.getHierarchicalRefinement(n),c=t._getHierarchicalFacetSeparator(i);if(s.length>0&&s[0].split(c).length>1){var u=s[0].split(c).slice(0,-1).reduce((function(e,t,r){return e.concat({attribute:i.attributes[r],value:0===r?t:[e[e.length-1].value,t].join(c)})}),[]);u.forEach((function(n,s){var c=a._getDisjunctiveFacetSearchParams(t,n.attribute,0===s);function o(e){return i.attributes.some((function(t){return t===e.split(":")[0]}))}var h=(c.facetFilters||[]).reduce((function(e,t){if(Array.isArray(t)){var r=t.filter((function(e){return!o(e)}));r.length>0&&e.push(r)}return"string"!=typeof t||o(t)||e.push(t),e}),[]),f=u[s-1];c.facetFilters=s>0?h.concat(f.attribute+":"+f.value):h.length>0?h:void 0,r.push({indexName:e,params:c})}))}})),r},_getHitsSearchParams:function(e){var t=e.facets.concat(e.disjunctiveFacets).concat(a._getHitsHierarchicalFacetsAttributes(e)),r=a._getFacetFilters(e),s=a._getNumericFilters(e),c=a._getTagFilters(e),u={facets:t.indexOf("*")>-1?["*"]:t,tagFilters:c};return r.length>0&&(u.facetFilters=r),s.length>0&&(u.numericFilters=s),i(n({},e.getQueryParams(),u))},_getDisjunctiveFacetSearchParams:function(e,t,r){var s=a._getFacetFilters(e,t,r),c=a._getNumericFilters(e,t),u=a._getTagFilters(e),o={hitsPerPage:0,page:0,analytics:!1,clickAnalytics:!1};u.length>0&&(o.tagFilters=u);var h=e.getHierarchicalFacetByName(t);return o.facets=h?a._getDisjunctiveHierarchicalFacetAttribute(e,h,r):t,c.length>0&&(o.numericFilters=c),s.length>0&&(o.facetFilters=s),i(n({},e.getQueryParams(),o))},_getNumericFilters:function(e,t){if(e.numericFilters)return e.numericFilters;var r=[];return Object.keys(e.numericRefinements).forEach((function(n){var i=e.numericRefinements[n]||{};Object.keys(i).forEach((function(e){var a=i[e]||[];t!==n&&a.forEach((function(t){if(Array.isArray(t)){var i=t.map((function(t){return n+e+t}));r.push(i)}else r.push(n+e+t)}))}))})),r},_getTagFilters:function(e){return e.tagFilters?e.tagFilters:e.tagRefinements.join(",")},_getFacetFilters:function(e,t,r){var n=[],i=e.facetsRefinements||{};Object.keys(i).forEach((function(e){(i[e]||[]).forEach((function(t){n.push(e+":"+t)}))}));var a=e.facetsExcludes||{};Object.keys(a).forEach((function(e){(a[e]||[]).forEach((function(t){n.push(e+":-"+t)}))}));var s=e.disjunctiveFacetsRefinements||{};Object.keys(s).forEach((function(e){var 
r=s[e]||[];if(e!==t&&r&&0!==r.length){var i=[];r.forEach((function(t){i.push(e+":"+t)})),n.push(i)}}));var c=e.hierarchicalFacetsRefinements||{};return Object.keys(c).forEach((function(i){var a=(c[i]||[])[0];if(void 0!==a){var s,u,o=e.getHierarchicalFacetByName(i),h=e._getHierarchicalFacetSeparator(o),f=e._getHierarchicalRootPath(o);if(t===i){if(-1===a.indexOf(h)||!f&&!0===r||f&&f.split(h).length===a.split(h).length)return;f?(u=f.split(h).length-1,a=f):(u=a.split(h).length-2,a=a.slice(0,a.lastIndexOf(h))),s=o.attributes[u]}else u=a.split(h).length-1,s=o.attributes[u];s&&n.push([s+":"+a])}})),n},_getHitsHierarchicalFacetsAttributes:function(e){return e.hierarchicalFacets.reduce((function(t,r){var n=e.getHierarchicalRefinement(r.name)[0];if(!n)return t.push(r.attributes[0]),t;var i=e._getHierarchicalFacetSeparator(r),a=n.split(i).length,s=r.attributes.slice(0,a+1);return t.concat(s)}),[])},_getDisjunctiveHierarchicalFacetAttribute:function(e,t,r){var n=e._getHierarchicalFacetSeparator(t);if(!0===r){var i=e._getHierarchicalRootPath(t),a=0;return i&&(a=i.split(n).length),[t.attributes[a]]}var s=(e.getHierarchicalRefinement(t.name)[0]||"").split(n).length-1;return t.attributes.slice(0,s+1)},getSearchForFacetQuery:function(e,t,r,s){var c=s.isDisjunctiveFacet(e)?s.clearRefinements(e):s,u={facetQuery:t,facetName:e};return"number"==typeof r&&(u.maxFacetHits=r),i(n({},a._getHitsSearchParams(c),u))}};e.exports=a},46801:e=>{"use strict";e.exports=function(e){return null!==e&&/^[a-zA-Z0-9_-]{1,64}$/.test(e)}},24336:e=>{"use strict";e.exports="3.13.5"},70290:function(e){e.exports=function(){"use strict";function e(e,t,r){return t in e?Object.defineProperty(e,t,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[t]=r,e}function t(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function r(r){for(var n=1;n=0||(i[r]=e[r]);return i}(e,t);if(Object.getOwnPropertySymbols){var a=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(i[r]=e[r])}return i}function i(e,t){return function(e){if(Array.isArray(e))return e}(e)||function(e,t){if(Symbol.iterator in Object(e)||"[object Arguments]"===Object.prototype.toString.call(e)){var r=[],n=!0,i=!1,a=void 0;try{for(var s,c=e[Symbol.iterator]();!(n=(s=c.next()).done)&&(r.push(s.value),!t||r.length!==t);n=!0);}catch(e){i=!0,a=e}finally{try{n||null==c.return||c.return()}finally{if(i)throw a}}return r}}(e,t)||function(){throw new TypeError("Invalid attempt to destructure non-iterable instance")}()}function a(e){return function(e){if(Array.isArray(e)){for(var t=0,r=new Array(e.length);t2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}};return Promise.resolve().then((function(){c();var t=JSON.stringify(e);return a()[t]})).then((function(e){return Promise.all([e?e.value:t(),void 0!==e])})).then((function(e){var t=i(e,2),n=t[0],a=t[1];return Promise.all([n,a||r.miss(n)])})).then((function(e){return i(e,1)[0]}))},set:function(e,t){return Promise.resolve().then((function(){var i=a();return i[JSON.stringify(e)]={timestamp:(new Date).getTime(),value:t},n().setItem(r,JSON.stringify(i)),t}))},delete:function(e){return Promise.resolve().then((function(){var t=a();delete t[JSON.stringify(e)],n().setItem(r,JSON.stringify(t))}))},clear:function(){return Promise.resolve().then((function(){n().removeItem(r)}))}}}function c(e){var 
t=a(e.caches),r=t.shift();return void 0===r?{get:function(e,t){var r=arguments.length>2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}};return t().then((function(e){return Promise.all([e,r.miss(e)])})).then((function(e){return i(e,1)[0]}))},set:function(e,t){return Promise.resolve(t)},delete:function(e){return Promise.resolve()},clear:function(){return Promise.resolve()}}:{get:function(e,n){var i=arguments.length>2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}};return r.get(e,n,i).catch((function(){return c({caches:t}).get(e,n,i)}))},set:function(e,n){return r.set(e,n).catch((function(){return c({caches:t}).set(e,n)}))},delete:function(e){return r.delete(e).catch((function(){return c({caches:t}).delete(e)}))},clear:function(){return r.clear().catch((function(){return c({caches:t}).clear()}))}}}function u(){var e=arguments.length>0&&void 0!==arguments[0]?arguments[0]:{serializable:!0},t={};return{get:function(r,n){var i=arguments.length>2&&void 0!==arguments[2]?arguments[2]:{miss:function(){return Promise.resolve()}},a=JSON.stringify(r);if(a in t)return Promise.resolve(e.serializable?JSON.parse(t[a]):t[a]);var s=n(),c=i&&i.miss||function(){return Promise.resolve()};return s.then((function(e){return c(e)})).then((function(){return s}))},set:function(r,n){return t[JSON.stringify(r)]=e.serializable?JSON.stringify(n):n,Promise.resolve(n)},delete:function(e){return delete t[JSON.stringify(e)],Promise.resolve()},clear:function(){return t={},Promise.resolve()}}}function o(e){for(var t=e.length-1;t>0;t--){var r=Math.floor(Math.random()*(t+1)),n=e[t];e[t]=e[r],e[r]=n}return e}function h(e,t){return t?(Object.keys(t).forEach((function(r){e[r]=t[r](e)})),e):e}function f(e){for(var t=arguments.length,r=new Array(t>1?t-1:0),n=1;n0?n:void 0,timeout:r.timeout||t,headers:r.headers||{},queryParameters:r.queryParameters||{},cacheable:r.cacheable}}var d={Read:1,Write:2,Any:3},p=1,v=2,g=3;function y(e){var t=arguments.length>1&&void 0!==arguments[1]?arguments[1]:p;return r(r({},e),{},{status:t,lastUpdate:Date.now()})}function R(e){return"string"==typeof e?{protocol:"https",url:e,accept:d.Any}:{protocol:e.protocol||"https",url:e.url,accept:e.accept||d.Any}}var F="GET",b="POST";function P(e,t){return Promise.all(t.map((function(t){return e.get(t,(function(){return Promise.resolve(y(t))}))}))).then((function(e){var r=e.filter((function(e){return function(e){return e.status===p||Date.now()-e.lastUpdate>12e4}(e)})),n=e.filter((function(e){return function(e){return e.status===g&&Date.now()-e.lastUpdate<=12e4}(e)})),i=[].concat(a(r),a(n));return{getTimeout:function(e,t){return(0===n.length&&0===e?1:n.length+3+e)*t},statelessHosts:i.length>0?i.map((function(e){return R(e)})):t}}))}function j(e,t,n,i){var s=[],c=function(e,t){if(e.method!==F&&(void 0!==e.data||void 0!==t.data)){var n=Array.isArray(e.data)?e.data:r(r({},e.data),t.data);return JSON.stringify(n)}}(n,i),u=function(e,t){var n=r(r({},e.headers),t.headers),i={};return Object.keys(n).forEach((function(e){var t=n[e];i[e.toLowerCase()]=t})),i}(e,i),o=n.method,h=n.method!==F?{}:r(r({},n.data),i.data),f=r(r(r({"x-algolia-agent":e.userAgent.value},e.queryParameters),h),i.queryParameters),l=0,m=function t(r,a){var h=r.pop();if(void 0===h)throw{name:"RetryError",message:"Unreachable hosts - your application id may be incorrect. 
If the error persists, contact support@algolia.com.",transporterStackTrace:O(s)};var m={data:c,headers:u,method:o,url:E(h,n.path,f),connectTimeout:a(l,e.timeouts.connect),responseTimeout:a(l,i.timeout)},d=function(e){var t={request:m,response:e,host:h,triesLeft:r.length};return s.push(t),t},p={onSuccess:function(e){return function(e){try{return JSON.parse(e.content)}catch(t){throw function(e,t){return{name:"DeserializationError",message:e,response:t}}(t.message,e)}}(e)},onRetry:function(n){var i=d(n);return n.isTimedOut&&l++,Promise.all([e.logger.info("Retryable failure",w(i)),e.hostsCache.set(h,y(h,n.isTimedOut?g:v))]).then((function(){return t(r,a)}))},onFail:function(e){throw d(e),function(e,t){var r=e.content,n=e.status,i=r;try{i=JSON.parse(r).message}catch(e){}return function(e,t,r){return{name:"ApiError",message:e,status:t,transporterStackTrace:r}}(i,n,t)}(e,O(s))}};return e.requester.send(m).then((function(e){return function(e,t){return function(e){var t=e.status;return e.isTimedOut||function(e){var t=e.isTimedOut,r=e.status;return!t&&0==~~r}(e)||2!=~~(t/100)&&4!=~~(t/100)}(e)?t.onRetry(e):2==~~(e.status/100)?t.onSuccess(e):t.onFail(e)}(e,p)}))};return P(e.hostsCache,t).then((function(e){return m(a(e.statelessHosts).reverse(),e.getTimeout)}))}function _(e){var t={value:"Algolia for JavaScript (".concat(e,")"),add:function(e){var r="; ".concat(e.segment).concat(void 0!==e.version?" (".concat(e.version,")"):"");return-1===t.value.indexOf(r)&&(t.value="".concat(t.value).concat(r)),t}};return t}function E(e,t,r){var n=x(r),i="".concat(e.protocol,"://").concat(e.url,"/").concat("/"===t.charAt(0)?t.substr(1):t);return n.length&&(i+="?".concat(n)),i}function x(e){return Object.keys(e).map((function(t){return f("%s=%s",t,(r=e[t],"[object Object]"===Object.prototype.toString.call(r)||"[object Array]"===Object.prototype.toString.call(r)?JSON.stringify(e[t]):e[t]));var r})).join("&")}function O(e){return e.map((function(e){return w(e)}))}function w(e){var t=e.request.headers["x-algolia-api-key"]?{"x-algolia-api-key":"*****"}:{};return r(r({},e),{},{request:r(r({},e.request),{},{headers:r(r({},e.request.headers),t)})})}var N=function(e){var t=e.appId,n=function(e,t,r){var n={"x-algolia-api-key":r,"x-algolia-application-id":t};return{headers:function(){return e===l.WithinHeaders?n:{}},queryParameters:function(){return e===l.WithinQueryParameters?n:{}}}}(void 0!==e.authMode?e.authMode:l.WithinHeaders,t,e.apiKey),a=function(e){var t=e.hostsCache,r=e.logger,n=e.requester,a=e.requestsCache,s=e.responsesCache,c=e.timeouts,u=e.userAgent,o=e.hosts,h=e.queryParameters,f={hostsCache:t,logger:r,requester:n,requestsCache:a,responsesCache:s,timeouts:c,userAgent:u,headers:e.headers,queryParameters:h,hosts:o.map((function(e){return R(e)})),read:function(e,t){var r=m(t,f.timeouts.read),n=function(){return j(f,f.hosts.filter((function(e){return 0!=(e.accept&d.Read)})),e,r)};if(!0!==(void 0!==r.cacheable?r.cacheable:e.cacheable))return n();var a={request:e,mappedRequestOptions:r,transporter:{queryParameters:f.queryParameters,headers:f.headers}};return f.responsesCache.get(a,(function(){return f.requestsCache.get(a,(function(){return f.requestsCache.set(a,n()).then((function(e){return Promise.all([f.requestsCache.delete(a),e])}),(function(e){return Promise.all([f.requestsCache.delete(a),Promise.reject(e)])})).then((function(e){var t=i(e,2);return t[0],t[1]}))}))}),{miss:function(e){return f.responsesCache.set(a,e)}})},write:function(e,t){return j(f,f.hosts.filter((function(e){return 
0!=(e.accept&d.Write)})),e,m(t,f.timeouts.write))}};return f}(r(r({hosts:[{url:"".concat(t,"-dsn.algolia.net"),accept:d.Read},{url:"".concat(t,".algolia.net"),accept:d.Write}].concat(o([{url:"".concat(t,"-1.algolianet.com")},{url:"".concat(t,"-2.algolianet.com")},{url:"".concat(t,"-3.algolianet.com")}]))},e),{},{headers:r(r(r({},n.headers()),{"content-type":"application/x-www-form-urlencoded"}),e.headers),queryParameters:r(r({},n.queryParameters()),e.queryParameters)}));return h({transporter:a,appId:t,addAlgoliaAgent:function(e,t){a.userAgent.add({segment:e,version:t})},clearCache:function(){return Promise.all([a.requestsCache.clear(),a.responsesCache.clear()]).then((function(){}))}},e.methods)},A=function(e){return function(t,r){return t.method===F?e.transporter.read(t,r):e.transporter.write(t,r)}},H=function(e){return function(t){var r=arguments.length>1&&void 0!==arguments[1]?arguments[1]:{};return h({transporter:e.transporter,appId:e.appId,indexName:t},r.methods)}},S=function(e){return function(t,n){var i=t.map((function(e){return r(r({},e),{},{params:x(e.params||{})})}));return e.transporter.read({method:b,path:"1/indexes/*/queries",data:{requests:i},cacheable:!0},n)}},T=function(e){return function(t,i){return Promise.all(t.map((function(t){var a=t.params,s=a.facetName,c=a.facetQuery,u=n(a,["facetName","facetQuery"]);return H(e)(t.indexName,{methods:{searchForFacetValues:I}}).searchForFacetValues(s,c,r(r({},i),u))})))}},Q=function(e){return function(t,r,n){return e.transporter.read({method:b,path:f("1/answers/%s/prediction",e.indexName),data:{query:t,queryLanguages:r},cacheable:!0},n)}},C=function(e){return function(t,r){return e.transporter.read({method:b,path:f("1/indexes/%s/query",e.indexName),data:{query:t},cacheable:!0},r)}},I=function(e){return function(t,r,n){return e.transporter.read({method:b,path:f("1/indexes/%s/facets/%s/query",e.indexName,t),data:{facetQuery:r},cacheable:!0},n)}},k=1,D=2,q=3;function V(e,t,n){var i,a={appId:e,apiKey:t,timeouts:{connect:1,read:2,write:30},requester:{send:function(e){return new Promise((function(t){var r=new XMLHttpRequest;r.open(e.method,e.url,!0),Object.keys(e.headers).forEach((function(t){return r.setRequestHeader(t,e.headers[t])}));var n,i=function(e,n){return setTimeout((function(){r.abort(),t({status:0,content:n,isTimedOut:!0})}),1e3*e)},a=i(e.connectTimeout,"Connection timeout");r.onreadystatechange=function(){r.readyState>r.OPENED&&void 0===n&&(clearTimeout(a),n=i(e.responseTimeout,"Socket timeout"))},r.onerror=function(){0===r.status&&(clearTimeout(a),clearTimeout(n),t({content:r.responseText||"Network request failed",status:r.status,isTimedOut:!1}))},r.onload=function(){clearTimeout(a),clearTimeout(n),t({content:r.responseText,status:r.status,isTimedOut:!1})},r.send(e.data)}))}},logger:(i=q,{debug:function(e,t){return k>=i&&console.debug(e,t),Promise.resolve()},info:function(e,t){return D>=i&&console.info(e,t),Promise.resolve()},error:function(e,t){return console.error(e,t),Promise.resolve()}}),responsesCache:u(),requestsCache:u({serializable:!1}),hostsCache:c({caches:[s({key:"".concat("4.19.1","-").concat(e)}),u()]}),userAgent:_("4.19.1").add({segment:"Browser",version:"lite"}),authMode:l.WithinQueryParameters};return N(r(r(r({},a),n),{},{methods:{search:S,searchForFacetValues:T,multipleQueries:S,multipleSearchForFacetValues:T,customRequest:A,initIndex:function(e){return function(t){return H(e)(t,{methods:{search:C,searchForFacetValues:I,findAnswers:Q}})}}}}))}return V.version="4.19.1",V}()},56675:(e,t,r)=>{"use 
strict";r.r(t),r.d(t,{default:()=>A});var n=r(67294),i=r(86010),a=r(8131),s=r.n(a),c=r(70290),u=r.n(c),o=r(10412),h=r(35742),f=r(39960),l=r(80143),m=r(52263),d=["zero","one","two","few","many","other"];function p(e){return d.filter((function(t){return e.includes(t)}))}var v={locale:"en",pluralForms:p(["one","other"]),select:function(e){return 1===e?"one":"other"}};function g(){var e=(0,m.Z)().i18n.currentLocale;return(0,n.useMemo)((function(){try{return t=e,r=new Intl.PluralRules(t),{locale:t,pluralForms:p(r.resolvedOptions().pluralCategories),select:function(e){return r.select(e)}}}catch(n){return console.error('Failed to use Intl.PluralRules for locale "'+e+'".\nDocusaurus will fallback to the default (English) implementation.\nError: '+n.message+"\n"),v}var t,r}),[e])}function y(){var e=g();return{selectMessage:function(t,r){return function(e,t,r){var n=e.split("|");if(1===n.length)return n[0];n.length>r.pluralForms.length&&console.error("For locale="+r.locale+", a maximum of "+r.pluralForms.length+" plural forms are expected ("+r.pluralForms.join(",")+"), but the message contains "+n.length+": "+e);var i=r.select(t),a=r.pluralForms.indexOf(i);return n[Math.min(a,n.length-1)]}(r,t,e)}}}var R=r(66177),F=r(69688),b=r(10833),P=r(82128),j=r(95999),_=r(6278),E=r(239),x=r(7452);const O={searchQueryInput:"searchQueryInput_u2C7",searchVersionInput:"searchVersionInput_m0Ui",searchResultsColumn:"searchResultsColumn_JPFH",algoliaLogo:"algoliaLogo_rT1R",algoliaLogoPathFill:"algoliaLogoPathFill_WdUC",searchResultItem:"searchResultItem_Tv2o",searchResultItemHeading:"searchResultItemHeading_KbCB",searchResultItemPath:"searchResultItemPath_lhe1",searchResultItemSummary:"searchResultItemSummary_AEaO",searchQueryColumn:"searchQueryColumn_RTkw",searchVersionColumn:"searchVersionColumn_ypXd",searchLogoColumn:"searchLogoColumn_rJIA",loadingSpinner:"loadingSpinner_XVxU","loading-spin":"loading-spin_vzvp",loader:"loader_vvXV"};function w(e){var t=e.docsSearchVersionsHelpers,r=Object.entries(t.allDocsData).filter((function(e){return e[1].versions.length>1}));return n.createElement("div",{className:(0,i.Z)("col","col--3","padding-left--none",O.searchVersionColumn)},r.map((function(e){var i=e[0],a=e[1],s=r.length>1?i+": ":"";return n.createElement("select",{key:i,onChange:function(e){return t.setSearchVersion(i,e.target.value)},defaultValue:t.searchVersions[i],className:O.searchVersionInput},a.versions.map((function(e,t){return n.createElement("option",{key:t,label:""+s+e.label,value:e.name})})))})))}function N(){var e,t,r,a,c,d,p=(0,m.Z)().i18n.currentLocale,v=(0,_.L)().algolia,g=v.appId,b=v.apiKey,N=v.indexName,A=(0,E.l)(),H=(e=y().selectMessage,function(t){return e(t,(0,j.I)({id:"theme.SearchPage.documentsFound.plurals",description:'Pluralized label for "{count} documents found". 
Use as much plural forms (separated by "|") as your language support (see https://www.unicode.org/cldr/cldr-aux/charts/34/supplemental/language_plural_rules.html)',message:"One document found|{count} documents found"},{count:t}))}),S=(t=(0,l._r)(),r=(0,n.useState)((function(){return Object.entries(t).reduce((function(e,t){var r,n=t[0],i=t[1];return Object.assign({},e,((r={})[n]=i.versions[0].name,r))}),{})})),a=r[0],c=r[1],d=Object.values(t).some((function(e){return e.versions.length>1})),{allDocsData:t,versioningEnabled:d,searchVersions:a,setSearchVersion:function(e,t){return c((function(r){var n;return Object.assign({},r,((n={})[e]=t,n))}))}}),T=(0,R.K)(),Q=T[0],C=T[1],I={items:[],query:null,totalResults:null,totalPages:null,lastPage:null,hasMore:null,loading:null},k=(0,n.useReducer)((function(e,t){switch(t.type){case"reset":return I;case"loading":return Object.assign({},e,{loading:!0});case"update":return Q!==t.value.query?e:Object.assign({},t.value,{items:0===t.value.lastPage?t.value.items:e.items.concat(t.value.items)});case"advance":var r=e.totalPages>e.lastPage+1;return Object.assign({},e,{lastPage:r?e.lastPage+1:e.lastPage,hasMore:r});default:return e}}),I),D=k[0],q=k[1],V=u()(g,b),L=s()(V,N,{hitsPerPage:15,advancedSyntax:!0,disjunctiveFacets:["language","docusaurus_tag"]});L.on("result",(function(e){var t=e.results,r=t.query,n=t.hits,i=t.page,a=t.nbHits,s=t.nbPages;if(""!==r&&Array.isArray(n)){var c=function(e){return e.replace(/algolia-docsearch-suggestion--highlight/g,"search-result-match")},u=n.map((function(e){var t=e.url,r=e._highlightResult.hierarchy,n=e._snippetResult,i=void 0===n?{}:n,a=Object.keys(r).map((function(e){return c(r[e].value)}));return{title:a.pop(),url:A(t),summary:i.content?c(i.content.value)+"...":"",breadcrumbs:a}}));q({type:"update",value:{items:u,query:r,totalResults:a,totalPages:s,lastPage:i,hasMore:s>i+1,loading:!1}})}else q({type:"reset"})}));var B=(0,n.useState)(null),z=B[0],M=B[1],J=(0,n.useRef)(0),W=(0,n.useRef)(o.Z.canUseIntersectionObserver&&new IntersectionObserver((function(e){var t=e[0],r=t.isIntersecting,n=t.boundingClientRect.y;r&&J.current>n&&q({type:"advance"}),J.current=n}),{threshold:1})),U=function(){return Q?(0,j.I)({id:"theme.SearchPage.existingResultsTitle",message:'Search results for "{query}"',description:"The search page title for non-empty query"},{query:Q}):(0,j.I)({id:"theme.SearchPage.emptyResultsTitle",message:"Search the documentation",description:"The search page title for empty query"})},Z=(0,F.zX)((function(e){void 0===e&&(e=0),L.addDisjunctiveFacetRefinement("docusaurus_tag","default"),L.addDisjunctiveFacetRefinement("language",p),Object.entries(S.searchVersions).forEach((function(e){var t=e[0],r=e[1];L.addDisjunctiveFacetRefinement("docusaurus_tag","docs-"+t+"-"+r)})),L.setQuery(Q).setPage(e).search()}));return(0,n.useEffect)((function(){if(z){var e=W.current;return e?(e.observe(z),function(){return e.unobserve(z)}):function(){return!0}}}),[z]),(0,n.useEffect)((function(){q({type:"reset"}),Q&&(q({type:"loading"}),setTimeout((function(){Z()}),300))}),[Q,S.searchVersions,Z]),(0,n.useEffect)((function(){D.lastPage&&0!==D.lastPage&&Z(D.lastPage)}),[Z,D.lastPage]),n.createElement(x.Z,null,n.createElement(h.Z,null,n.createElement("title",null,(0,P.p)(U())),n.createElement("meta",{property:"robots",content:"noindex, follow"})),n.createElement("div",{className:"container margin-vert--lg"},n.createElement("h1",null,U()),n.createElement("form",{className:"row",onSubmit:function(e){return 
e.preventDefault()}},n.createElement("div",{className:(0,i.Z)("col",O.searchQueryColumn,{"col--9":S.versioningEnabled,"col--12":!S.versioningEnabled})},n.createElement("input",{type:"search",name:"q",className:O.searchQueryInput,placeholder:(0,j.I)({id:"theme.SearchPage.inputPlaceholder",message:"Type your search here",description:"The placeholder for search page input"}),"aria-label":(0,j.I)({id:"theme.SearchPage.inputLabel",message:"Search",description:"The ARIA label for search page input"}),onChange:function(e){return C(e.target.value)},value:Q,autoComplete:"off",autoFocus:!0})),S.versioningEnabled&&n.createElement(w,{docsSearchVersionsHelpers:S})),n.createElement("div",{className:"row"},n.createElement("div",{className:(0,i.Z)("col","col--8",O.searchResultsColumn)},!!D.totalResults&&H(D.totalResults)),n.createElement("div",{className:(0,i.Z)("col","col--4","text--right",O.searchLogoColumn)},n.createElement("a",{target:"_blank",rel:"noopener noreferrer",href:"https://www.algolia.com/","aria-label":(0,j.I)({id:"theme.SearchPage.algoliaLabel",message:"Search by Algolia",description:"The ARIA label for Algolia mention"})},n.createElement("svg",{viewBox:"0 0 168 24",className:O.algoliaLogo},n.createElement("g",{fill:"none"},n.createElement("path",{className:O.algoliaLogoPathFill,d:"M120.925 18.804c-4.386.02-4.386-3.54-4.386-4.106l-.007-13.336 2.675-.424v13.254c0 .322 0 2.358 1.718 2.364v2.248zm-10.846-2.18c.821 0 1.43-.047 1.855-.129v-2.719a6.334 6.334 0 0 0-1.574-.199 5.7 5.7 0 0 0-.897.069 2.699 2.699 0 0 0-.814.24c-.24.116-.439.28-.582.491-.15.212-.219.335-.219.656 0 .628.219.991.616 1.23s.938.362 1.615.362zm-.233-9.7c.883 0 1.629.109 2.231.328.602.218 1.088.525 1.444.915.363.396.609.922.76 1.483.157.56.232 1.175.232 1.85v6.874a32.5 32.5 0 0 1-1.868.314c-.834.123-1.772.185-2.813.185-.69 0-1.327-.069-1.895-.198a4.001 4.001 0 0 1-1.471-.636 3.085 3.085 0 0 1-.951-1.134c-.226-.465-.343-1.12-.343-1.803 0-.656.13-1.073.384-1.525a3.24 3.24 0 0 1 1.047-1.106c.445-.287.95-.492 1.532-.615a8.8 8.8 0 0 1 1.82-.185 8.404 8.404 0 0 1 1.972.24v-.438c0-.307-.035-.6-.11-.874a1.88 1.88 0 0 0-.384-.73 1.784 1.784 0 0 0-.724-.493 3.164 3.164 0 0 0-1.143-.205c-.616 0-1.177.075-1.69.164a7.735 7.735 0 0 0-1.26.307l-.321-2.192c.335-.117.834-.233 1.478-.349a10.98 10.98 0 0 1 2.073-.178zm52.842 9.626c.822 0 1.43-.048 1.854-.13V13.7a6.347 6.347 0 0 0-1.574-.199c-.294 0-.595.021-.896.069a2.7 2.7 0 0 0-.814.24 1.46 1.46 0 0 0-.582.491c-.15.212-.218.335-.218.656 0 .628.218.991.615 1.23.404.245.938.362 1.615.362zm-.226-9.694c.883 0 1.629.108 2.231.327.602.219 1.088.526 1.444.915.355.39.609.923.759 1.483a6.8 6.8 0 0 1 .233 1.852v6.873c-.41.088-1.034.19-1.868.314-.834.123-1.772.184-2.813.184-.69 0-1.327-.068-1.895-.198a4.001 4.001 0 0 1-1.471-.635 3.085 3.085 0 0 1-.951-1.134c-.226-.465-.343-1.12-.343-1.804 0-.656.13-1.073.384-1.524.26-.45.608-.82 1.047-1.107.445-.286.95-.491 1.532-.614a8.803 8.803 0 0 1 2.751-.13c.329.034.671.096 1.04.185v-.437a3.3 3.3 0 0 0-.109-.875 1.873 1.873 0 0 0-.384-.731 1.784 1.784 0 0 0-.724-.492 3.165 3.165 0 0 0-1.143-.205c-.616 0-1.177.075-1.69.164a7.75 7.75 0 0 0-1.26.307l-.321-2.193c.335-.116.834-.232 1.478-.348a11.633 11.633 0 0 1 2.073-.177zm-8.034-1.271a1.626 1.626 0 0 1-1.628-1.62c0-.895.725-1.62 1.628-1.62.904 0 1.63.725 1.63 1.62 0 .895-.733 1.62-1.63 1.62zm1.348 13.22h-2.689V7.27l2.69-.423v11.956zm-4.714 0c-4.386.02-4.386-3.54-4.386-4.107l-.008-13.336 2.676-.424v13.254c0 .322 0 2.358 1.718 
2.364v2.248zm-8.698-5.903c0-1.156-.253-2.119-.746-2.788-.493-.677-1.183-1.01-2.067-1.01-.882 0-1.574.333-2.065 1.01-.493.676-.733 1.632-.733 2.788 0 1.168.246 1.953.74 2.63.492.683 1.183 1.018 2.066 1.018.882 0 1.574-.342 2.067-1.019.492-.683.738-1.46.738-2.63zm2.737-.007c0 .902-.13 1.584-.397 2.33a5.52 5.52 0 0 1-1.128 1.906 4.986 4.986 0 0 1-1.752 1.223c-.685.286-1.739.45-2.265.45-.528-.006-1.574-.157-2.252-.45a5.096 5.096 0 0 1-1.744-1.223c-.487-.527-.863-1.162-1.137-1.906a6.345 6.345 0 0 1-.41-2.33c0-.902.123-1.77.397-2.508a5.554 5.554 0 0 1 1.15-1.892 5.133 5.133 0 0 1 1.75-1.216c.679-.287 1.425-.423 2.232-.423.808 0 1.553.142 2.237.423a4.88 4.88 0 0 1 1.753 1.216 5.644 5.644 0 0 1 1.135 1.892c.287.738.431 1.606.431 2.508zm-20.138 0c0 1.12.246 2.363.738 2.882.493.52 1.13.78 1.91.78.424 0 .828-.062 1.204-.178.377-.116.677-.253.917-.417V9.33a10.476 10.476 0 0 0-1.766-.226c-.971-.028-1.71.37-2.23 1.004-.513.636-.773 1.75-.773 2.788zm7.438 5.274c0 1.824-.466 3.156-1.404 4.004-.936.846-2.367 1.27-4.296 1.27-.705 0-2.17-.137-3.34-.396l.431-2.118c.98.205 2.272.26 2.95.26 1.074 0 1.84-.219 2.299-.656.459-.437.684-1.086.684-1.948v-.437a8.07 8.07 0 0 1-1.047.397c-.43.13-.93.198-1.492.198-.739 0-1.41-.116-2.018-.349a4.206 4.206 0 0 1-1.567-1.025c-.431-.45-.774-1.017-1.013-1.694-.24-.677-.363-1.885-.363-2.773 0-.834.13-1.88.384-2.577.26-.696.629-1.298 1.129-1.796.493-.498 1.095-.881 1.8-1.162a6.605 6.605 0 0 1 2.428-.457c.87 0 1.67.109 2.45.24.78.129 1.444.265 1.985.415V18.17zM6.972 6.677v1.627c-.712-.446-1.52-.67-2.425-.67-.585 0-1.045.13-1.38.391a1.24 1.24 0 0 0-.502 1.03c0 .425.164.765.494 1.02.33.256.835.532 1.516.83.447.192.795.356 1.045.495.25.138.537.332.862.582.324.25.563.548.718.894.154.345.23.741.23 1.188 0 .947-.334 1.691-1.004 2.234-.67.542-1.537.814-2.601.814-1.18 0-2.16-.229-2.936-.686v-1.708c.84.628 1.814.942 2.92.942.585 0 1.048-.136 1.388-.407.34-.271.51-.646.51-1.125 0-.287-.1-.55-.302-.79-.203-.24-.42-.42-.655-.542-.234-.123-.585-.29-1.053-.503a61.27 61.27 0 0 1-.582-.271 13.67 13.67 0 0 1-.55-.287 4.275 4.275 0 0 1-.567-.351 6.92 6.92 0 0 1-.455-.4c-.18-.17-.31-.34-.39-.51-.08-.17-.155-.37-.224-.598a2.553 2.553 0 0 1-.104-.742c0-.915.333-1.638.998-2.17.664-.532 1.523-.798 2.576-.798.968 0 1.793.17 2.473.51zm7.468 5.696v-.287c-.022-.607-.187-1.088-.495-1.444-.309-.357-.75-.535-1.324-.535-.532 0-.99.194-1.373.583-.382.388-.622.949-.717 1.683h3.909zm1.005 2.792v1.404c-.596.34-1.383.51-2.362.51-1.255 0-2.255-.377-3-1.132-.744-.755-1.116-1.744-1.116-2.968 0-1.297.34-2.316 1.021-3.055.68-.74 1.548-1.11 2.6-1.11 1.033 0 1.852.323 2.458.966.606.644.91 1.572.91 2.784 0 .33-.033.676-.096 1.038h-5.314c.107.702.405 1.239.894 1.611.49.372 1.106.558 1.85.558.862 0 1.58-.202 2.155-.606zm6.605-1.77h-1.212c-.596 0-1.045.116-1.349.35-.303.234-.454.532-.454.894 0 .372.117.664.35.877.235.213.575.32 1.022.32.51 0 .912-.142 1.204-.424.293-.281.44-.651.44-1.108v-.91zm-4.068-2.554V9.325c.627-.361 1.457-.542 2.489-.542 2.116 0 3.175 1.026 3.175 3.08V17h-1.548v-.957c-.415.68-1.143 1.02-2.186 1.02-.766 0-1.38-.22-1.843-.661-.462-.442-.694-1.003-.694-1.684 0-.776.293-1.38.878-1.81.585-.431 1.404-.647 2.457-.647h1.34V11.8c0-.554-.133-.971-.399-1.253-.266-.282-.707-.423-1.324-.423a4.07 4.07 0 0 0-2.345.718zm9.333-1.93v1.42c.394-1 1.101-1.5 2.123-1.5.148 0 .313.016.494.048v1.531a1.885 1.885 0 0 0-.75-.143c-.542 0-.989.24-1.34.718-.351.479-.527 1.048-.527 1.707V17h-1.563V8.91h1.563zm5.01 4.084c.022.82.272 1.492.75 2.019.479.526 1.15.79 2.01.79.639 0 1.235-.176 
1.788-.527v1.404c-.521.319-1.186.479-1.995.479-1.265 0-2.276-.4-3.031-1.197-.755-.798-1.133-1.792-1.133-2.984 0-1.16.38-2.151 1.14-2.975.761-.825 1.79-1.237 3.088-1.237.702 0 1.346.149 1.93.447v1.436a3.242 3.242 0 0 0-1.77-.495c-.84 0-1.513.266-2.019.798-.505.532-.758 1.213-.758 2.042zM40.24 5.72v4.579c.458-1 1.293-1.5 2.505-1.5.787 0 1.42.245 1.899.734.479.49.718 1.17.718 2.042V17h-1.564v-5.106c0-.553-.14-.98-.422-1.284-.282-.303-.652-.455-1.11-.455-.531 0-1.002.202-1.411.606-.41.405-.615 1.022-.615 1.851V17h-1.563V5.72h1.563zm14.966 10.02c.596 0 1.096-.253 1.5-.758.404-.506.606-1.157.606-1.955 0-.915-.202-1.62-.606-2.114-.404-.495-.92-.742-1.548-.742-.553 0-1.05.224-1.491.67-.442.447-.662 1.133-.662 2.058 0 .958.212 1.67.638 2.138.425.469.946.703 1.563.703zM53.004 5.72v4.42c.574-.894 1.388-1.341 2.44-1.341 1.022 0 1.857.383 2.506 1.149.649.766.973 1.781.973 3.047 0 1.138-.309 2.109-.925 2.912-.617.803-1.463 1.205-2.537 1.205-1.075 0-1.894-.447-2.457-1.34V17h-1.58V5.72h1.58zm9.908 11.104l-3.223-7.913h1.739l1.005 2.632 1.26 3.415c.096-.32.48-1.458 1.15-3.415l.909-2.632h1.66l-2.92 7.866c-.777 2.074-1.963 3.11-3.559 3.11a2.92 2.92 0 0 1-.734-.079v-1.34c.17.042.351.064.543.064 1.032 0 1.755-.57 2.17-1.708z"}),n.createElement("path",{fill:"#5468FF",d:"M78.988.938h16.594a2.968 2.968 0 0 1 2.966 2.966V20.5a2.967 2.967 0 0 1-2.966 2.964H78.988a2.967 2.967 0 0 1-2.966-2.964V3.897A2.961 2.961 0 0 1 78.988.938z"}),n.createElement("path",{fill:"white",d:"M89.632 5.967v-.772a.978.978 0 0 0-.978-.977h-2.28a.978.978 0 0 0-.978.977v.793c0 .088.082.15.171.13a7.127 7.127 0 0 1 1.984-.28c.65 0 1.295.088 1.917.259.082.02.164-.04.164-.13m-6.248 1.01l-.39-.389a.977.977 0 0 0-1.382 0l-.465.465a.973.973 0 0 0 0 1.38l.383.383c.062.061.15.047.205-.014.226-.307.472-.601.746-.874.281-.28.568-.526.883-.751.068-.042.075-.137.02-.2m4.16 2.453v3.341c0 .096.104.165.192.117l2.97-1.537c.068-.034.089-.117.055-.184a3.695 3.695 0 0 0-3.08-1.866c-.068 0-.136.054-.136.13m0 8.048a4.489 4.489 0 0 1-4.49-4.482 4.488 4.488 0 0 1 4.49-4.482 4.488 4.488 0 0 1 4.489 4.482 4.484 4.484 0 0 1-4.49 4.482m0-10.85a6.363 6.363 0 1 0 0 12.729 6.37 6.37 0 0 0 6.372-6.368 6.358 6.358 0 0 0-6.371-6.36"})))))),D.items.length>0?n.createElement("main",null,D.items.map((function(e,t){var r=e.title,a=e.url,s=e.summary,c=e.breadcrumbs;return n.createElement("article",{key:t,className:O.searchResultItem},n.createElement("h2",{className:O.searchResultItemHeading},n.createElement(f.Z,{to:a,dangerouslySetInnerHTML:{__html:r}})),c.length>0&&n.createElement("nav",{"aria-label":"breadcrumbs"},n.createElement("ul",{className:(0,i.Z)("breadcrumbs",O.searchResultItemPath)},c.map((function(e,t){return n.createElement("li",{key:t,className:"breadcrumbs__item",dangerouslySetInnerHTML:{__html:e}})})))),s&&n.createElement("p",{className:O.searchResultItemSummary,dangerouslySetInnerHTML:{__html:s}}))}))):[Q&&!D.loading&&n.createElement("p",{key:"no-results"},n.createElement(j.Z,{id:"theme.SearchPage.noResultsText",description:"The paragraph for empty search result"},"No results were found")),!!D.loading&&n.createElement("div",{key:"spinner",className:O.loadingSpinner})],D.hasMore&&n.createElement("div",{className:O.loader,ref:M},n.createElement(j.Z,{id:"theme.SearchPage.fetchingNewResults",description:"The paragraph for fetching new search results"},"Fetching new results..."))))}function A(){return n.createElement(b.FG,{className:"search-page-wrapper"},n.createElement(N,null))}}}]); \ No newline at end of file diff --git a/assets/js/1a4e3797.1ecd994c.js.LICENSE.txt 
b/assets/js/1a4e3797.1ecd994c.js.LICENSE.txt new file mode 100644 index 00000000000..f167ba6c6f6 --- /dev/null +++ b/assets/js/1a4e3797.1ecd994c.js.LICENSE.txt @@ -0,0 +1 @@ +/*! algoliasearch-lite.umd.js | 4.19.1 | © Algolia, inc. | https://github.com/algolia/algoliasearch-client-javascript */ diff --git a/assets/js/1a4e3797.6edfb8bf.js.LICENSE.txt b/assets/js/1a4e3797.6edfb8bf.js.LICENSE.txt deleted file mode 100644 index ac43b0e313c..00000000000 --- a/assets/js/1a4e3797.6edfb8bf.js.LICENSE.txt +++ /dev/null @@ -1 +0,0 @@ -/*! algoliasearch-lite.umd.js | 4.19.0 | © Algolia, inc. | https://github.com/algolia/algoliasearch-client-javascript */ diff --git a/assets/js/61426.4a4dfe81.js b/assets/js/61426.8bf7a004.js similarity index 99% rename from assets/js/61426.4a4dfe81.js rename to assets/js/61426.8bf7a004.js index ddd0ae2a6a7..eb14fd10c5b 100644 --- a/assets/js/61426.4a4dfe81.js +++ b/assets/js/61426.8bf7a004.js @@ -1 +1 @@ -"use strict";(self.webpackChunk_cumulus_website=self.webpackChunk_cumulus_website||[]).push([[61426],{61426:(e,t,r)=>{function n(e,t){var r=void 0;return function(){for(var n=arguments.length,o=new Array(n),i=0;ipn});var a=function(){};function c(e){var t=e.item,r=e.items;return{index:t.__autocomplete_indexName,items:[t],positions:[1+r.findIndex((function(e){return e.objectID===t.objectID}))],queryID:t.__autocomplete_queryID,algoliaSource:["autocomplete"]}}function l(e,t){return function(e){if(Array.isArray(e))return e}(e)||function(e,t){var r=null==e?null:"undefined"!=typeof Symbol&&e[Symbol.iterator]||e["@@iterator"];if(null!=r){var n,o,i,a,c=[],l=!0,u=!1;try{if(i=(r=r.call(e)).next,0===t){if(Object(r)!==r)return;l=!1}else for(;!(l=(n=i.call(r)).done)&&(c.push(n.value),c.length!==t);l=!0);}catch(s){u=!0,o=s}finally{try{if(!l&&null!=r.return&&(a=r.return(),Object(a)!==a))return}finally{if(u)throw o}}return c}}(e,t)||function(e,t){if(!e)return;if("string"==typeof e)return u(e,t);var r=Object.prototype.toString.call(e).slice(8,-1);"Object"===r&&e.constructor&&(r=e.constructor.name);if("Map"===r||"Set"===r)return Array.from(e);if("Arguments"===r||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r))return u(e,t)}(e,t)||function(){throw new TypeError("Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")}()}function u(e,t){(null==t||t>e.length)&&(t=e.length);for(var r=0,n=new Array(t);re.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function y(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function h(e){for(var t=1;t=3||2===r&&n>=4||1===r&&n>=10);function i(t,r,n){if(o&&void 0!==n){var i=n[0].__autocomplete_algoliaCredentials,a={"X-Algolia-Application-Id":i.appId,"X-Algolia-API-Key":i.apiKey};e.apply(void 0,[t].concat(p(r),[{headers:a}]))}else e.apply(void 0,[t].concat(p(r)))}return{init:function(t,r){e("init",{appId:t,apiKey:r})},setUserToken:function(t){e("setUserToken",t)},clickedObjectIDsAfterSearch:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("clickedObjectIDsAfterSearch",g(t),t[0].items)},clickedObjectIDs:function(){for(var e=arguments.length,t=new 
Array(e),r=0;r0&&i("clickedObjectIDs",g(t),t[0].items)},clickedFilters:function(){for(var t=arguments.length,r=new Array(t),n=0;n0&&e.apply(void 0,["clickedFilters"].concat(r))},convertedObjectIDsAfterSearch:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("convertedObjectIDsAfterSearch",g(t),t[0].items)},convertedObjectIDs:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("convertedObjectIDs",g(t),t[0].items)},convertedFilters:function(){for(var t=arguments.length,r=new Array(t),n=0;n0&&e.apply(void 0,["convertedFilters"].concat(r))},viewedObjectIDs:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&t.reduce((function(e,t){var r=t.items,n=d(t,f);return[].concat(p(e),p(function(e){for(var t=arguments.length>1&&void 0!==arguments[1]?arguments[1]:20,r=[],n=0;n0&&e.apply(void 0,["viewedFilters"].concat(r))}}}function S(e){var t=e.items.reduce((function(e,t){var r;return e[t.__autocomplete_indexName]=(null!==(r=e[t.__autocomplete_indexName])&&void 0!==r?r:[]).concat(t),e}),{});return Object.keys(t).map((function(e){return{index:e,items:t[e],algoliaSource:["autocomplete"]}}))}function j(e){return e.objectID&&e.__autocomplete_indexName&&e.__autocomplete_queryID}function w(e){return w="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},w(e)}function E(e){return function(e){if(Array.isArray(e))return P(e)}(e)||function(e){if("undefined"!=typeof Symbol&&null!=e[Symbol.iterator]||null!=e["@@iterator"])return Array.from(e)}(e)||function(e,t){if(!e)return;if("string"==typeof e)return P(e,t);var r=Object.prototype.toString.call(e).slice(8,-1);"Object"===r&&e.constructor&&(r=e.constructor.name);if("Map"===r||"Set"===r)return Array.from(e);if("Arguments"===r||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r))return P(e,t)}(e)||function(){throw new TypeError("Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")}()}function P(e,t){(null==t||t>e.length)&&(t=e.length);for(var r=0,n=new Array(t);r0&&C({onItemsChange:o,items:r,insights:f,state:t}))}}),0);return{name:"aa.algoliaInsightsPlugin",subscribe:function(e){var t=e.setContext,r=e.onSelect,n=e.onActive;s("addAlgoliaAgent","insights-plugin"),t({algoliaInsightsPlugin:{__algoliaSearchParameters:{clickAnalytics:!0},insights:f}}),r((function(e){var t=e.item,r=e.state,n=e.event;j(t)&&l({state:r,event:n,insights:f,item:t,insightsEvents:[D({eventName:"Item Selected"},c({item:t,items:m.current}))]})})),n((function(e){var t=e.item,r=e.state,n=e.event;j(t)&&u({state:r,event:n,insights:f,item:t,insightsEvents:[D({eventName:"Item Active"},c({item:t,items:m.current}))]})}))},onStateChange:function(e){var t=e.state;p({state:t})},__autocomplete_pluginOptions:e}}function N(e){return N="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},N(e)}function T(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function q(e,t,r){return(t=function(e){var t=function(e,t){if("object"!==N(e)||null===e)return e;var r=e[Symbol.toPrimitive];if(void 0!==r){var n=r.call(e,t||"default");if("object"!==N(n))return n;throw 
new TypeError("@@toPrimitive must return a primitive value.")}return("string"===t?String:Number)(e)}(e,"string");return"symbol"===N(t)?t:String(t)}(t))in e?Object.defineProperty(e,t,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[t]=r,e}function R(e,t,r){var n,o=t.initialState;return{getState:function(){return o},dispatch:function(n,i){var a=function(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r0},reshape:function(e){return e.sources}},e),{},{id:null!==(r=e.id)&&void 0!==r?r:"autocomplete-".concat(V++),plugins:o,initialState:X({activeItemId:null,query:"",completion:null,collections:[],isOpen:!1,status:"idle",context:{}},e.initialState),onStateChange:function(t){var r;null===(r=e.onStateChange)||void 0===r||r.call(e,t),o.forEach((function(e){var r;return null===(r=e.onStateChange)||void 0===r?void 0:r.call(e,t)}))},onSubmit:function(t){var r;null===(r=e.onSubmit)||void 0===r||r.call(e,t),o.forEach((function(e){var r;return null===(r=e.onSubmit)||void 0===r?void 0:r.call(e,t)}))},onReset:function(t){var r;null===(r=e.onReset)||void 0===r||r.call(e,t),o.forEach((function(e){var r;return null===(r=e.onReset)||void 0===r?void 0:r.call(e,t)}))},getSources:function(r){return Promise.all([].concat(Q(o.map((function(e){return e.getSources}))),[e.getSources]).filter(Boolean).map((function(e){return function(e,t){var r=[];return Promise.resolve(e(t)).then((function(e){return Array.isArray(e),Promise.all(e.filter((function(e){return Boolean(e)})).map((function(e){if(e.sourceId,r.includes(e.sourceId))throw new Error("[Autocomplete] The `sourceId` ".concat(JSON.stringify(e.sourceId)," is not unique."));r.push(e.sourceId);var t={getItemInputValue:function(e){return e.state.query},getItemUrl:function(){},onSelect:function(e){(0,e.setIsOpen)(!1)},onActive:a,onResolve:a};Object.keys(t).forEach((function(e){t[e].__default=!0}));var n=$($({},t),e);return Promise.resolve(n)})))}))}(e,r)}))).then((function(e){return L(e)})).then((function(e){return e.map((function(e){return X(X({},e),{},{onSelect:function(r){e.onSelect(r),t.forEach((function(e){var t;return null===(t=e.onSelect)||void 0===t?void 0:t.call(e,r)}))},onActive:function(r){e.onActive(r),t.forEach((function(e){var t;return null===(t=e.onActive)||void 0===t?void 0:t.call(e,r)}))},onResolve:function(r){e.onResolve(r),t.forEach((function(e){var t;return null===(t=e.onResolve)||void 0===t?void 0:t.call(e,r)}))}})}))}))},navigator:X({navigate:function(e){var t=e.itemUrl;n.location.assign(t)},navigateNewTab:function(e){var t=e.itemUrl,r=n.open(t,"_blank","noopener");null==r||r.focus()},navigateNewWindow:function(e){var t=e.itemUrl;n.open(t,"_blank","noopener")}},e.navigator)})}function te(e){return te="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},te(e)}function re(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function ne(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}var Ie,De,Ae,ke=null,xe=(Ie=-1,De=-1,Ae=void 0,function(e){var t=++Ie;return Promise.resolve(e).then((function(e){return 
Ae&&t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}var Me=/((gt|sm)-|galaxy nexus)|samsung[- ]|samsungbrowser/i;function He(e){return He="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},He(e)}var Fe=["props","refresh","store"],Ue=["inputElement","formElement","panelElement"],Be=["inputElement"],Ve=["inputElement","maxLength"],Ke=["sourceIndex"],$e=["sourceIndex"],Je=["item","source","sourceIndex"];function ze(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function We(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function Ge(e){var t=e.props,r=e.refresh,n=e.store,o=Ze(e,Fe),i=function(e,t){return void 0!==t?"".concat(e,"-").concat(t):e};return{getEnvironmentProps:function(e){var r=e.inputElement,o=e.formElement,i=e.panelElement;function a(e){!n.getState().isOpen&&n.pendingRequests.isEmpty()||e.target===r||!1===[o,i].some((function(t){return r=t,n=e.target,r===n||r.contains(n);var r,n}))&&(n.dispatch("blur",null),t.debug||n.pendingRequests.cancelAll())}return We({onTouchStart:a,onMouseDown:a,onTouchMove:function(e){!1!==n.getState().isOpen&&r===t.environment.document.activeElement&&e.target!==r&&r.blur()}},Ze(e,Ue))},getRootProps:function(e){return We({role:"combobox","aria-expanded":n.getState().isOpen,"aria-haspopup":"listbox","aria-owns":n.getState().isOpen?"".concat(t.id,"-list"):void 0,"aria-labelledby":"".concat(t.id,"-label")},e)},getFormProps:function(e){e.inputElement;return We({action:"",noValidate:!0,role:"search",onSubmit:function(i){var a;i.preventDefault(),t.onSubmit(We({event:i,refresh:r,state:n.getState()},o)),n.dispatch("submit",null),null===(a=e.inputElement)||void 0===a||a.blur()},onReset:function(i){var a;i.preventDefault(),t.onReset(We({event:i,refresh:r,state:n.getState()},o)),n.dispatch("reset",null),null===(a=e.inputElement)||void 0===a||a.focus()}},Ze(e,Be))},getLabelProps:function(e){var r=e||{},n=r.sourceIndex,o=Ze(r,Ke);return We({htmlFor:"".concat(i(t.id,n),"-input"),id:"".concat(i(t.id,n),"-label")},o)},getInputProps:function(e){var i;function c(e){(t.openOnFocus||Boolean(n.getState().query))&&Ce(We({event:e,props:t,query:n.getState().completion||n.getState().query,refresh:r,store:n},o)),n.dispatch("focus",null)}var l=e||{},u=(l.inputElement,l.maxLength),s=void 0===u?512:u,f=Ze(l,Ve),m=ge(n.getState()),p=function(e){return Boolean(e&&e.match(Me))}((null===(i=t.environment.navigator)||void 0===i?void 0:i.userAgent)||""),v=null!=m&&m.itemUrl&&!p?"go":"search";return We({"aria-autocomplete":"both","aria-activedescendant":n.getState().isOpen&&null!==n.getState().activeItemId?"".concat(t.id,"-item-").concat(n.getState().activeItemId):void 0,"aria-controls":n.getState().isOpen?"".concat(t.id,"-list"):void 
0,"aria-labelledby":"".concat(t.id,"-label"),value:n.getState().completion||n.getState().query,id:"".concat(t.id,"-input"),autoComplete:"off",autoCorrect:"off",autoCapitalize:"off",enterKeyHint:v,spellCheck:"false",autoFocus:t.autoFocus,placeholder:t.placeholder,maxLength:s,type:"search",onChange:function(e){Ce(We({event:e,props:t,query:e.currentTarget.value.slice(0,s),refresh:r,store:n},o))},onKeyDown:function(e){!function(e){var t=e.event,r=e.props,n=e.refresh,o=e.store,i=Le(e,Ne);if("ArrowUp"===t.key||"ArrowDown"===t.key){var a=function(){var e=r.environment.document.getElementById("".concat(r.id,"-item-").concat(o.getState().activeItemId));e&&(e.scrollIntoViewIfNeeded?e.scrollIntoViewIfNeeded(!1):e.scrollIntoView(!1))},c=function(){var e=ge(o.getState());if(null!==o.getState().activeItemId&&e){var r=e.item,a=e.itemInputValue,c=e.itemUrl,l=e.source;l.onActive(qe({event:t,item:r,itemInputValue:a,itemUrl:c,refresh:n,source:l,state:o.getState()},i))}};t.preventDefault(),!1===o.getState().isOpen&&(r.openOnFocus||Boolean(o.getState().query))?Ce(qe({event:t,props:r,query:o.getState().query,refresh:n,store:o},i)).then((function(){o.dispatch(t.key,{nextActiveItemId:r.defaultActiveItemId}),c(),setTimeout(a,0)})):(o.dispatch(t.key,{}),c(),a())}else if("Escape"===t.key)t.preventDefault(),o.dispatch(t.key,null),o.pendingRequests.cancelAll();else if("Tab"===t.key)o.dispatch("blur",null),o.pendingRequests.cancelAll();else if("Enter"===t.key){if(null===o.getState().activeItemId||o.getState().collections.every((function(e){return 0===e.items.length})))return void(r.debug||o.pendingRequests.cancelAll());t.preventDefault();var l=ge(o.getState()),u=l.item,s=l.itemInputValue,f=l.itemUrl,m=l.source;if(t.metaKey||t.ctrlKey)void 0!==f&&(m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i)),r.navigator.navigateNewTab({itemUrl:f,item:u,state:o.getState()}));else if(t.shiftKey)void 0!==f&&(m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i)),r.navigator.navigateNewWindow({itemUrl:f,item:u,state:o.getState()}));else if(t.altKey);else{if(void 0!==f)return m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i)),void r.navigator.navigate({itemUrl:f,item:u,state:o.getState()});Ce(qe({event:t,nextState:{isOpen:!1},props:r,query:s,refresh:n,store:o},i)).then((function(){m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i))}))}}}(We({event:e,props:t,refresh:r,store:n},o))},onFocus:c,onBlur:a,onClick:function(r){e.inputElement!==t.environment.document.activeElement||n.getState().isOpen||c(r)}},f)},getPanelProps:function(e){return We({onMouseDown:function(e){e.preventDefault()},onMouseLeave:function(){n.dispatch("mouseleave",null)}},e)},getListProps:function(e){var r=e||{},n=r.sourceIndex,o=Ze(r,$e);return We({role:"listbox","aria-labelledby":"".concat(i(t.id,n),"-label"),id:"".concat(i(t.id,n),"-list")},o)},getItemProps:function(e){var a=e.item,c=e.source,l=e.sourceIndex,u=Ze(e,Je);return We({id:"".concat(i(t.id,l),"-item-").concat(a.__autocomplete_id),role:"option","aria-selected":n.getState().activeItemId===a.__autocomplete_id,onMouseMove:function(e){if(a.__autocomplete_id!==n.getState().activeItemId){n.dispatch("mousemove",a.__autocomplete_id);var t=ge(n.getState());if(null!==n.getState().activeItemId&&t){var 
i=t.item,c=t.itemInputValue,l=t.itemUrl,u=t.source;u.onActive(We({event:e,item:i,itemInputValue:c,itemUrl:l,refresh:r,source:u,state:n.getState()},o))}}},onMouseDown:function(e){e.preventDefault()},onClick:function(e){var i=c.getItemInputValue({item:a,state:n.getState()}),l=c.getItemUrl({item:a,state:n.getState()});(l?Promise.resolve():Ce(We({event:e,nextState:{isOpen:!1},props:t,query:i,refresh:r,store:n},o))).then((function(){c.onSelect(We({event:e,item:a,itemInputValue:i,itemUrl:l,refresh:r,source:c,state:n.getState()},o))}))}},u)}}}var Xe=[{segment:"autocomplete-core",version:"1.9.3"}];function Ye(e){return Ye="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},Ye(e)}function et(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function tt(e){for(var t=1;t=r?null===n?null:0:o}function at(e){return at="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},at(e)}function ct(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function lt(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function kt(e){var t=e.translations,r=void 0===t?{}:t,n=At(e,Pt),o=r.noResultsText,i=void 0===o?"No results for":o,a=r.suggestedQueryText,c=void 0===a?"Try searching for":a,l=r.reportMissingResultsText,u=void 0===l?"Believe this query should return results?":l,s=r.reportMissingResultsLinkText,f=void 0===s?"Let us know.":s,m=n.state.context.searchSuggestions;return yt.createElement("div",{className:"DocSearch-NoResults"},yt.createElement("div",{className:"DocSearch-Screen-Icon"},yt.createElement(Et,null)),yt.createElement("p",{className:"DocSearch-Title"},i,' "',yt.createElement("strong",null,n.state.query),'"'),m&&m.length>0&&yt.createElement("div",{className:"DocSearch-NoResults-Prefill-List"},yt.createElement("p",{className:"DocSearch-Help"},c,":"),yt.createElement("ul",null,m.slice(0,3).reduce((function(e,t){return[].concat(It(e),[yt.createElement("li",{key:t},yt.createElement("button",{className:"DocSearch-Prefill",key:t,type:"button",onClick:function(){n.setQuery(t.toLowerCase()+" "),n.refresh(),n.inputRef.current.focus()}},t))])}),[]))),n.getMissingResultsUrl&&yt.createElement("p",{className:"DocSearch-Help"},"".concat(u," "),yt.createElement("a",{href:n.getMissingResultsUrl({query:n.state.query}),target:"_blank",rel:"noopener noreferrer"},f)))}var xt=function(){return yt.createElement("svg",{width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("path",{d:"M17 6v12c0 .52-.2 1-1 1H4c-.7 0-1-.33-1-1V2c0-.55.42-1 1-1h8l5 5zM14 8h-3.13c-.51 0-.87-.34-.87-.87V4",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinejoin:"round"}))};function Ct(e){switch(e.type){case"lvl1":return yt.createElement(xt,null);case"content":return yt.createElement(Nt,null);default:return 
yt.createElement(_t,null)}}function _t(){return yt.createElement("svg",{width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("path",{d:"M13 13h4-4V8H7v5h6v4-4H7V8H3h4V3v5h6V3v5h4-4v5zm-6 0v4-4H3h4z",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinecap:"round",strokeLinejoin:"round"}))}function Nt(){return yt.createElement("svg",{width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("path",{d:"M17 5H3h14zm0 5H3h14zm0 5H3h14z",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinejoin:"round"}))}function Tt(){return yt.createElement("svg",{className:"DocSearch-Hit-Select-Icon",width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("g",{stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinecap:"round",strokeLinejoin:"round"},yt.createElement("path",{d:"M18 3v4c0 2-2 4-4 4H2"}),yt.createElement("path",{d:"M8 17l-6-6 6-6"})))}var qt=["hit","attribute","tagName"];function Rt(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function Lt(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function Ft(e,t){return t.split(".").reduce((function(e,t){return null!=e&&e[t]?e[t]:null}),e)}function Ut(e){var t=e.hit,r=e.attribute,n=e.tagName,o=void 0===n?"span":n,i=Ht(e,qt);return(0,yt.createElement)(o,Lt(Lt({},i),{},{dangerouslySetInnerHTML:{__html:Ft(t,"_snippetResult.".concat(r,".value"))||Ft(t,r)}}))}function Bt(e,t){return function(e){if(Array.isArray(e))return e}(e)||function(e,t){var r=null==e?null:"undefined"!=typeof Symbol&&e[Symbol.iterator]||e["@@iterator"];if(null==r)return;var n,o,i=[],a=!0,c=!1;try{for(r=r.call(e);!(a=(n=r.next()).done)&&(i.push(n.value),!t||i.length!==t);a=!0);}catch(l){c=!0,o=l}finally{try{a||null==r.return||r.return()}finally{if(c)throw o}}return i}(e,t)||function(e,t){if(!e)return;if("string"==typeof e)return Vt(e,t);var r=Object.prototype.toString.call(e).slice(8,-1);"Object"===r&&e.constructor&&(r=e.constructor.name);if("Map"===r||"Set"===r)return Array.from(e);if("Arguments"===r||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r))return Vt(e,t)}(e,t)||function(){throw new TypeError("Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")}()}function Vt(e,t){(null==t||t>e.length)&&(t=e.length);for(var r=0,n=new Array(t);r|<\/mark>)/g,Wt=RegExp(zt.source);function Qt(e){var t,r,n,o,i,a=e;if(!a.__docsearch_parent&&!e._highlightResult)return e.hierarchy.lvl0;var c=((a.__docsearch_parent?null===(t=a.__docsearch_parent)||void 0===t||null===(r=t._highlightResult)||void 0===r||null===(n=r.hierarchy)||void 0===n?void 0:n.lvl0:null===(o=e._highlightResult)||void 0===o||null===(i=o.hierarchy)||void 0===i?void 0:i.lvl0)||{}).value;return c&&Wt.test(c)?c.replace(zt,""):c}function Zt(){return Zt=Object.assign||function(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function or(e){var t=e.translations,r=void 0===t?{}:t,n=nr(e,tr),o=r.recentSearchesTitle,i=void 0===o?"Recent":o,a=r.noRecentSearchesText,c=void 0===a?"No recent searches":a,l=r.saveRecentSearchButtonTitle,u=void 0===l?"Save this 
search":l,s=r.removeRecentSearchButtonTitle,f=void 0===s?"Remove this search from history":s,m=r.favoriteSearchesTitle,p=void 0===m?"Favorite":m,v=r.removeFavoriteSearchButtonTitle,d=void 0===v?"Remove this search from favorites":v;return"idle"===n.state.status&&!1===n.hasCollections?n.disableUserPersonalization?null:yt.createElement("div",{className:"DocSearch-StartScreen"},yt.createElement("p",{className:"DocSearch-Help"},c)):!1===n.hasCollections?null:yt.createElement("div",{className:"DocSearch-Dropdown-Container"},yt.createElement($t,rr({},n,{title:i,collection:n.state.collections[0],renderIcon:function(){return yt.createElement("div",{className:"DocSearch-Hit-icon"},yt.createElement(Xt,null))},renderAction:function(e){var t=e.item,r=e.runFavoriteTransition,o=e.runDeleteTransition;return yt.createElement(yt.Fragment,null,yt.createElement("div",{className:"DocSearch-Hit-action"},yt.createElement("button",{className:"DocSearch-Hit-action-button",title:u,type:"submit",onClick:function(e){e.preventDefault(),e.stopPropagation(),r((function(){n.favoriteSearches.add(t),n.recentSearches.remove(t),n.refresh()}))}},yt.createElement(Yt,null))),yt.createElement("div",{className:"DocSearch-Hit-action"},yt.createElement("button",{className:"DocSearch-Hit-action-button",title:f,type:"submit",onClick:function(e){e.preventDefault(),e.stopPropagation(),o((function(){n.recentSearches.remove(t),n.refresh()}))}},yt.createElement(er,null))))}})),yt.createElement($t,rr({},n,{title:p,collection:n.state.collections[1],renderIcon:function(){return yt.createElement("div",{className:"DocSearch-Hit-icon"},yt.createElement(Yt,null))},renderAction:function(e){var t=e.item,r=e.runDeleteTransition;return yt.createElement("div",{className:"DocSearch-Hit-action"},yt.createElement("button",{className:"DocSearch-Hit-action-button",title:d,type:"submit",onClick:function(e){e.preventDefault(),e.stopPropagation(),r((function(){n.favoriteSearches.remove(t),n.refresh()}))}},yt.createElement(er,null)))}})))}var ir=["translations"];function ar(){return ar=Object.assign||function(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}var lr=yt.memo((function(e){var t=e.translations,r=void 0===t?{}:t,n=cr(e,ir);if("error"===n.state.status)return yt.createElement(wt,{translations:null==r?void 0:r.errorScreen});var o=n.state.collections.some((function(e){return e.items.length>0}));return n.state.query?!1===o?yt.createElement(kt,ar({},n,{translations:null==r?void 0:r.noResultsScreen})):yt.createElement(Gt,n):yt.createElement(or,ar({},n,{hasCollections:o,translations:null==r?void 0:r.startScreen}))}),(function(e,t){return"loading"===t.state.status||"stalled"===t.state.status}));function ur(){return yt.createElement("svg",{viewBox:"0 0 38 38",stroke:"currentColor",strokeOpacity:".5"},yt.createElement("g",{fill:"none",fillRule:"evenodd"},yt.createElement("g",{transform:"translate(1 1)",strokeWidth:"2"},yt.createElement("circle",{strokeOpacity:".3",cx:"18",cy:"18",r:"18"}),yt.createElement("path",{d:"M36 18c0-9.94-8.06-18-18-18"},yt.createElement("animateTransform",{attributeName:"transform",type:"rotate",from:"0 18 18",to:"360 18 18",dur:"1s",repeatCount:"indefinite"})))))}var sr=r(20830),fr=["translations"];function mr(){return mr=Object.assign||function(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var 
i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function vr(e){var t=e.translations,r=void 0===t?{}:t,n=pr(e,fr),o=r.resetButtonTitle,i=void 0===o?"Clear the query":o,a=r.resetButtonAriaLabel,c=void 0===a?"Clear the query":a,l=r.cancelButtonText,u=void 0===l?"Cancel":l,s=r.cancelButtonAriaLabel,f=void 0===s?"Cancel":s,m=n.getFormProps({inputElement:n.inputRef.current}).onReset;return yt.useEffect((function(){n.autoFocus&&n.inputRef.current&&n.inputRef.current.focus()}),[n.autoFocus,n.inputRef]),yt.useEffect((function(){n.isFromSelection&&n.inputRef.current&&n.inputRef.current.select()}),[n.isFromSelection,n.inputRef]),yt.createElement(yt.Fragment,null,yt.createElement("form",{className:"DocSearch-Form",onSubmit:function(e){e.preventDefault()},onReset:m},yt.createElement("label",mr({className:"DocSearch-MagnifierLabel"},n.getLabelProps()),yt.createElement(sr.W,null)),yt.createElement("div",{className:"DocSearch-LoadingIndicator"},yt.createElement(ur,null)),yt.createElement("input",mr({className:"DocSearch-Input",ref:n.inputRef},n.getInputProps({inputElement:n.inputRef.current,autoFocus:n.autoFocus,maxLength:ht}))),yt.createElement("button",{type:"reset",title:i,className:"DocSearch-Reset","aria-label":c,hidden:!n.state.query},yt.createElement(er,null))),yt.createElement("button",{className:"DocSearch-Cancel",type:"reset","aria-label":f,onClick:n.onClose},u))}var dr=["_highlightResult","_snippetResult"];function yr(e,t){if(null==e)return{};var r,n,o=function(e,t){if(null==e)return{};var r,n,o={},i=Object.keys(e);for(n=0;n=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function hr(e){return!1===function(){var e="__TEST_KEY__";try{return localStorage.setItem(e,""),localStorage.removeItem(e),!0}catch(t){return!1}}()?{setItem:function(){},getItem:function(){return[]}}:{setItem:function(t){return window.localStorage.setItem(e,JSON.stringify(t))},getItem:function(){var t=window.localStorage.getItem(e);return t?JSON.parse(t):[]}}}function br(e){var t=e.key,r=e.limit,n=void 0===r?5:r,o=hr(t),i=o.getItem().slice(0,n);return{add:function(e){var t=e,r=(t._highlightResult,t._snippetResult,yr(t,dr)),a=i.findIndex((function(e){return e.objectID===r.objectID}));a>-1&&i.splice(a,1),i.unshift(r),i=i.slice(0,n),o.setItem(i)},remove:function(e){i=i.filter((function(t){return t.objectID!==e.objectID})),o.setItem(i)},getAll:function(){return i}}}function gr(e){const t=`algoliasearch-client-js-${e.key}`;let r;const n=()=>(void 0===r&&(r=e.localStorage||window.localStorage),r),o=()=>JSON.parse(n().getItem(t)||"{}"),i=e=>{n().setItem(t,JSON.stringify(e))};return{get:(t,r,n={miss:()=>Promise.resolve()})=>Promise.resolve().then((()=>{(()=>{const t=e.timeToLive?1e3*e.timeToLive:null,r=o(),n=Object.fromEntries(Object.entries(r).filter((([,e])=>void 0!==e.timestamp)));if(i(n),!t)return;const a=Object.fromEntries(Object.entries(n).filter((([,e])=>{const r=(new Date).getTime();return!(e.timestamp+tPromise.all([e?e.value:r(),void 0!==e]))).then((([e,t])=>Promise.all([e,t||n.miss(e)]))).then((([e])=>e)),set:(e,r)=>Promise.resolve().then((()=>{const i=o();return i[JSON.stringify(e)]={timestamp:(new Date).getTime(),value:r},n().setItem(t,JSON.stringify(i)),r})),delete:e=>Promise.resolve().then((()=>{const r=o();delete 
r[JSON.stringify(e)],n().setItem(t,JSON.stringify(r))})),clear:()=>Promise.resolve().then((()=>{n().removeItem(t)}))}}function Or(e){const t=[...e.caches],r=t.shift();return void 0===r?{get:(e,t,r={miss:()=>Promise.resolve()})=>t().then((e=>Promise.all([e,r.miss(e)]))).then((([e])=>e)),set:(e,t)=>Promise.resolve(t),delete:e=>Promise.resolve(),clear:()=>Promise.resolve()}:{get:(e,n,o={miss:()=>Promise.resolve()})=>r.get(e,n,o).catch((()=>Or({caches:t}).get(e,n,o))),set:(e,n)=>r.set(e,n).catch((()=>Or({caches:t}).set(e,n))),delete:e=>r.delete(e).catch((()=>Or({caches:t}).delete(e))),clear:()=>r.clear().catch((()=>Or({caches:t}).clear()))}}function Sr(e={serializable:!0}){let t={};return{get(r,n,o={miss:()=>Promise.resolve()}){const i=JSON.stringify(r);if(i in t)return Promise.resolve(e.serializable?JSON.parse(t[i]):t[i]);const a=n(),c=o&&o.miss||(()=>Promise.resolve());return a.then((e=>c(e))).then((()=>a))},set:(r,n)=>(t[JSON.stringify(r)]=e.serializable?JSON.stringify(n):n,Promise.resolve(n)),delete:e=>(delete t[JSON.stringify(e)],Promise.resolve()),clear:()=>(t={},Promise.resolve())}}function jr(e){let t=e.length-1;for(;t>0;t--){const r=Math.floor(Math.random()*(t+1)),n=e[t];e[t]=e[r],e[r]=n}return e}function wr(e,t){return t?(Object.keys(t).forEach((r=>{e[r]=t[r](e)})),e):e}function Er(e,...t){let r=0;return e.replace(/%s/g,(()=>encodeURIComponent(t[r++])))}const Pr="4.19.0",Ir={WithinQueryParameters:0,WithinHeaders:1};function Dr(e,t){const r=e||{},n=r.data||{};return Object.keys(r).forEach((e=>{-1===["timeout","headers","queryParameters","data","cacheable"].indexOf(e)&&(n[e]=r[e])})),{data:Object.entries(n).length>0?n:void 0,timeout:r.timeout||t,headers:r.headers||{},queryParameters:r.queryParameters||{},cacheable:r.cacheable}}const Ar={Read:1,Write:2,Any:3},kr={Up:1,Down:2,Timeouted:3},xr=12e4;function Cr(e,t=kr.Up){return{...e,status:t,lastUpdate:Date.now()}}function _r(e){return"string"==typeof e?{protocol:"https",url:e,accept:Ar.Any}:{protocol:e.protocol||"https",url:e.url,accept:e.accept||Ar.Any}}const Nr={Delete:"DELETE",Get:"GET",Post:"POST",Put:"PUT"};function Tr(e,t){return Promise.all(t.map((t=>e.get(t,(()=>Promise.resolve(Cr(t))))))).then((e=>{const r=e.filter((e=>function(e){return e.status===kr.Up||Date.now()-e.lastUpdate>xr}(e))),n=e.filter((e=>function(e){return e.status===kr.Timeouted&&Date.now()-e.lastUpdate<=xr}(e))),o=[...r,...n];return{getTimeout:(e,t)=>(0===n.length&&0===e?1:n.length+3+e)*t,statelessHosts:o.length>0?o.map((e=>_r(e))):t}}))}const qr=(e,t)=>(e=>{const t=e.status;return e.isTimedOut||(({isTimedOut:e,status:t})=>!e&&0==~~t)(e)||2!=~~(t/100)&&4!=~~(t/100)})(e)?t.onRetry(e):(({status:e})=>2==~~(e/100))(e)?t.onSuccess(e):t.onFail(e);function Rr(e,t,r,n){const o=[],i=function(e,t){if(e.method===Nr.Get||void 0===e.data&&void 0===t.data)return;const r=Array.isArray(e.data)?e.data:{...e.data,...t.data};return JSON.stringify(r)}(r,n),a=function(e,t){const r={...e.headers,...t.headers},n={};return Object.keys(r).forEach((e=>{const t=r[e];n[e.toLowerCase()]=t})),n}(e,n),c=r.method,l=r.method!==Nr.Get?{}:{...r.data,...n.data},u={"x-algolia-agent":e.userAgent.value,...e.queryParameters,...l,...n.queryParameters};let s=0;const f=(t,l)=>{const m=t.pop();if(void 0===m)throw{name:"RetryError",message:"Unreachable hosts - your application id may be incorrect. 
If the error persists, contact support@algolia.com.",transporterStackTrace:Fr(o)};const p={data:i,headers:a,method:c,url:Mr(m,r.path,u),connectTimeout:l(s,e.timeouts.connect),responseTimeout:l(s,n.timeout)},v=e=>{const r={request:p,response:e,host:m,triesLeft:t.length};return o.push(r),r},d={onSuccess:e=>function(e){try{return JSON.parse(e.content)}catch(t){throw function(e,t){return{name:"DeserializationError",message:e,response:t}}(t.message,e)}}(e),onRetry(r){const n=v(r);return r.isTimedOut&&s++,Promise.all([e.logger.info("Retryable failure",Ur(n)),e.hostsCache.set(m,Cr(m,r.isTimedOut?kr.Timeouted:kr.Down))]).then((()=>f(t,l)))},onFail(e){throw v(e),function({content:e,status:t},r){let n=e;try{n=JSON.parse(e).message}catch(o){}return function(e,t,r){return{name:"ApiError",message:e,status:t,transporterStackTrace:r}}(n,t,r)}(e,Fr(o))}};return e.requester.send(p).then((e=>qr(e,d)))};return Tr(e.hostsCache,t).then((e=>f([...e.statelessHosts].reverse(),e.getTimeout)))}function Lr(e){const t={value:`Algolia for JavaScript (${e})`,add(e){const r=`; ${e.segment}${void 0!==e.version?` (${e.version})`:""}`;return-1===t.value.indexOf(r)&&(t.value=`${t.value}${r}`),t}};return t}function Mr(e,t,r){const n=Hr(r);let o=`${e.protocol}://${e.url}/${"/"===t.charAt(0)?t.substr(1):t}`;return n.length&&(o+=`?${n}`),o}function Hr(e){return Object.keys(e).map((t=>{return Er("%s=%s",t,(r=e[t],"[object Object]"===Object.prototype.toString.call(r)||"[object Array]"===Object.prototype.toString.call(r)?JSON.stringify(e[t]):e[t]));var r})).join("&")}function Fr(e){return e.map((e=>Ur(e)))}function Ur(e){const t=e.request.headers["x-algolia-api-key"]?{"x-algolia-api-key":"*****"}:{};return{...e,request:{...e.request,headers:{...e.request.headers,...t}}}}const Br=e=>{const t=e.appId,r=function(e,t,r){const n={"x-algolia-api-key":r,"x-algolia-application-id":t};return{headers:()=>e===Ir.WithinHeaders?n:{},queryParameters:()=>e===Ir.WithinQueryParameters?n:{}}}(void 0!==e.authMode?e.authMode:Ir.WithinHeaders,t,e.apiKey),n=function(e){const{hostsCache:t,logger:r,requester:n,requestsCache:o,responsesCache:i,timeouts:a,userAgent:c,hosts:l,queryParameters:u,headers:s}=e,f={hostsCache:t,logger:r,requester:n,requestsCache:o,responsesCache:i,timeouts:a,userAgent:c,headers:s,queryParameters:u,hosts:l.map((e=>_r(e))),read(e,t){const r=Dr(t,f.timeouts.read),n=()=>Rr(f,f.hosts.filter((e=>0!=(e.accept&Ar.Read))),e,r);if(!0!==(void 0!==r.cacheable?r.cacheable:e.cacheable))return n();const o={request:e,mappedRequestOptions:r,transporter:{queryParameters:f.queryParameters,headers:f.headers}};return f.responsesCache.get(o,(()=>f.requestsCache.get(o,(()=>f.requestsCache.set(o,n()).then((e=>Promise.all([f.requestsCache.delete(o),e])),(e=>Promise.all([f.requestsCache.delete(o),Promise.reject(e)]))).then((([e,t])=>t))))),{miss:e=>f.responsesCache.set(o,e)})},write:(e,t)=>Rr(f,f.hosts.filter((e=>0!=(e.accept&Ar.Write))),e,Dr(t,f.timeouts.write))};return f}({hosts:[{url:`${t}-dsn.algolia.net`,accept:Ar.Read},{url:`${t}.algolia.net`,accept:Ar.Write}].concat(jr([{url:`${t}-1.algolianet.com`},{url:`${t}-2.algolianet.com`},{url:`${t}-3.algolianet.com`}])),...e,headers:{...r.headers(),"content-type":"application/x-www-form-urlencoded",...e.headers},queryParameters:{...r.queryParameters(),...e.queryParameters}}),o={transporter:n,appId:t,addAlgoliaAgent(e,t){n.userAgent.add({segment:e,version:t})},clearCache:()=>Promise.all([n.requestsCache.clear(),n.responsesCache.clear()]).then((()=>{}))};return 
wr(o,e.methods)},Vr=e=>(t,r)=>t.method===Nr.Get?e.transporter.read(t,r):e.transporter.write(t,r),Kr=e=>(t,r={})=>wr({transporter:e.transporter,appId:e.appId,indexName:t},r.methods),$r=e=>(t,r)=>{const n=t.map((e=>({...e,params:Hr(e.params||{})})));return e.transporter.read({method:Nr.Post,path:"1/indexes/*/queries",data:{requests:n},cacheable:!0},r)},Jr=e=>(t,r)=>Promise.all(t.map((t=>{const{facetName:n,facetQuery:o,...i}=t.params;return Kr(e)(t.indexName,{methods:{searchForFacetValues:Qr}}).searchForFacetValues(n,o,{...r,...i})}))),zr=e=>(t,r,n)=>e.transporter.read({method:Nr.Post,path:Er("1/answers/%s/prediction",e.indexName),data:{query:t,queryLanguages:r},cacheable:!0},n),Wr=e=>(t,r)=>e.transporter.read({method:Nr.Post,path:Er("1/indexes/%s/query",e.indexName),data:{query:t},cacheable:!0},r),Qr=e=>(t,r,n)=>e.transporter.read({method:Nr.Post,path:Er("1/indexes/%s/facets/%s/query",e.indexName,t),data:{facetQuery:r},cacheable:!0},n),Zr={Debug:1,Info:2,Error:3};function Gr(e,t,r){const n={appId:e,apiKey:t,timeouts:{connect:1,read:2,write:30},requester:{send:e=>new Promise((t=>{const r=new XMLHttpRequest;r.open(e.method,e.url,!0),Object.keys(e.headers).forEach((t=>r.setRequestHeader(t,e.headers[t])));const n=(e,n)=>setTimeout((()=>{r.abort(),t({status:0,content:n,isTimedOut:!0})}),1e3*e),o=n(e.connectTimeout,"Connection timeout");let i;r.onreadystatechange=()=>{r.readyState>r.OPENED&&void 0===i&&(clearTimeout(o),i=n(e.responseTimeout,"Socket timeout"))},r.onerror=()=>{0===r.status&&(clearTimeout(o),clearTimeout(i),t({content:r.responseText||"Network request failed",status:r.status,isTimedOut:!1}))},r.onload=()=>{clearTimeout(o),clearTimeout(i),t({content:r.responseText,status:r.status,isTimedOut:!1})},r.send(e.data)}))},logger:(o=Zr.Error,{debug:(e,t)=>(Zr.Debug>=o&&console.debug(e,t),Promise.resolve()),info:(e,t)=>(Zr.Info>=o&&console.info(e,t),Promise.resolve()),error:(e,t)=>(console.error(e,t),Promise.resolve())}),responsesCache:Sr(),requestsCache:Sr({serializable:!1}),hostsCache:Or({caches:[gr({key:`${Pr}-${e}`}),Sr()]}),userAgent:Lr(Pr).add({segment:"Browser",version:"lite"}),authMode:Ir.WithinQueryParameters};var o;return Br({...n,...r,methods:{search:$r,searchForFacetValues:Jr,multipleQueries:$r,multipleSearchForFacetValues:Jr,customRequest:Vr,initIndex:e=>t=>Kr(e)(t,{methods:{search:Wr,searchForFacetValues:Qr,findAnswers:zr}})}})}Gr.version=Pr;const Xr=Gr;var Yr="3.5.1";function en(){}function tn(e){return e}function rn(e){return 1===e.button||e.altKey||e.ctrlKey||e.metaKey||e.shiftKey}function nn(e,t,r){return e.reduce((function(e,n){var o=t(n);return e.hasOwnProperty(o)||(e[o]=[]),e[o].length<(r||5)&&e[o].push(n),e}),{})}var on=["footer","searchBox"];function an(){return an=Object.assign||function(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function pn(e){var t=e.appId,r=e.apiKey,n=e.indexName,o=e.placeholder,i=void 0===o?"Search docs":o,a=e.searchParameters,c=e.maxResultsPerGroup,l=e.onClose,u=void 0===l?en:l,s=e.transformItems,f=void 0===s?tn:s,m=e.hitComponent,p=void 0===m?St:m,v=e.resultsFooterComponent,d=void 0===v?function(){return null}:v,y=e.navigator,h=e.initialScrollY,b=void 0===h?0:h,g=e.transformSearchClient,O=void 0===g?tn:g,S=e.disableUserPersonalization,j=void 0!==S&&S,w=e.initialQuery,E=void 0===w?"":w,P=e.translations,I=void 
0===P?{}:P,D=e.getMissingResultsUrl,A=e.insights,k=void 0!==A&&A,x=I.footer,C=I.searchBox,_=mn(I,on),N=sn(yt.useState({query:"",collections:[],completion:null,context:{},isOpen:!1,activeItemId:null,status:"idle"}),2),T=N[0],q=N[1],R=yt.useRef(null),L=yt.useRef(null),M=yt.useRef(null),H=yt.useRef(null),F=yt.useRef(null),U=yt.useRef(10),B=yt.useRef("undefined"!=typeof window?window.getSelection().toString().slice(0,ht):"").current,V=yt.useRef(E||B).current,K=function(e,t,r){return yt.useMemo((function(){var n=Xr(e,t);return n.addAlgoliaAgent("docsearch",Yr),!1===/docsearch.js \(.*\)/.test(n.transporter.userAgent.value)&&n.addAlgoliaAgent("docsearch-react",Yr),r(n)}),[e,t,r])}(t,r,O),$=yt.useRef(br({key:"__DOCSEARCH_FAVORITE_SEARCHES__".concat(n),limit:10})).current,J=yt.useRef(br({key:"__DOCSEARCH_RECENT_SEARCHES__".concat(n),limit:0===$.getAll().length?7:4})).current,z=yt.useCallback((function(e){if(!j){var t="content"===e.type?e.__docsearch_parent:e;t&&-1===$.getAll().findIndex((function(e){return e.objectID===t.objectID}))&&J.add(t)}}),[$,J,j]),W=yt.useCallback((function(e){if(T.context.algoliaInsightsPlugin&&e.__autocomplete_id){var t=e,r={eventName:"Item Selected",index:t.__autocomplete_indexName,items:[t],positions:[e.__autocomplete_id],queryID:t.__autocomplete_queryID};T.context.algoliaInsightsPlugin.insights.clickedObjectIDsAfterSearch(r)}}),[T.context.algoliaInsightsPlugin]),Q=yt.useMemo((function(){return dt({id:"docsearch",defaultActiveItemId:0,placeholder:i,openOnFocus:!0,initialState:{query:V,context:{searchSuggestions:[]}},insights:k,navigator:y,onStateChange:function(e){q(e.state)},getSources:function(e){var o=e.query,i=e.state,l=e.setContext,s=e.setStatus;if(!o)return j?[]:[{sourceId:"recentSearches",onSelect:function(e){var t=e.item,r=e.event;z(t),rn(r)||u()},getItemUrl:function(e){return e.item.url},getItems:function(){return J.getAll()}},{sourceId:"favoriteSearches",onSelect:function(e){var t=e.item,r=e.event;z(t),rn(r)||u()},getItemUrl:function(e){return e.item.url},getItems:function(){return $.getAll()}}];var m=Boolean(k);return K.search([{query:o,indexName:n,params:ln({attributesToRetrieve:["hierarchy.lvl0","hierarchy.lvl1","hierarchy.lvl2","hierarchy.lvl3","hierarchy.lvl4","hierarchy.lvl5","hierarchy.lvl6","content","type","url"],attributesToSnippet:["hierarchy.lvl1:".concat(U.current),"hierarchy.lvl2:".concat(U.current),"hierarchy.lvl3:".concat(U.current),"hierarchy.lvl4:".concat(U.current),"hierarchy.lvl5:".concat(U.current),"hierarchy.lvl6:".concat(U.current),"content:".concat(U.current)],snippetEllipsisText:"\u2026",highlightPreTag:"",highlightPostTag:"",hitsPerPage:20,clickAnalytics:m},a)}]).catch((function(e){throw"RetryError"===e.name&&s("error"),e})).then((function(e){var o=e.results,a=o[0],s=a.hits,p=a.nbHits,v=nn(s,(function(e){return Qt(e)}),c);i.context.searchSuggestions.length0&&(X(),F.current&&F.current.focus())}),[V,X]),yt.useEffect((function(){function e(){if(L.current){var e=.01*window.innerHeight;L.current.style.setProperty("--docsearch-vh","".concat(e,"px"))}}return e(),window.addEventListener("resize",e),function(){window.removeEventListener("resize",e)}}),[]),yt.createElement("div",an({ref:R},G({"aria-expanded":!0}),{className:["DocSearch","DocSearch-Container","stalled"===T.status&&"DocSearch-Container--Stalled","error"===T.status&&"DocSearch-Container--Errored"].filter(Boolean).join(" 
"),role:"button",tabIndex:0,onMouseDown:function(e){e.target===e.currentTarget&&u()}}),yt.createElement("div",{className:"DocSearch-Modal",ref:L},yt.createElement("header",{className:"DocSearch-SearchBar",ref:M},yt.createElement(vr,an({},Q,{state:T,autoFocus:0===V.length,inputRef:F,isFromSelection:Boolean(V)&&V===B,translations:C,onClose:u}))),yt.createElement("div",{className:"DocSearch-Dropdown",ref:H},yt.createElement(lr,an({},Q,{indexName:n,state:T,hitComponent:p,resultsFooterComponent:d,disableUserPersonalization:j,recentSearches:J,favoriteSearches:$,inputRef:F,translations:_,getMissingResultsUrl:D,onItemClick:function(e,t){W(e),z(e),rn(t)||u()}}))),yt.createElement("footer",{className:"DocSearch-Footer"},yt.createElement(Ot,{translations:x}))))}}}]); \ No newline at end of file +"use strict";(self.webpackChunk_cumulus_website=self.webpackChunk_cumulus_website||[]).push([[61426],{61426:(e,t,r)=>{function n(e,t){var r=void 0;return function(){for(var n=arguments.length,o=new Array(n),i=0;ipn});var a=function(){};function c(e){var t=e.item,r=e.items;return{index:t.__autocomplete_indexName,items:[t],positions:[1+r.findIndex((function(e){return e.objectID===t.objectID}))],queryID:t.__autocomplete_queryID,algoliaSource:["autocomplete"]}}function l(e,t){return function(e){if(Array.isArray(e))return e}(e)||function(e,t){var r=null==e?null:"undefined"!=typeof Symbol&&e[Symbol.iterator]||e["@@iterator"];if(null!=r){var n,o,i,a,c=[],l=!0,u=!1;try{if(i=(r=r.call(e)).next,0===t){if(Object(r)!==r)return;l=!1}else for(;!(l=(n=i.call(r)).done)&&(c.push(n.value),c.length!==t);l=!0);}catch(s){u=!0,o=s}finally{try{if(!l&&null!=r.return&&(a=r.return(),Object(a)!==a))return}finally{if(u)throw o}}return c}}(e,t)||function(e,t){if(!e)return;if("string"==typeof e)return u(e,t);var r=Object.prototype.toString.call(e).slice(8,-1);"Object"===r&&e.constructor&&(r=e.constructor.name);if("Map"===r||"Set"===r)return Array.from(e);if("Arguments"===r||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r))return u(e,t)}(e,t)||function(){throw new TypeError("Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")}()}function u(e,t){(null==t||t>e.length)&&(t=e.length);for(var r=0,n=new Array(t);re.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function y(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function h(e){for(var t=1;t=3||2===r&&n>=4||1===r&&n>=10);function i(t,r,n){if(o&&void 0!==n){var i=n[0].__autocomplete_algoliaCredentials,a={"X-Algolia-Application-Id":i.appId,"X-Algolia-API-Key":i.apiKey};e.apply(void 0,[t].concat(p(r),[{headers:a}]))}else e.apply(void 0,[t].concat(p(r)))}return{init:function(t,r){e("init",{appId:t,apiKey:r})},setUserToken:function(t){e("setUserToken",t)},clickedObjectIDsAfterSearch:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("clickedObjectIDsAfterSearch",g(t),t[0].items)},clickedObjectIDs:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("clickedObjectIDs",g(t),t[0].items)},clickedFilters:function(){for(var t=arguments.length,r=new Array(t),n=0;n0&&e.apply(void 
0,["clickedFilters"].concat(r))},convertedObjectIDsAfterSearch:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("convertedObjectIDsAfterSearch",g(t),t[0].items)},convertedObjectIDs:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&i("convertedObjectIDs",g(t),t[0].items)},convertedFilters:function(){for(var t=arguments.length,r=new Array(t),n=0;n0&&e.apply(void 0,["convertedFilters"].concat(r))},viewedObjectIDs:function(){for(var e=arguments.length,t=new Array(e),r=0;r0&&t.reduce((function(e,t){var r=t.items,n=d(t,f);return[].concat(p(e),p(function(e){for(var t=arguments.length>1&&void 0!==arguments[1]?arguments[1]:20,r=[],n=0;n0&&e.apply(void 0,["viewedFilters"].concat(r))}}}function S(e){var t=e.items.reduce((function(e,t){var r;return e[t.__autocomplete_indexName]=(null!==(r=e[t.__autocomplete_indexName])&&void 0!==r?r:[]).concat(t),e}),{});return Object.keys(t).map((function(e){return{index:e,items:t[e],algoliaSource:["autocomplete"]}}))}function j(e){return e.objectID&&e.__autocomplete_indexName&&e.__autocomplete_queryID}function w(e){return w="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},w(e)}function E(e){return function(e){if(Array.isArray(e))return P(e)}(e)||function(e){if("undefined"!=typeof Symbol&&null!=e[Symbol.iterator]||null!=e["@@iterator"])return Array.from(e)}(e)||function(e,t){if(!e)return;if("string"==typeof e)return P(e,t);var r=Object.prototype.toString.call(e).slice(8,-1);"Object"===r&&e.constructor&&(r=e.constructor.name);if("Map"===r||"Set"===r)return Array.from(e);if("Arguments"===r||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r))return P(e,t)}(e)||function(){throw new TypeError("Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")}()}function P(e,t){(null==t||t>e.length)&&(t=e.length);for(var r=0,n=new Array(t);r0&&C({onItemsChange:o,items:r,insights:f,state:t}))}}),0);return{name:"aa.algoliaInsightsPlugin",subscribe:function(e){var t=e.setContext,r=e.onSelect,n=e.onActive;s("addAlgoliaAgent","insights-plugin"),t({algoliaInsightsPlugin:{__algoliaSearchParameters:{clickAnalytics:!0},insights:f}}),r((function(e){var t=e.item,r=e.state,n=e.event;j(t)&&l({state:r,event:n,insights:f,item:t,insightsEvents:[D({eventName:"Item Selected"},c({item:t,items:m.current}))]})})),n((function(e){var t=e.item,r=e.state,n=e.event;j(t)&&u({state:r,event:n,insights:f,item:t,insightsEvents:[D({eventName:"Item Active"},c({item:t,items:m.current}))]})}))},onStateChange:function(e){var t=e.state;p({state:t})},__autocomplete_pluginOptions:e}}function N(e){return N="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},N(e)}function T(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function q(e,t,r){return(t=function(e){var t=function(e,t){if("object"!==N(e)||null===e)return e;var r=e[Symbol.toPrimitive];if(void 0!==r){var n=r.call(e,t||"default");if("object"!==N(n))return n;throw new TypeError("@@toPrimitive must return a primitive 
value.")}return("string"===t?String:Number)(e)}(e,"string");return"symbol"===N(t)?t:String(t)}(t))in e?Object.defineProperty(e,t,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[t]=r,e}function R(e,t,r){var n,o=t.initialState;return{getState:function(){return o},dispatch:function(n,i){var a=function(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r0},reshape:function(e){return e.sources}},e),{},{id:null!==(r=e.id)&&void 0!==r?r:"autocomplete-".concat(V++),plugins:o,initialState:X({activeItemId:null,query:"",completion:null,collections:[],isOpen:!1,status:"idle",context:{}},e.initialState),onStateChange:function(t){var r;null===(r=e.onStateChange)||void 0===r||r.call(e,t),o.forEach((function(e){var r;return null===(r=e.onStateChange)||void 0===r?void 0:r.call(e,t)}))},onSubmit:function(t){var r;null===(r=e.onSubmit)||void 0===r||r.call(e,t),o.forEach((function(e){var r;return null===(r=e.onSubmit)||void 0===r?void 0:r.call(e,t)}))},onReset:function(t){var r;null===(r=e.onReset)||void 0===r||r.call(e,t),o.forEach((function(e){var r;return null===(r=e.onReset)||void 0===r?void 0:r.call(e,t)}))},getSources:function(r){return Promise.all([].concat(Q(o.map((function(e){return e.getSources}))),[e.getSources]).filter(Boolean).map((function(e){return function(e,t){var r=[];return Promise.resolve(e(t)).then((function(e){return Array.isArray(e),Promise.all(e.filter((function(e){return Boolean(e)})).map((function(e){if(e.sourceId,r.includes(e.sourceId))throw new Error("[Autocomplete] The `sourceId` ".concat(JSON.stringify(e.sourceId)," is not unique."));r.push(e.sourceId);var t={getItemInputValue:function(e){return e.state.query},getItemUrl:function(){},onSelect:function(e){(0,e.setIsOpen)(!1)},onActive:a,onResolve:a};Object.keys(t).forEach((function(e){t[e].__default=!0}));var n=$($({},t),e);return Promise.resolve(n)})))}))}(e,r)}))).then((function(e){return L(e)})).then((function(e){return e.map((function(e){return X(X({},e),{},{onSelect:function(r){e.onSelect(r),t.forEach((function(e){var t;return null===(t=e.onSelect)||void 0===t?void 0:t.call(e,r)}))},onActive:function(r){e.onActive(r),t.forEach((function(e){var t;return null===(t=e.onActive)||void 0===t?void 0:t.call(e,r)}))},onResolve:function(r){e.onResolve(r),t.forEach((function(e){var t;return null===(t=e.onResolve)||void 0===t?void 0:t.call(e,r)}))}})}))}))},navigator:X({navigate:function(e){var t=e.itemUrl;n.location.assign(t)},navigateNewTab:function(e){var t=e.itemUrl,r=n.open(t,"_blank","noopener");null==r||r.focus()},navigateNewWindow:function(e){var t=e.itemUrl;n.open(t,"_blank","noopener")}},e.navigator)})}function te(e){return te="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},te(e)}function re(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function ne(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}var Ie,De,Ae,ke=null,xe=(Ie=-1,De=-1,Ae=void 0,function(e){var t=++Ie;return Promise.resolve(e).then((function(e){return Ae&&t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var 
i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}var Me=/((gt|sm)-|galaxy nexus)|samsung[- ]|samsungbrowser/i;function He(e){return He="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},He(e)}var Fe=["props","refresh","store"],Ue=["inputElement","formElement","panelElement"],Be=["inputElement"],Ve=["inputElement","maxLength"],Ke=["sourceIndex"],$e=["sourceIndex"],Je=["item","source","sourceIndex"];function ze(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function We(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function Ge(e){var t=e.props,r=e.refresh,n=e.store,o=Ze(e,Fe),i=function(e,t){return void 0!==t?"".concat(e,"-").concat(t):e};return{getEnvironmentProps:function(e){var r=e.inputElement,o=e.formElement,i=e.panelElement;function a(e){!n.getState().isOpen&&n.pendingRequests.isEmpty()||e.target===r||!1===[o,i].some((function(t){return r=t,n=e.target,r===n||r.contains(n);var r,n}))&&(n.dispatch("blur",null),t.debug||n.pendingRequests.cancelAll())}return We({onTouchStart:a,onMouseDown:a,onTouchMove:function(e){!1!==n.getState().isOpen&&r===t.environment.document.activeElement&&e.target!==r&&r.blur()}},Ze(e,Ue))},getRootProps:function(e){return We({role:"combobox","aria-expanded":n.getState().isOpen,"aria-haspopup":"listbox","aria-owns":n.getState().isOpen?"".concat(t.id,"-list"):void 0,"aria-labelledby":"".concat(t.id,"-label")},e)},getFormProps:function(e){e.inputElement;return We({action:"",noValidate:!0,role:"search",onSubmit:function(i){var a;i.preventDefault(),t.onSubmit(We({event:i,refresh:r,state:n.getState()},o)),n.dispatch("submit",null),null===(a=e.inputElement)||void 0===a||a.blur()},onReset:function(i){var a;i.preventDefault(),t.onReset(We({event:i,refresh:r,state:n.getState()},o)),n.dispatch("reset",null),null===(a=e.inputElement)||void 0===a||a.focus()}},Ze(e,Be))},getLabelProps:function(e){var r=e||{},n=r.sourceIndex,o=Ze(r,Ke);return We({htmlFor:"".concat(i(t.id,n),"-input"),id:"".concat(i(t.id,n),"-label")},o)},getInputProps:function(e){var i;function c(e){(t.openOnFocus||Boolean(n.getState().query))&&Ce(We({event:e,props:t,query:n.getState().completion||n.getState().query,refresh:r,store:n},o)),n.dispatch("focus",null)}var l=e||{},u=(l.inputElement,l.maxLength),s=void 0===u?512:u,f=Ze(l,Ve),m=ge(n.getState()),p=function(e){return Boolean(e&&e.match(Me))}((null===(i=t.environment.navigator)||void 0===i?void 0:i.userAgent)||""),v=null!=m&&m.itemUrl&&!p?"go":"search";return We({"aria-autocomplete":"both","aria-activedescendant":n.getState().isOpen&&null!==n.getState().activeItemId?"".concat(t.id,"-item-").concat(n.getState().activeItemId):void 0,"aria-controls":n.getState().isOpen?"".concat(t.id,"-list"):void 
0,"aria-labelledby":"".concat(t.id,"-label"),value:n.getState().completion||n.getState().query,id:"".concat(t.id,"-input"),autoComplete:"off",autoCorrect:"off",autoCapitalize:"off",enterKeyHint:v,spellCheck:"false",autoFocus:t.autoFocus,placeholder:t.placeholder,maxLength:s,type:"search",onChange:function(e){Ce(We({event:e,props:t,query:e.currentTarget.value.slice(0,s),refresh:r,store:n},o))},onKeyDown:function(e){!function(e){var t=e.event,r=e.props,n=e.refresh,o=e.store,i=Le(e,Ne);if("ArrowUp"===t.key||"ArrowDown"===t.key){var a=function(){var e=r.environment.document.getElementById("".concat(r.id,"-item-").concat(o.getState().activeItemId));e&&(e.scrollIntoViewIfNeeded?e.scrollIntoViewIfNeeded(!1):e.scrollIntoView(!1))},c=function(){var e=ge(o.getState());if(null!==o.getState().activeItemId&&e){var r=e.item,a=e.itemInputValue,c=e.itemUrl,l=e.source;l.onActive(qe({event:t,item:r,itemInputValue:a,itemUrl:c,refresh:n,source:l,state:o.getState()},i))}};t.preventDefault(),!1===o.getState().isOpen&&(r.openOnFocus||Boolean(o.getState().query))?Ce(qe({event:t,props:r,query:o.getState().query,refresh:n,store:o},i)).then((function(){o.dispatch(t.key,{nextActiveItemId:r.defaultActiveItemId}),c(),setTimeout(a,0)})):(o.dispatch(t.key,{}),c(),a())}else if("Escape"===t.key)t.preventDefault(),o.dispatch(t.key,null),o.pendingRequests.cancelAll();else if("Tab"===t.key)o.dispatch("blur",null),o.pendingRequests.cancelAll();else if("Enter"===t.key){if(null===o.getState().activeItemId||o.getState().collections.every((function(e){return 0===e.items.length})))return void(r.debug||o.pendingRequests.cancelAll());t.preventDefault();var l=ge(o.getState()),u=l.item,s=l.itemInputValue,f=l.itemUrl,m=l.source;if(t.metaKey||t.ctrlKey)void 0!==f&&(m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i)),r.navigator.navigateNewTab({itemUrl:f,item:u,state:o.getState()}));else if(t.shiftKey)void 0!==f&&(m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i)),r.navigator.navigateNewWindow({itemUrl:f,item:u,state:o.getState()}));else if(t.altKey);else{if(void 0!==f)return m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i)),void r.navigator.navigate({itemUrl:f,item:u,state:o.getState()});Ce(qe({event:t,nextState:{isOpen:!1},props:r,query:s,refresh:n,store:o},i)).then((function(){m.onSelect(qe({event:t,item:u,itemInputValue:s,itemUrl:f,refresh:n,source:m,state:o.getState()},i))}))}}}(We({event:e,props:t,refresh:r,store:n},o))},onFocus:c,onBlur:a,onClick:function(r){e.inputElement!==t.environment.document.activeElement||n.getState().isOpen||c(r)}},f)},getPanelProps:function(e){return We({onMouseDown:function(e){e.preventDefault()},onMouseLeave:function(){n.dispatch("mouseleave",null)}},e)},getListProps:function(e){var r=e||{},n=r.sourceIndex,o=Ze(r,$e);return We({role:"listbox","aria-labelledby":"".concat(i(t.id,n),"-label"),id:"".concat(i(t.id,n),"-list")},o)},getItemProps:function(e){var a=e.item,c=e.source,l=e.sourceIndex,u=Ze(e,Je);return We({id:"".concat(i(t.id,l),"-item-").concat(a.__autocomplete_id),role:"option","aria-selected":n.getState().activeItemId===a.__autocomplete_id,onMouseMove:function(e){if(a.__autocomplete_id!==n.getState().activeItemId){n.dispatch("mousemove",a.__autocomplete_id);var t=ge(n.getState());if(null!==n.getState().activeItemId&&t){var 
i=t.item,c=t.itemInputValue,l=t.itemUrl,u=t.source;u.onActive(We({event:e,item:i,itemInputValue:c,itemUrl:l,refresh:r,source:u,state:n.getState()},o))}}},onMouseDown:function(e){e.preventDefault()},onClick:function(e){var i=c.getItemInputValue({item:a,state:n.getState()}),l=c.getItemUrl({item:a,state:n.getState()});(l?Promise.resolve():Ce(We({event:e,nextState:{isOpen:!1},props:t,query:i,refresh:r,store:n},o))).then((function(){c.onSelect(We({event:e,item:a,itemInputValue:i,itemUrl:l,refresh:r,source:c,state:n.getState()},o))}))}},u)}}}var Xe=[{segment:"autocomplete-core",version:"1.9.3"}];function Ye(e){return Ye="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},Ye(e)}function et(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function tt(e){for(var t=1;t=r?null===n?null:0:o}function at(e){return at="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(e){return typeof e}:function(e){return e&&"function"==typeof Symbol&&e.constructor===Symbol&&e!==Symbol.prototype?"symbol":typeof e},at(e)}function ct(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function lt(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function kt(e){var t=e.translations,r=void 0===t?{}:t,n=At(e,Pt),o=r.noResultsText,i=void 0===o?"No results for":o,a=r.suggestedQueryText,c=void 0===a?"Try searching for":a,l=r.reportMissingResultsText,u=void 0===l?"Believe this query should return results?":l,s=r.reportMissingResultsLinkText,f=void 0===s?"Let us know.":s,m=n.state.context.searchSuggestions;return yt.createElement("div",{className:"DocSearch-NoResults"},yt.createElement("div",{className:"DocSearch-Screen-Icon"},yt.createElement(Et,null)),yt.createElement("p",{className:"DocSearch-Title"},i,' "',yt.createElement("strong",null,n.state.query),'"'),m&&m.length>0&&yt.createElement("div",{className:"DocSearch-NoResults-Prefill-List"},yt.createElement("p",{className:"DocSearch-Help"},c,":"),yt.createElement("ul",null,m.slice(0,3).reduce((function(e,t){return[].concat(It(e),[yt.createElement("li",{key:t},yt.createElement("button",{className:"DocSearch-Prefill",key:t,type:"button",onClick:function(){n.setQuery(t.toLowerCase()+" "),n.refresh(),n.inputRef.current.focus()}},t))])}),[]))),n.getMissingResultsUrl&&yt.createElement("p",{className:"DocSearch-Help"},"".concat(u," "),yt.createElement("a",{href:n.getMissingResultsUrl({query:n.state.query}),target:"_blank",rel:"noopener noreferrer"},f)))}var xt=function(){return yt.createElement("svg",{width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("path",{d:"M17 6v12c0 .52-.2 1-1 1H4c-.7 0-1-.33-1-1V2c0-.55.42-1 1-1h8l5 5zM14 8h-3.13c-.51 0-.87-.34-.87-.87V4",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinejoin:"round"}))};function Ct(e){switch(e.type){case"lvl1":return yt.createElement(xt,null);case"content":return yt.createElement(Nt,null);default:return 
yt.createElement(_t,null)}}function _t(){return yt.createElement("svg",{width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("path",{d:"M13 13h4-4V8H7v5h6v4-4H7V8H3h4V3v5h6V3v5h4-4v5zm-6 0v4-4H3h4z",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinecap:"round",strokeLinejoin:"round"}))}function Nt(){return yt.createElement("svg",{width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("path",{d:"M17 5H3h14zm0 5H3h14zm0 5H3h14z",stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinejoin:"round"}))}function Tt(){return yt.createElement("svg",{className:"DocSearch-Hit-Select-Icon",width:"20",height:"20",viewBox:"0 0 20 20"},yt.createElement("g",{stroke:"currentColor",fill:"none",fillRule:"evenodd",strokeLinecap:"round",strokeLinejoin:"round"},yt.createElement("path",{d:"M18 3v4c0 2-2 4-4 4H2"}),yt.createElement("path",{d:"M8 17l-6-6 6-6"})))}var qt=["hit","attribute","tagName"];function Rt(e,t){var r=Object.keys(e);if(Object.getOwnPropertySymbols){var n=Object.getOwnPropertySymbols(e);t&&(n=n.filter((function(t){return Object.getOwnPropertyDescriptor(e,t).enumerable}))),r.push.apply(r,n)}return r}function Lt(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function Ft(e,t){return t.split(".").reduce((function(e,t){return null!=e&&e[t]?e[t]:null}),e)}function Ut(e){var t=e.hit,r=e.attribute,n=e.tagName,o=void 0===n?"span":n,i=Ht(e,qt);return(0,yt.createElement)(o,Lt(Lt({},i),{},{dangerouslySetInnerHTML:{__html:Ft(t,"_snippetResult.".concat(r,".value"))||Ft(t,r)}}))}function Bt(e,t){return function(e){if(Array.isArray(e))return e}(e)||function(e,t){var r=null==e?null:"undefined"!=typeof Symbol&&e[Symbol.iterator]||e["@@iterator"];if(null==r)return;var n,o,i=[],a=!0,c=!1;try{for(r=r.call(e);!(a=(n=r.next()).done)&&(i.push(n.value),!t||i.length!==t);a=!0);}catch(l){c=!0,o=l}finally{try{a||null==r.return||r.return()}finally{if(c)throw o}}return i}(e,t)||function(e,t){if(!e)return;if("string"==typeof e)return Vt(e,t);var r=Object.prototype.toString.call(e).slice(8,-1);"Object"===r&&e.constructor&&(r=e.constructor.name);if("Map"===r||"Set"===r)return Array.from(e);if("Arguments"===r||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r))return Vt(e,t)}(e,t)||function(){throw new TypeError("Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")}()}function Vt(e,t){(null==t||t>e.length)&&(t=e.length);for(var r=0,n=new Array(t);r|<\/mark>)/g,Wt=RegExp(zt.source);function Qt(e){var t,r,n,o,i,a=e;if(!a.__docsearch_parent&&!e._highlightResult)return e.hierarchy.lvl0;var c=((a.__docsearch_parent?null===(t=a.__docsearch_parent)||void 0===t||null===(r=t._highlightResult)||void 0===r||null===(n=r.hierarchy)||void 0===n?void 0:n.lvl0:null===(o=e._highlightResult)||void 0===o||null===(i=o.hierarchy)||void 0===i?void 0:i.lvl0)||{}).value;return c&&Wt.test(c)?c.replace(zt,""):c}function Zt(){return Zt=Object.assign||function(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function or(e){var t=e.translations,r=void 0===t?{}:t,n=nr(e,tr),o=r.recentSearchesTitle,i=void 0===o?"Recent":o,a=r.noRecentSearchesText,c=void 0===a?"No recent searches":a,l=r.saveRecentSearchButtonTitle,u=void 0===l?"Save this 
search":l,s=r.removeRecentSearchButtonTitle,f=void 0===s?"Remove this search from history":s,m=r.favoriteSearchesTitle,p=void 0===m?"Favorite":m,v=r.removeFavoriteSearchButtonTitle,d=void 0===v?"Remove this search from favorites":v;return"idle"===n.state.status&&!1===n.hasCollections?n.disableUserPersonalization?null:yt.createElement("div",{className:"DocSearch-StartScreen"},yt.createElement("p",{className:"DocSearch-Help"},c)):!1===n.hasCollections?null:yt.createElement("div",{className:"DocSearch-Dropdown-Container"},yt.createElement($t,rr({},n,{title:i,collection:n.state.collections[0],renderIcon:function(){return yt.createElement("div",{className:"DocSearch-Hit-icon"},yt.createElement(Xt,null))},renderAction:function(e){var t=e.item,r=e.runFavoriteTransition,o=e.runDeleteTransition;return yt.createElement(yt.Fragment,null,yt.createElement("div",{className:"DocSearch-Hit-action"},yt.createElement("button",{className:"DocSearch-Hit-action-button",title:u,type:"submit",onClick:function(e){e.preventDefault(),e.stopPropagation(),r((function(){n.favoriteSearches.add(t),n.recentSearches.remove(t),n.refresh()}))}},yt.createElement(Yt,null))),yt.createElement("div",{className:"DocSearch-Hit-action"},yt.createElement("button",{className:"DocSearch-Hit-action-button",title:f,type:"submit",onClick:function(e){e.preventDefault(),e.stopPropagation(),o((function(){n.recentSearches.remove(t),n.refresh()}))}},yt.createElement(er,null))))}})),yt.createElement($t,rr({},n,{title:p,collection:n.state.collections[1],renderIcon:function(){return yt.createElement("div",{className:"DocSearch-Hit-icon"},yt.createElement(Yt,null))},renderAction:function(e){var t=e.item,r=e.runDeleteTransition;return yt.createElement("div",{className:"DocSearch-Hit-action"},yt.createElement("button",{className:"DocSearch-Hit-action-button",title:d,type:"submit",onClick:function(e){e.preventDefault(),e.stopPropagation(),r((function(){n.favoriteSearches.remove(t),n.refresh()}))}},yt.createElement(er,null)))}})))}var ir=["translations"];function ar(){return ar=Object.assign||function(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}var lr=yt.memo((function(e){var t=e.translations,r=void 0===t?{}:t,n=cr(e,ir);if("error"===n.state.status)return yt.createElement(wt,{translations:null==r?void 0:r.errorScreen});var o=n.state.collections.some((function(e){return e.items.length>0}));return n.state.query?!1===o?yt.createElement(kt,ar({},n,{translations:null==r?void 0:r.noResultsScreen})):yt.createElement(Gt,n):yt.createElement(or,ar({},n,{hasCollections:o,translations:null==r?void 0:r.startScreen}))}),(function(e,t){return"loading"===t.state.status||"stalled"===t.state.status}));function ur(){return yt.createElement("svg",{viewBox:"0 0 38 38",stroke:"currentColor",strokeOpacity:".5"},yt.createElement("g",{fill:"none",fillRule:"evenodd"},yt.createElement("g",{transform:"translate(1 1)",strokeWidth:"2"},yt.createElement("circle",{strokeOpacity:".3",cx:"18",cy:"18",r:"18"}),yt.createElement("path",{d:"M36 18c0-9.94-8.06-18-18-18"},yt.createElement("animateTransform",{attributeName:"transform",type:"rotate",from:"0 18 18",to:"360 18 18",dur:"1s",repeatCount:"indefinite"})))))}var sr=r(20830),fr=["translations"];function mr(){return mr=Object.assign||function(e){for(var t=1;t=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var 
i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function vr(e){var t=e.translations,r=void 0===t?{}:t,n=pr(e,fr),o=r.resetButtonTitle,i=void 0===o?"Clear the query":o,a=r.resetButtonAriaLabel,c=void 0===a?"Clear the query":a,l=r.cancelButtonText,u=void 0===l?"Cancel":l,s=r.cancelButtonAriaLabel,f=void 0===s?"Cancel":s,m=n.getFormProps({inputElement:n.inputRef.current}).onReset;return yt.useEffect((function(){n.autoFocus&&n.inputRef.current&&n.inputRef.current.focus()}),[n.autoFocus,n.inputRef]),yt.useEffect((function(){n.isFromSelection&&n.inputRef.current&&n.inputRef.current.select()}),[n.isFromSelection,n.inputRef]),yt.createElement(yt.Fragment,null,yt.createElement("form",{className:"DocSearch-Form",onSubmit:function(e){e.preventDefault()},onReset:m},yt.createElement("label",mr({className:"DocSearch-MagnifierLabel"},n.getLabelProps()),yt.createElement(sr.W,null)),yt.createElement("div",{className:"DocSearch-LoadingIndicator"},yt.createElement(ur,null)),yt.createElement("input",mr({className:"DocSearch-Input",ref:n.inputRef},n.getInputProps({inputElement:n.inputRef.current,autoFocus:n.autoFocus,maxLength:ht}))),yt.createElement("button",{type:"reset",title:i,className:"DocSearch-Reset","aria-label":c,hidden:!n.state.query},yt.createElement(er,null))),yt.createElement("button",{className:"DocSearch-Cancel",type:"reset","aria-label":f,onClick:n.onClose},u))}var dr=["_highlightResult","_snippetResult"];function yr(e,t){if(null==e)return{};var r,n,o=function(e,t){if(null==e)return{};var r,n,o={},i=Object.keys(e);for(n=0;n=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function hr(e){return!1===function(){var e="__TEST_KEY__";try{return localStorage.setItem(e,""),localStorage.removeItem(e),!0}catch(t){return!1}}()?{setItem:function(){},getItem:function(){return[]}}:{setItem:function(t){return window.localStorage.setItem(e,JSON.stringify(t))},getItem:function(){var t=window.localStorage.getItem(e);return t?JSON.parse(t):[]}}}function br(e){var t=e.key,r=e.limit,n=void 0===r?5:r,o=hr(t),i=o.getItem().slice(0,n);return{add:function(e){var t=e,r=(t._highlightResult,t._snippetResult,yr(t,dr)),a=i.findIndex((function(e){return e.objectID===r.objectID}));a>-1&&i.splice(a,1),i.unshift(r),i=i.slice(0,n),o.setItem(i)},remove:function(e){i=i.filter((function(t){return t.objectID!==e.objectID})),o.setItem(i)},getAll:function(){return i}}}function gr(e){const t=`algoliasearch-client-js-${e.key}`;let r;const n=()=>(void 0===r&&(r=e.localStorage||window.localStorage),r),o=()=>JSON.parse(n().getItem(t)||"{}"),i=e=>{n().setItem(t,JSON.stringify(e))};return{get:(t,r,n={miss:()=>Promise.resolve()})=>Promise.resolve().then((()=>{(()=>{const t=e.timeToLive?1e3*e.timeToLive:null,r=o(),n=Object.fromEntries(Object.entries(r).filter((([,e])=>void 0!==e.timestamp)));if(i(n),!t)return;const a=Object.fromEntries(Object.entries(n).filter((([,e])=>{const r=(new Date).getTime();return!(e.timestamp+tPromise.all([e?e.value:r(),void 0!==e]))).then((([e,t])=>Promise.all([e,t||n.miss(e)]))).then((([e])=>e)),set:(e,r)=>Promise.resolve().then((()=>{const i=o();return i[JSON.stringify(e)]={timestamp:(new Date).getTime(),value:r},n().setItem(t,JSON.stringify(i)),r})),delete:e=>Promise.resolve().then((()=>{const r=o();delete 
r[JSON.stringify(e)],n().setItem(t,JSON.stringify(r))})),clear:()=>Promise.resolve().then((()=>{n().removeItem(t)}))}}function Or(e){const t=[...e.caches],r=t.shift();return void 0===r?{get:(e,t,r={miss:()=>Promise.resolve()})=>t().then((e=>Promise.all([e,r.miss(e)]))).then((([e])=>e)),set:(e,t)=>Promise.resolve(t),delete:e=>Promise.resolve(),clear:()=>Promise.resolve()}:{get:(e,n,o={miss:()=>Promise.resolve()})=>r.get(e,n,o).catch((()=>Or({caches:t}).get(e,n,o))),set:(e,n)=>r.set(e,n).catch((()=>Or({caches:t}).set(e,n))),delete:e=>r.delete(e).catch((()=>Or({caches:t}).delete(e))),clear:()=>r.clear().catch((()=>Or({caches:t}).clear()))}}function Sr(e={serializable:!0}){let t={};return{get(r,n,o={miss:()=>Promise.resolve()}){const i=JSON.stringify(r);if(i in t)return Promise.resolve(e.serializable?JSON.parse(t[i]):t[i]);const a=n(),c=o&&o.miss||(()=>Promise.resolve());return a.then((e=>c(e))).then((()=>a))},set:(r,n)=>(t[JSON.stringify(r)]=e.serializable?JSON.stringify(n):n,Promise.resolve(n)),delete:e=>(delete t[JSON.stringify(e)],Promise.resolve()),clear:()=>(t={},Promise.resolve())}}function jr(e){let t=e.length-1;for(;t>0;t--){const r=Math.floor(Math.random()*(t+1)),n=e[t];e[t]=e[r],e[r]=n}return e}function wr(e,t){return t?(Object.keys(t).forEach((r=>{e[r]=t[r](e)})),e):e}function Er(e,...t){let r=0;return e.replace(/%s/g,(()=>encodeURIComponent(t[r++])))}const Pr="4.19.1",Ir={WithinQueryParameters:0,WithinHeaders:1};function Dr(e,t){const r=e||{},n=r.data||{};return Object.keys(r).forEach((e=>{-1===["timeout","headers","queryParameters","data","cacheable"].indexOf(e)&&(n[e]=r[e])})),{data:Object.entries(n).length>0?n:void 0,timeout:r.timeout||t,headers:r.headers||{},queryParameters:r.queryParameters||{},cacheable:r.cacheable}}const Ar={Read:1,Write:2,Any:3},kr={Up:1,Down:2,Timeouted:3},xr=12e4;function Cr(e,t=kr.Up){return{...e,status:t,lastUpdate:Date.now()}}function _r(e){return"string"==typeof e?{protocol:"https",url:e,accept:Ar.Any}:{protocol:e.protocol||"https",url:e.url,accept:e.accept||Ar.Any}}const Nr={Delete:"DELETE",Get:"GET",Post:"POST",Put:"PUT"};function Tr(e,t){return Promise.all(t.map((t=>e.get(t,(()=>Promise.resolve(Cr(t))))))).then((e=>{const r=e.filter((e=>function(e){return e.status===kr.Up||Date.now()-e.lastUpdate>xr}(e))),n=e.filter((e=>function(e){return e.status===kr.Timeouted&&Date.now()-e.lastUpdate<=xr}(e))),o=[...r,...n];return{getTimeout:(e,t)=>(0===n.length&&0===e?1:n.length+3+e)*t,statelessHosts:o.length>0?o.map((e=>_r(e))):t}}))}const qr=(e,t)=>(e=>{const t=e.status;return e.isTimedOut||(({isTimedOut:e,status:t})=>!e&&0==~~t)(e)||2!=~~(t/100)&&4!=~~(t/100)})(e)?t.onRetry(e):(({status:e})=>2==~~(e/100))(e)?t.onSuccess(e):t.onFail(e);function Rr(e,t,r,n){const o=[],i=function(e,t){if(e.method===Nr.Get||void 0===e.data&&void 0===t.data)return;const r=Array.isArray(e.data)?e.data:{...e.data,...t.data};return JSON.stringify(r)}(r,n),a=function(e,t){const r={...e.headers,...t.headers},n={};return Object.keys(r).forEach((e=>{const t=r[e];n[e.toLowerCase()]=t})),n}(e,n),c=r.method,l=r.method!==Nr.Get?{}:{...r.data,...n.data},u={"x-algolia-agent":e.userAgent.value,...e.queryParameters,...l,...n.queryParameters};let s=0;const f=(t,l)=>{const m=t.pop();if(void 0===m)throw{name:"RetryError",message:"Unreachable hosts - your application id may be incorrect. 
If the error persists, contact support@algolia.com.",transporterStackTrace:Fr(o)};const p={data:i,headers:a,method:c,url:Mr(m,r.path,u),connectTimeout:l(s,e.timeouts.connect),responseTimeout:l(s,n.timeout)},v=e=>{const r={request:p,response:e,host:m,triesLeft:t.length};return o.push(r),r},d={onSuccess:e=>function(e){try{return JSON.parse(e.content)}catch(t){throw function(e,t){return{name:"DeserializationError",message:e,response:t}}(t.message,e)}}(e),onRetry(r){const n=v(r);return r.isTimedOut&&s++,Promise.all([e.logger.info("Retryable failure",Ur(n)),e.hostsCache.set(m,Cr(m,r.isTimedOut?kr.Timeouted:kr.Down))]).then((()=>f(t,l)))},onFail(e){throw v(e),function({content:e,status:t},r){let n=e;try{n=JSON.parse(e).message}catch(o){}return function(e,t,r){return{name:"ApiError",message:e,status:t,transporterStackTrace:r}}(n,t,r)}(e,Fr(o))}};return e.requester.send(p).then((e=>qr(e,d)))};return Tr(e.hostsCache,t).then((e=>f([...e.statelessHosts].reverse(),e.getTimeout)))}function Lr(e){const t={value:`Algolia for JavaScript (${e})`,add(e){const r=`; ${e.segment}${void 0!==e.version?` (${e.version})`:""}`;return-1===t.value.indexOf(r)&&(t.value=`${t.value}${r}`),t}};return t}function Mr(e,t,r){const n=Hr(r);let o=`${e.protocol}://${e.url}/${"/"===t.charAt(0)?t.substr(1):t}`;return n.length&&(o+=`?${n}`),o}function Hr(e){return Object.keys(e).map((t=>{return Er("%s=%s",t,(r=e[t],"[object Object]"===Object.prototype.toString.call(r)||"[object Array]"===Object.prototype.toString.call(r)?JSON.stringify(e[t]):e[t]));var r})).join("&")}function Fr(e){return e.map((e=>Ur(e)))}function Ur(e){const t=e.request.headers["x-algolia-api-key"]?{"x-algolia-api-key":"*****"}:{};return{...e,request:{...e.request,headers:{...e.request.headers,...t}}}}const Br=e=>{const t=e.appId,r=function(e,t,r){const n={"x-algolia-api-key":r,"x-algolia-application-id":t};return{headers:()=>e===Ir.WithinHeaders?n:{},queryParameters:()=>e===Ir.WithinQueryParameters?n:{}}}(void 0!==e.authMode?e.authMode:Ir.WithinHeaders,t,e.apiKey),n=function(e){const{hostsCache:t,logger:r,requester:n,requestsCache:o,responsesCache:i,timeouts:a,userAgent:c,hosts:l,queryParameters:u,headers:s}=e,f={hostsCache:t,logger:r,requester:n,requestsCache:o,responsesCache:i,timeouts:a,userAgent:c,headers:s,queryParameters:u,hosts:l.map((e=>_r(e))),read(e,t){const r=Dr(t,f.timeouts.read),n=()=>Rr(f,f.hosts.filter((e=>0!=(e.accept&Ar.Read))),e,r);if(!0!==(void 0!==r.cacheable?r.cacheable:e.cacheable))return n();const o={request:e,mappedRequestOptions:r,transporter:{queryParameters:f.queryParameters,headers:f.headers}};return f.responsesCache.get(o,(()=>f.requestsCache.get(o,(()=>f.requestsCache.set(o,n()).then((e=>Promise.all([f.requestsCache.delete(o),e])),(e=>Promise.all([f.requestsCache.delete(o),Promise.reject(e)]))).then((([e,t])=>t))))),{miss:e=>f.responsesCache.set(o,e)})},write:(e,t)=>Rr(f,f.hosts.filter((e=>0!=(e.accept&Ar.Write))),e,Dr(t,f.timeouts.write))};return f}({hosts:[{url:`${t}-dsn.algolia.net`,accept:Ar.Read},{url:`${t}.algolia.net`,accept:Ar.Write}].concat(jr([{url:`${t}-1.algolianet.com`},{url:`${t}-2.algolianet.com`},{url:`${t}-3.algolianet.com`}])),...e,headers:{...r.headers(),"content-type":"application/x-www-form-urlencoded",...e.headers},queryParameters:{...r.queryParameters(),...e.queryParameters}}),o={transporter:n,appId:t,addAlgoliaAgent(e,t){n.userAgent.add({segment:e,version:t})},clearCache:()=>Promise.all([n.requestsCache.clear(),n.responsesCache.clear()]).then((()=>{}))};return 
wr(o,e.methods)},Vr=e=>(t,r)=>t.method===Nr.Get?e.transporter.read(t,r):e.transporter.write(t,r),Kr=e=>(t,r={})=>wr({transporter:e.transporter,appId:e.appId,indexName:t},r.methods),$r=e=>(t,r)=>{const n=t.map((e=>({...e,params:Hr(e.params||{})})));return e.transporter.read({method:Nr.Post,path:"1/indexes/*/queries",data:{requests:n},cacheable:!0},r)},Jr=e=>(t,r)=>Promise.all(t.map((t=>{const{facetName:n,facetQuery:o,...i}=t.params;return Kr(e)(t.indexName,{methods:{searchForFacetValues:Qr}}).searchForFacetValues(n,o,{...r,...i})}))),zr=e=>(t,r,n)=>e.transporter.read({method:Nr.Post,path:Er("1/answers/%s/prediction",e.indexName),data:{query:t,queryLanguages:r},cacheable:!0},n),Wr=e=>(t,r)=>e.transporter.read({method:Nr.Post,path:Er("1/indexes/%s/query",e.indexName),data:{query:t},cacheable:!0},r),Qr=e=>(t,r,n)=>e.transporter.read({method:Nr.Post,path:Er("1/indexes/%s/facets/%s/query",e.indexName,t),data:{facetQuery:r},cacheable:!0},n),Zr={Debug:1,Info:2,Error:3};function Gr(e,t,r){const n={appId:e,apiKey:t,timeouts:{connect:1,read:2,write:30},requester:{send:e=>new Promise((t=>{const r=new XMLHttpRequest;r.open(e.method,e.url,!0),Object.keys(e.headers).forEach((t=>r.setRequestHeader(t,e.headers[t])));const n=(e,n)=>setTimeout((()=>{r.abort(),t({status:0,content:n,isTimedOut:!0})}),1e3*e),o=n(e.connectTimeout,"Connection timeout");let i;r.onreadystatechange=()=>{r.readyState>r.OPENED&&void 0===i&&(clearTimeout(o),i=n(e.responseTimeout,"Socket timeout"))},r.onerror=()=>{0===r.status&&(clearTimeout(o),clearTimeout(i),t({content:r.responseText||"Network request failed",status:r.status,isTimedOut:!1}))},r.onload=()=>{clearTimeout(o),clearTimeout(i),t({content:r.responseText,status:r.status,isTimedOut:!1})},r.send(e.data)}))},logger:(o=Zr.Error,{debug:(e,t)=>(Zr.Debug>=o&&console.debug(e,t),Promise.resolve()),info:(e,t)=>(Zr.Info>=o&&console.info(e,t),Promise.resolve()),error:(e,t)=>(console.error(e,t),Promise.resolve())}),responsesCache:Sr(),requestsCache:Sr({serializable:!1}),hostsCache:Or({caches:[gr({key:`${Pr}-${e}`}),Sr()]}),userAgent:Lr(Pr).add({segment:"Browser",version:"lite"}),authMode:Ir.WithinQueryParameters};var o;return Br({...n,...r,methods:{search:$r,searchForFacetValues:Jr,multipleQueries:$r,multipleSearchForFacetValues:Jr,customRequest:Vr,initIndex:e=>t=>Kr(e)(t,{methods:{search:Wr,searchForFacetValues:Qr,findAnswers:zr}})}})}Gr.version=Pr;const Xr=Gr;var Yr="3.5.1";function en(){}function tn(e){return e}function rn(e){return 1===e.button||e.altKey||e.ctrlKey||e.metaKey||e.shiftKey}function nn(e,t,r){return e.reduce((function(e,n){var o=t(n);return e.hasOwnProperty(o)||(e[o]=[]),e[o].length<(r||5)&&e[o].push(n),e}),{})}var on=["footer","searchBox"];function an(){return an=Object.assign||function(e){for(var t=1;te.length)&&(t=e.length);for(var r=0,n=new Array(t);r=0||(o[r]=e[r]);return o}(e,t);if(Object.getOwnPropertySymbols){var i=Object.getOwnPropertySymbols(e);for(n=0;n=0||Object.prototype.propertyIsEnumerable.call(e,r)&&(o[r]=e[r])}return o}function pn(e){var t=e.appId,r=e.apiKey,n=e.indexName,o=e.placeholder,i=void 0===o?"Search docs":o,a=e.searchParameters,c=e.maxResultsPerGroup,l=e.onClose,u=void 0===l?en:l,s=e.transformItems,f=void 0===s?tn:s,m=e.hitComponent,p=void 0===m?St:m,v=e.resultsFooterComponent,d=void 0===v?function(){return null}:v,y=e.navigator,h=e.initialScrollY,b=void 0===h?0:h,g=e.transformSearchClient,O=void 0===g?tn:g,S=e.disableUserPersonalization,j=void 0!==S&&S,w=e.initialQuery,E=void 0===w?"":w,P=e.translations,I=void 
0===P?{}:P,D=e.getMissingResultsUrl,A=e.insights,k=void 0!==A&&A,x=I.footer,C=I.searchBox,_=mn(I,on),N=sn(yt.useState({query:"",collections:[],completion:null,context:{},isOpen:!1,activeItemId:null,status:"idle"}),2),T=N[0],q=N[1],R=yt.useRef(null),L=yt.useRef(null),M=yt.useRef(null),H=yt.useRef(null),F=yt.useRef(null),U=yt.useRef(10),B=yt.useRef("undefined"!=typeof window?window.getSelection().toString().slice(0,ht):"").current,V=yt.useRef(E||B).current,K=function(e,t,r){return yt.useMemo((function(){var n=Xr(e,t);return n.addAlgoliaAgent("docsearch",Yr),!1===/docsearch.js \(.*\)/.test(n.transporter.userAgent.value)&&n.addAlgoliaAgent("docsearch-react",Yr),r(n)}),[e,t,r])}(t,r,O),$=yt.useRef(br({key:"__DOCSEARCH_FAVORITE_SEARCHES__".concat(n),limit:10})).current,J=yt.useRef(br({key:"__DOCSEARCH_RECENT_SEARCHES__".concat(n),limit:0===$.getAll().length?7:4})).current,z=yt.useCallback((function(e){if(!j){var t="content"===e.type?e.__docsearch_parent:e;t&&-1===$.getAll().findIndex((function(e){return e.objectID===t.objectID}))&&J.add(t)}}),[$,J,j]),W=yt.useCallback((function(e){if(T.context.algoliaInsightsPlugin&&e.__autocomplete_id){var t=e,r={eventName:"Item Selected",index:t.__autocomplete_indexName,items:[t],positions:[e.__autocomplete_id],queryID:t.__autocomplete_queryID};T.context.algoliaInsightsPlugin.insights.clickedObjectIDsAfterSearch(r)}}),[T.context.algoliaInsightsPlugin]),Q=yt.useMemo((function(){return dt({id:"docsearch",defaultActiveItemId:0,placeholder:i,openOnFocus:!0,initialState:{query:V,context:{searchSuggestions:[]}},insights:k,navigator:y,onStateChange:function(e){q(e.state)},getSources:function(e){var o=e.query,i=e.state,l=e.setContext,s=e.setStatus;if(!o)return j?[]:[{sourceId:"recentSearches",onSelect:function(e){var t=e.item,r=e.event;z(t),rn(r)||u()},getItemUrl:function(e){return e.item.url},getItems:function(){return J.getAll()}},{sourceId:"favoriteSearches",onSelect:function(e){var t=e.item,r=e.event;z(t),rn(r)||u()},getItemUrl:function(e){return e.item.url},getItems:function(){return $.getAll()}}];var m=Boolean(k);return K.search([{query:o,indexName:n,params:ln({attributesToRetrieve:["hierarchy.lvl0","hierarchy.lvl1","hierarchy.lvl2","hierarchy.lvl3","hierarchy.lvl4","hierarchy.lvl5","hierarchy.lvl6","content","type","url"],attributesToSnippet:["hierarchy.lvl1:".concat(U.current),"hierarchy.lvl2:".concat(U.current),"hierarchy.lvl3:".concat(U.current),"hierarchy.lvl4:".concat(U.current),"hierarchy.lvl5:".concat(U.current),"hierarchy.lvl6:".concat(U.current),"content:".concat(U.current)],snippetEllipsisText:"\u2026",highlightPreTag:"",highlightPostTag:"",hitsPerPage:20,clickAnalytics:m},a)}]).catch((function(e){throw"RetryError"===e.name&&s("error"),e})).then((function(e){var o=e.results,a=o[0],s=a.hits,p=a.nbHits,v=nn(s,(function(e){return Qt(e)}),c);i.context.searchSuggestions.length0&&(X(),F.current&&F.current.focus())}),[V,X]),yt.useEffect((function(){function e(){if(L.current){var e=.01*window.innerHeight;L.current.style.setProperty("--docsearch-vh","".concat(e,"px"))}}return e(),window.addEventListener("resize",e),function(){window.removeEventListener("resize",e)}}),[]),yt.createElement("div",an({ref:R},G({"aria-expanded":!0}),{className:["DocSearch","DocSearch-Container","stalled"===T.status&&"DocSearch-Container--Stalled","error"===T.status&&"DocSearch-Container--Errored"].filter(Boolean).join(" 
"),role:"button",tabIndex:0,onMouseDown:function(e){e.target===e.currentTarget&&u()}}),yt.createElement("div",{className:"DocSearch-Modal",ref:L},yt.createElement("header",{className:"DocSearch-SearchBar",ref:M},yt.createElement(vr,an({},Q,{state:T,autoFocus:0===V.length,inputRef:F,isFromSelection:Boolean(V)&&V===B,translations:C,onClose:u}))),yt.createElement("div",{className:"DocSearch-Dropdown",ref:H},yt.createElement(lr,an({},Q,{indexName:n,state:T,hitComponent:p,resultsFooterComponent:d,disableUserPersonalization:j,recentSearches:J,favoriteSearches:$,inputRef:F,translations:_,getMissingResultsUrl:D,onItemClick:function(e,t){W(e),z(e),rn(t)||u()}}))),yt.createElement("footer",{className:"DocSearch-Footer"},yt.createElement(Ot,{translations:x}))))}}}]); \ No newline at end of file diff --git a/assets/js/runtime~main.326b708d.js b/assets/js/runtime~main.3f7ca0ae.js similarity index 99% rename from assets/js/runtime~main.326b708d.js rename to assets/js/runtime~main.3f7ca0ae.js index 66caf40c3f6..f87e3cfb37e 100644 --- a/assets/js/runtime~main.326b708d.js +++ b/assets/js/runtime~main.3f7ca0ae.js @@ -1 +1 @@ -(()=>{"use strict";var e,b,f,c,d,a={},t={};function r(e){var b=t[e];if(void 0!==b)return b.exports;var f=t[e]={id:e,loaded:!1,exports:{}};return a[e].call(f.exports,f,f.exports,r),f.loaded=!0,f.exports}r.m=a,r.c=t,e=[],r.O=(b,f,c,d)=>{if(!f){var a=1/0;for(i=0;i=d)&&Object.keys(r.O).every((e=>r.O[e](f[o])))?f.splice(o--,1):(t=!1,d0&&e[i-1][2]>d;i--)e[i]=e[i-1];e[i]=[f,c,d]},r.n=e=>{var b=e&&e.__esModule?()=>e.default:()=>e;return r.d(b,{a:b}),b},f=Object.getPrototypeOf?e=>Object.getPrototypeOf(e):e=>e.__proto__,r.t=function(e,c){if(1&c&&(e=this(e)),8&c)return e;if("object"==typeof e&&e){if(4&c&&e.__esModule)return e;if(16&c&&"function"==typeof e.then)return e}var d=Object.create(null);r.r(d);var a={};b=b||[null,f({}),f([]),f(f)];for(var t=2&c&&e;"object"==typeof t&&!~b.indexOf(t);t=f(t))Object.getOwnPropertyNames(t).forEach((b=>a[b]=()=>e[b]));return a.default=()=>e,r.d(d,a),d},r.d=(e,b)=>{for(var f in 
b)r.o(b,f)&&!r.o(e,f)&&Object.defineProperty(e,f,{enumerable:!0,get:b[f]})},r.f={},r.e=e=>Promise.all(Object.keys(r.f).reduce(((b,f)=>(r.f[f](e,b),b)),[])),r.u=e=>"assets/js/"+({19:"906e49ec",21:"f5e3827c",71:"ab971afc",99:"49c587c2",172:"21730a31",224:"a93c3367",250:"23d30d6b",291:"5da0ca7c",467:"a5bcb3f1",513:"9216ce7b",596:"b564874a",803:"c63e6bd5",899:"54d8bddc",1116:"0109100f",1365:"66ffc608",1387:"be2f7876",1523:"ef01e1dd",1647:"7981506b",1652:"902d2d1d",1664:"9ecb4d01",1671:"eee57cd1",1940:"a971b35f",2044:"0149cacd",2097:"40d51a61",2196:"e1e17943",2312:"ac4bed99",2427:"fa423b6e",2570:"b72a3182",2638:"935116ff",2656:"80631bfd",2905:"60b67194",2916:"7bb83d6b",2989:"9ef1e345",3044:"3bedcc76",3102:"7174660f",3145:"d4d22ad8",3191:"d0a0235c",3197:"ec28562d",3216:"92b043a3",3281:"a50b12c0",3283:"0b092b5c",3326:"5e94ba2e",3397:"b7343c9b",3398:"1e070b7c",3447:"4162c6a3",3650:"020a22ba",3667:"6c1d24e1",3914:"7b7fec6b",3919:"c81517c7",3942:"2b024d60",4125:"feba251b",4151:"5017cef7",4195:"3c93ed7e",4244:"21cfb395",4328:"38680a69",4504:"f28093e3",4513:"ffa15017",4585:"f2dc10f7",4631:"9654b394",4742:"c85450ca",4874:"a3db1255",4882:"4482beb5",4929:"75600d79",5061:"e54b1e77",5129:"d87811ce",5132:"23b46f68",5313:"aa4fa4fb",5352:"622596e9",5383:"f2a3bf8e",5512:"631dea17",5714:"d16a2606",5909:"66e9ea68",5920:"d613e1f8",5981:"85e709dc",6027:"391378fa",6151:"39579801",6386:"a8ef1ed2",6443:"5b4a63ac",6517:"31c3e3d7",6537:"0c99e969",6553:"efc338fe",6695:"111e23e1",6734:"85954f48",6799:"d1284c82",6822:"f49551b9",6824:"cc519fb4",6968:"9e6b2559",6971:"f38fa80d",6978:"7d9c461e",7078:"ff96de6e",7091:"bd1a8573",7092:"30a13577",7108:"365726b0",7120:"7e91f3e1",7155:"bb4987bb",7162:"97ce6959",7318:"b7e69c77",7451:"d8c5fc94",7470:"3e8cde1e",7485:"81f033b8",7500:"bd0e022f",7874:"32d13eb8",8023:"43de05a8",8135:"7d280bdc",8145:"93015a15",8188:"39bddd84",8210:"c565b8da",8230:"3c20ca15",8313:"d163ea32",8328:"fa8af309",8407:"b4473d93",8482:"f983631a",8638:"2b8a5969",8671:"1b93ff3d",8809:"aa01ca6a",8882:"6fdd5bc4",8906:"6d2c1101",9028:"de8a7b18",9119:"debbc0e2",9225:"e5523a26",9235:"407bcc70",9365:"541bc80d",9444:"0ffc31bc",9542:"cf4d312e",9550:"36edbaa2",9615:"9db3bdac",9817:"14eb3368",9836:"13af1bdb",9907:"bfd6b54b",9947:"7097fbbc",10109:"70cd875c",10228:"cdebfca4",10270:"a7da438d",10436:"05fa5837",10480:"8203e9fd",10497:"6773ef05",10650:"e8b1baf4",10756:"c7dacad4",10882:"544bf006",10918:"caf7e36c",10987:"26e1978a",11147:"76ace0dc",11174:"ba17e21b",11203:"c98c0daa",11311:"2913cae6",11321:"a15a0d8e",11326:"65031edd",11342:"4d2bb41f",11398:"ba5e62dd",11565:"9d08c935",11646:"b74e2fe0",11656:"885bf670",11875:"3d446fd0",12228:"b63d08bc",12442:"1fb2401b",12549:"31eb4af1",12555:"885da4ef",12558:"58ac1d26",12560:"a6d8b730",12567:"5aabd190",13253:"b48b6b77",13280:"f65f22ef",13351:"6d905346",13460:"70808dd0",13588:"55920b47",13595:"3be6e3bd",13617:"c642f758",13718:"1ac29206",13896:"8bd7a1a3",13924:"bd0b26a5",13972:"d94f9ca1",13979:"911bbfa4",13995:"0b0df062",14061:"3105dae0",14088:"03902e07",14095:"31585cea",14143:"f71ac404",14200:"1168c96f",14299:"af9acd56",14369:"c099652b",14386:"83cbebfb",14396:"d3fe7aed",14549:"79db63f1",14610:"40a26966",14670:"0f188b70",14713:"763f2b13",14775:"931e6ab8",14840:"39ed38cd",14908:"f6310963",15196:"4338ab08",15380:"99a1a3e3",15393:"8cf636cd",15497:"f9c66408",15658:"a97b7821",15888:"a466ebec",15970:"271906a0",15994:"26134010",16022:"21edad34",16038:"62127933",16058:"891a9b8f",16071:"6ecc8728",16153:"7d0b3c01",16161:"ff3504dd",16379:"66fe7120",16528:"fdbb9241",16635:"a9347149",16672:"e1bbb98e",16685
:"46551803",16876:"df878b79",16891:"697b1a5f",16973:"251d94d8",17009:"5573e4eb",17275:"fe34d639",17283:"45e19d44",17457:"f77885f5",17511:"528fc62e",17726:"c003d460",17757:"00c88225",17785:"3e697946",17883:"b530e783",17887:"996d98f3",17989:"9a71d807",18025:"7f9f61f2",18050:"abc9098e",18084:"29dde6c8",18100:"d19aead5",18143:"2730c631",18156:"5bebce7d",18186:"074a0372",18318:"4e07c49f",18559:"18ccacf6",18598:"719c4782",18680:"d1faa944",18706:"ccb072c7",18734:"07645771",18746:"8282a203",18883:"31793acc",18892:"4d58aa3f",18928:"6e11cc87",18998:"bc4716d5",19177:"584b298a",19204:"79617745",19212:"86c7426e",19305:"8a064b88",19408:"b7ec56b9",19427:"126e88af",19493:"6f49328c",19504:"83fe529b",19531:"974829b4",19625:"84eafbb4",19661:"664f913c",19671:"3729e987",19709:"c5593510",19733:"2edcde3e",19806:"dc7ad1ac",19832:"17f9c41b",19876:"760eed0f",19939:"8781c463",19962:"6bf1075e",20040:"b73dbab9",20061:"ef0f9e32",20169:"c9664647",20303:"9b7bae35",20602:"1d014bb1",20689:"cbbe4dac",20707:"120dd2fd",20764:"6d0dfc8d",20841:"67c4e6e9",20876:"55e55873",20911:"be3ddcfb",20917:"e01a2739",20983:"4d0df69e",21015:"653c19c7",21134:"ae5e6a48",21143:"6a89e0dd",21190:"95169675",21207:"0e728709",21228:"0feddf78",21379:"e22055a4",21643:"f18d5795",21688:"aa282e34",21823:"c0ef9e49",21983:"7f2bec55",22030:"43a49a39",22087:"af199e5b",22129:"06ceb223",22163:"9bc49845",22238:"1ca03b4b",22456:"0b0f030b",22523:"ef2624d8",22540:"0e8c522c",22583:"66b5d69c",22604:"a8f480dd",22777:"4995f874",22898:"6775be7c",22940:"fe423ebe",22997:"9e91305d",23064:"500c9b63",23228:"ede8882f",23231:"909cadf6",23252:"8113fd14",23310:"c1f9ba1e",23320:"89c49d10",23343:"57b7b037",23435:"23896e06",23522:"2c9f485d",23536:"edbf4496",23545:"2457e7c2",23663:"c7599d12",23714:"332c497c",23804:"99961c3d",23898:"9a02f8a7",24058:"f1d5089f",24066:"92ce2bd2",24101:"15d86f95",24109:"0c48ef63",24158:"b7738a69",24266:"a8565f1f",24282:"395508da",24401:"fbfa5dfc",24467:"7fc9e2ed",24501:"e6de5f28",24946:"fd378320",24986:"7d30361b",25079:"10fd89ee",25251:"6c0ce6d0",25283:"8b2f7dd6",25427:"2b1e7b76",25433:"b9b67b35",25451:"59740e69",25513:"f265d6a5",25547:"06673fe1",25579:"75071a94",25833:"9427c683",25898:"bd61737f",26067:"cc7818bb",26084:"85860bdc",26086:"21996883",26201:"a958884d",26291:"3bed40a0",26311:"f35b8c8b",26361:"aacdb064",26521:"ec205789",26654:"08985b86",26686:"5667bf50",26695:"3be4d1c2",26858:"ceb6bd62",27031:"cba64cb3",27109:"bb1d1845",27167:"6d03c6cb",27270:"e022cd8b",27276:"ec2b56b1",27303:"ec11103c",27324:"552bb95e",27554:"92307374",27704:"0f6a2fca",27918:"17896441",27982:"865c04d0",28085:"56405cb8",28134:"916fb87b",28139:"95771e39",28181:"278f3637",28219:"50e78136",28261:"9ce40ebc",28367:"7f6814ed",28475:"fc338eb2",28476:"2f4d1edb",28514:"db082e36",28516:"8af04d56",28623:"018243f8",28699:"4a0c84c3",28800:"f6ca5dc0",28880:"2f74be58",28882:"8d83f575",28906:"48c7b3a1",28922:"3417a016",29014:"da9049b8",29025:"8f32218b",29050:"44573fa4",29066:"e5977951",29131:"a670ed1c",29191:"6fe0ccd0",29272:"f5da8015",29514:"1be78505",29520:"4ef1f024",29698:"949a554a",29717:"cff5e41a",29782:"16a52e74",29818:"9e530f0a",29831:"3291c538",29864:"b604d5b2",29871:"729f5dd4",29886:"2c86cbaa",29899:"14e9211b",29978:"3d8cf439",29980:"c32e37fe",30062:"26db341a",30216:"04829abe",30295:"ff9e51b7",30419:"26bc6c41",30433:"1dc72111",30454:"019a0579",30470:"73c32a6c",30589:"10b7b761",30613:"47203c86",30677:"4f643cbc",30678:"8dc6ea19",30800:"534db397",30820:"845c1fa7",30834:"8900c226",30837:"6b685afe",30865:"0c9e4d11",30885:"8d5884d6",30979:"683d9354",31009:"c8b95361",31013:"65360910",31023:"
d1036fb2",31044:"9b98b06f",31050:"23c664e3",31068:"928e95c7",31089:"5e56d481",31116:"abe8f5f4",31152:"99496549",31187:"8793e9e6",31293:"cc976a0e",31294:"212ceae2",31441:"347c8874",31471:"2d7d2510",31512:"8a75859c",31516:"ee2f6eec",31570:"6c6d8053",31671:"ce861b37",31824:"9f850ab3",31938:"4dbdcbee",32224:"87719f86",32255:"bc6bcbdd",32308:"d201558c",32319:"570c64c0",32410:"09e7c68c",32446:"0eb0d7dd",32491:"6167ec10",32567:"5d8d28d6",32652:"f8c45ac9",32689:"a5461ca4",32839:"46d1dc13",32872:"f1c17b7f",32892:"e9268009",32914:"0ef4df13",32961:"1347019b",33023:"9fcb81d2",33076:"a9776c25",33083:"95f7392c",33131:"ad516382",33138:"dd0c884c",33178:"cab767d9",33181:"fa17a3e5",33223:"5af48372",33260:"3deda206",33261:"82dec33c",33329:"9ebfae5b",33407:"765a551b",33514:"b07fb42c",33725:"586fa356",33737:"452104f0",33889:"5b659de8",33920:"1943e34c",33966:"c46ba464",34020:"9b00304e",34077:"3db5eb91",34079:"273b8e1f",34153:"4a797306",34206:"23a156eb",34293:"ff318c38",34294:"f8338e5f",34323:"e48c3912",34407:"c4a71dd9",34458:"f0f4a691",34460:"5c8ad115",34475:"5bea2473",34552:"592e779d",34590:"de061f48",34647:"c93364c6",34656:"813ebe83",34748:"71408d45",34766:"116bb944",34777:"4284f636",34784:"243071a0",34792:"5c392fa5",34800:"99a27b29",34882:"16046cb7",34943:"2c06af7c",34979:"9c12417e",35038:"a2bcabb3",35069:"b269633b",35214:"3576f003",35216:"5334bf47",35387:"907c8c6a",35466:"1cf42300",35577:"09e24d74",35614:"032d72a0",35647:"a2e876c5",35768:"90bfd346",35809:"c30c381e",35874:"df463adb",35879:"41f4b8cc",36009:"b3a22aab",36312:"ade0010f",36415:"8d392edd",36442:"f98b2e13",36483:"caa6bd94",36495:"b3fdbb6a",36511:"f3d03ec8",36673:"8d4185e0",36766:"653cd4ec",36773:"91647079",36933:"eb87086a",36935:"63849fd3",36983:"7c43c98e",37021:"6d92a4b5",37055:"ac6b62e9",37058:"6e357be7",37208:"c3a94ed1",37257:"4f4166ed",37316:"229edc10",37362:"6dfd1bfa",37426:"3d7b9a1b",37894:"7779798d",37918:"246772cc",37977:"febe4bf0",38056:"5bb043f7",38104:"b34a9ee0",38230:"80b5c97d",38333:"8e018081",38349:"6fc631de",38368:"11414e0b",38450:"a77f15f9",38469:"66cd2d70",38504:"0df0bc38",38591:"e80537c2",38679:"5eece5ec",38741:"cea40137",38768:"38cd2ebb",38792:"2f6d8a46",38819:"0cb88ec0",38873:"179d37d3",38928:"1ec74ed7",39033:"7aabbdee",39177:"3ae213b8",39209:"f55bfda4",39252:"d179e89e",39275:"c0ba661c",39325:"a072c73d",39368:"b7e5badb",39605:"22f9ccca",39645:"e6e9a3aa",39726:"f2c01e3a",39820:"8277cea1",39853:"73c3a5ed",39941:"f8904416",39972:"fa8dc2e8",39978:"5b34f9ea",40097:"f49b74d5",40158:"2d7caf96",40176:"0260d23f",40342:"7ad00ade",40365:"dd8797f2",40665:"c2ef5f99",40830:"eaaaa138",40930:"51cdab7b",40936:"f5d7fbaf",40986:"0cbb6061",41100:"ba1c1ac8",41120:"14f11778",41329:"9444e723",41388:"7945275b",41537:"e6241e03",41750:"78718572",41840:"5b7c576e",41863:"56181a0b",41929:"c2ae09fd",41954:"85db7b61",41958:"e4176d9e",41998:"81192af7",42051:"a8987ce3",42054:"10c43c6e",42059:"bfe6bb1f",42169:"672c9486",42187:"4b66f540",42226:"3ebe5c8a",42263:"909a3395",42288:"24647619",42289:"48e254a2",42371:"ff4be603",42436:"4447d079",42465:"c14e35a5",42551:"a2ff0b9e",42609:"ff7c02a9",42620:"b6cfa9b7",42690:"4b481283",42721:"8e0282b7",42728:"910f748a",42757:"699b0913",42930:"608d6ba6",43037:"e2e305b4",43047:"ea09532f",43072:"dc98fcfb",43163:"26f63738",43294:"f6d93f4d",43529:"87186dce",43554:"39befbbe",43635:"0c94161c",43645:"d7039a99",43697:"4b718ce0",43793:"0682e49e",43849:"8b15c55c",43919:"bfada16a",43966:"b99800de",44023:"0e0b668d",44029:"4b0528ed",44118:"5d86b3d6",44152:"f193e9f7",44174:"5b1c4ba7",44393:"dfca4314",44523:"3d99ef33",44592:"29c565c8",44765:"0a
54392a",44780:"51453fb2",44797:"7e328509",44860:"55a23a94",44907:"0f014490",45057:"46f76bef",45091:"dacae080",45114:"593ffe68",45279:"01f7e848",45287:"8b1145e2",45571:"a258685b",45583:"3476fe8e",45593:"d02f7bc4",45732:"f2abaee2",45786:"239111c7",45809:"83d061ac",45878:"9ee45729",46017:"ce3ddafe",46023:"5216f17c",46045:"e0eae934",46074:"fff3ab69",46218:"8d0344ba",46284:"a5b5d55c",46328:"2fc02015",46447:"e7478c24",46838:"46dcda29",46901:"33a34e3b",47062:"cbbdf9a2",47068:"ba73f26c",47082:"cc1f5ce8",47117:"8b6445a0",47276:"524b67e3",47287:"cf1567e8",47463:"e5c3dfde",47568:"3059ed75",47582:"9ee4ebe9",47655:"ee799351",47708:"3ab425d2",47838:"2f0ee63c",47851:"e327333b",47975:"497aa321",47986:"9a7b56f5",48031:"38bd3ddb",48055:"60da83fa",48150:"6f6b3e89",48218:"c81622cc",48320:"bf0d24cf",48349:"3762a996",48426:"3ddb8349",48637:"ba0b40b1",48670:"8fdcea61",48840:"abfd17f9",48998:"e96fdd6c",49096:"70f3cfb0",49169:"d1b82434",49241:"f99bfa77",49270:"8eed67ba",49681:"28325500",49716:"331a2ebd",49874:"98a6ff5a",50017:"ab2e7268",50052:"8ac39bbe",50145:"4d028f11",50153:"e51da90c",50193:"486e741e",50240:"7b4c719b",50337:"acb04c32",50362:"7ce5ebd9",50375:"e86c0d05",50437:"9f305eae",50472:"40a0c599",50525:"aba6a826",50773:"7bcf009a",50849:"d7e1d518",50999:"6f93a078",51532:"1ed71a7d",51555:"e91074f3",51574:"12e76d03",51593:"6f219482",51605:"ea82a261",51625:"3b5ffa57",51706:"af6e989f",51768:"e6fe050f",51799:"42a4a45b",51830:"2212e80c",51840:"b63fdeeb",51881:"bf3bde03",51945:"86a7da57",52094:"22a76d89",52126:"08ba51c1",52251:"92bceb62",52286:"79bae4c5",52491:"fcb00301",52499:"3b1e54e9",52573:"2006be57",52586:"a18114c4",52593:"28599d52",52715:"c04dcf0d",52789:"0e46f7bf",52870:"f888d9d8",53237:"1df93b7f",53243:"54230287",53371:"c55f973e",53442:"6cd64148",53675:"3ca132b1",53823:"f20f879f",54125:"b684abf7",54133:"6afbfa44",54178:"1632abda",54200:"39a2751e",54210:"4c8d1cae",54250:"b3c952b5",54265:"d6f7d5e2",54363:"fa5bdf0c",54382:"4bae0029",54397:"3034400c",54487:"612ebb8a",54513:"cca83a59",54591:"e0668c88",54741:"7c417199",54756:"130a23fd",54778:"fd67079f",54786:"ed07f994",54794:"dd8be3b2",54855:"32f0f819",55043:"a463ff81",55216:"5560d84e",55239:"661e4fa4",55273:"08472b2d",55335:"746f419e",55478:"1710d498",55552:"8e23b856",55693:"7ec28fd9",55726:"7f536709",55745:"2e18dbc8",55799:"fe2f6d57",55821:"676a3180",55925:"407fa3a0",55962:"f2d325f1",56290:"640fe435",56424:"ac4fb807",56513:"e8d36425",56541:"7ec7e0b0",56552:"43c3babd",56614:"71f8452f",56750:"f1525ef1",56795:"151869e3",56902:"d4a6dda9",57121:"918ae6ff",57126:"7f039048",57242:"522a40f8",57258:"1b4282d0",57293:"fb218ddd",57341:"3ad7b662",57489:"f251ab77",57598:"e4b4615d",57599:"6e586ee3",57699:"d06effa9",57749:"34660ac5",57780:"163044ef",57820:"a3c98c45",57995:"2c5ceec1",58009:"84c320c1",58042:"4893a1cb",58096:"e345afee",58182:"a045168c",58197:"09e9a7df",58234:"6145eda0",58247:"551b313a",58356:"649a76e7",58539:"e5a71ed6",58564:"ea41aad0",58768:"4f9404e5",58818:"cc6053aa",58822:"cf14af90",58824:"bf2622dd",58914:"68709c70",59051:"8938295e",59060:"de11ece8",59181:"010f8398",59191:"af049e12",59241:"07a6f1c2",59248:"8962034b",59336:"902aff6f",59342:"d959d974",59394:"f4be443e",59427:"b43aa387",59442:"0cd38f48",59496:"081ed9af",59506:"c2ed794e",59533:"897798e8",59592:"d243562e",59771:"e56a1a2c",59900:"619d2e79",59982:"f929d4df",59992:"918c9b38",6e4:"78f8003c",60185:"7b2e834b",60331:"34d5cc00",60434:"05a720dd",60487:"34088569",60518:"bb341369",60603:"b8677fbf",60608:"c1b7bc44",60682:"1ebc7fe2",60831:"84960677",60868:"16bb304a",61007:"a79d55be",61200:"4c13f84f",61210:"f8bc40
80",61249:"f497508e",61271:"e50573ba",61361:"a529f863",61543:"e31a63b7",61643:"a882bd74",61793:"5e52bbeb",62117:"83a26c48",62235:"912fcb5a",62307:"5c77ea5f",62318:"bdd03912",62319:"8e993d66",62474:"686c1ad3",62493:"6312a106",62523:"54d1c079",62547:"bc1c33e4",62655:"8fca97e0",62867:"34d502be",62948:"c38f23a9",62983:"6dffe7c4",63105:"f29affbe",63387:"877a3c1c",63604:"7bc70741",63663:"cb582f54",63777:"92264b81",63801:"010118f9",63808:"86c8f7cd",64e3:"cacfa11d",64032:"d6990b47",64072:"555f2cec",64100:"e8591f69",64122:"07b92fc6",64164:"aea361f0",64485:"e715560c",64525:"6694e7e9",64533:"fd0f74aa",64596:"eee9e2f1",64656:"300bd484",64754:"e74888b9",64766:"5a7e5a43",64824:"dfd588b8",64871:"42d0afac",64900:"610e19f0",64967:"b81f3fb0",65132:"637ec626",65501:"8d3be60d",65592:"ef0f3981",65594:"88dde0bb",65641:"90dccef4",65758:"62041344",65883:"540f26e7",65965:"21d2296a",66026:"06b7cd3c",66081:"9d336f66",66187:"cf494ba6",66238:"1fb9ab5c",66256:"5f0246ae",66303:"a5560bad",66336:"87cc8f7c",66462:"1d642165",66465:"9c53d859",66597:"e0052e0c",66744:"2dba4e8b",66958:"12b52520",67010:"49763a0b",67021:"01ba3f79",67061:"d7124adb",67069:"c7b80b67",67132:"15f4efbb",67343:"2d8700b9",67448:"85ac525a",67583:"18caf9ef",67597:"1693c0b8",67638:"a48eac25",67670:"a23744f9",67908:"d29db0e3",67954:"8d96489a",68034:"8858d0ce",68126:"db9653b1",68258:"f301134a",68689:"8751004c",68757:"5335ef4f",68793:"9b89ba00",68823:"c3ca7a6a",68943:"ae5838f0",69057:"21cf1efb",69111:"1a54bfd0",69125:"a5f4c814",69169:"046783c0",69209:"193f200e",69234:"212137e2",69254:"140e6a69",69628:"9bfbb8bc",69629:"d5d7628b",69647:"607292ce",69824:"273a5860",69843:"7facae8f",69899:"6d933e1d",70081:"41e02281",70130:"16b47049",70178:"06b3b671",70249:"7bd49e6c",70277:"d72ada40",70367:"1fc4ed50",70497:"3fb51827",70504:"f53e2381",70543:"ce79b72a",70614:"65306ecf",70706:"21d3c1c7",70964:"06876062",71081:"99c371aa",71160:"dff7b4e8",71169:"5ed92a05",71287:"167f5be9",71476:"c26ab7d5",71516:"dcf1d6e7",71544:"d93a0aad",71698:"ff9d88b6",71789:"fcfa677e",71811:"39afc900",71918:"77ccc938",72070:"59dfcfb5",72189:"7f31124b",72331:"66179fb5",72500:"cf945ce5",72613:"084a18af",72638:"1adeac4a",72740:"150a4d14",72887:"2e2a73ec",72952:"d703ca6f",72957:"d9b3adf3",72978:"2e6d047c",73238:"1eec97be",73300:"4efa0483",73338:"aa02927d",73356:"b26f6fa9",73369:"6594bd70",73452:"63b8176f",73533:"c9f98325",73537:"c7953305",73576:"05a8a78d",73618:"12dcfbad",73725:"8f9c5733",73745:"154cbeb4",73766:"af29c71b",73858:"bce7d46d",73975:"566ea6d5",74019:"ff35d8ff",74061:"a0541488",74091:"67e63bc0",74132:"ba454016",74136:"ef4e0f5d",74139:"38341509",74332:"769f97b7",74337:"cb8731ee",74362:"71078103",74441:"4c8fc79c",74465:"e6a17fa0",74480:"4bbc58d4",74578:"80ea5ae7",74794:"016f0e16",74839:"8b87f6f5",74882:"95648126",75070:"5b34d9eb",75118:"29ff1658",75123:"8d81369e",75203:"61c61e17",75273:"ca1d44bc",75546:"c176dc4f",75567:"43a232e9",75671:"ff078e30",75702:"1ac49947",75897:"64e30bbc",75911:"b92bff04",76029:"d6f3938e",76180:"172c9869",76222:"99ae7254",76240:"02de7b5a",76355:"2363ed29",76360:"4f8fd4be",76369:"c0f7075f",76527:"359e34b0",76668:"982c02f7",76793:"d12dbf4d",76878:"198182f0",76895:"8d493a07",76924:"cb870251",77053:"9eb4c1b4",77170:"60e8b504",77413:"4deae4de",77427:"78dc40c2",77527:"56c932ee",77655:"274eaedf",77680:"60043c0d",77887:"be698a2c",77923:"971cbe2f",78035:"1ae50e88",78038:"4ab9b114",78063:"c80936bd",78072:"ed51eb7d",78114:"516dec85",78233:"41742cda",78248:"df22f3af",78452:"118229e6",78863:"2b5e4b34",78924:"0fcbeed9",78991:"69f3d9b5",79073:"a17fb62b",79164:"95f18dd4",79239:"3888e873"
,79290:"f83967c4",79298:"10c28d6f",79315:"46fa8ad3",79356:"73a7bd5f",79478:"df5a3016",79479:"27e1a14b",79543:"fbff3b11",79691:"c733e485",79829:"6b2b8280",79895:"4bedd8c5",79958:"b00a2879",79963:"092519d2",79977:"4925ce85",80053:"935f2afb",80132:"18dd253f",80192:"fc8aebe3",80268:"eac8f2ef",80380:"2d35b91c",80709:"83cd8f20",80895:"f97cc188",80940:"abf6a1f1",81012:"85c9bffb",81135:"0c1ee94a",81400:"29b6c240",81477:"2dd65ece",81667:"a39041db",81708:"82a4f002",81835:"6a0b4355",81934:"35fa8025",82132:"919b108c",82250:"51da09c7",82342:"dc130668",82348:"3f6554cb",82360:"d885d629",82423:"fadcaea6",82513:"7bcf7096",82614:"aa395a59",82621:"ba4efbe0",82732:"ba8527a9",82864:"3ebee193",82982:"bf02c3ce",82989:"39b565ff",83069:"39c8ecdc",83074:"1a42aba3",83139:"fba94ee1",83147:"a26f7afa",83513:"fb0364ff",83692:"779753bc",83698:"39c159d8",83893:"66716ec1",83957:"10f908b7",84063:"363318d5",84097:"737371dd",84242:"6e366b57",84281:"a3286ddf",84362:"8da7304d",84429:"2da29b2a",84477:"3b12bc8a",84500:"6eeb04e2",84513:"6cb122e3",84710:"4cec253a",84745:"42325f5c",84754:"211f58b1",84841:"7e5ee96c",84847:"5c27dd68",84854:"34b19815",84888:"e9d5739e",84941:"21bf64ca",85011:"6e5d074b",85054:"7bc3feb7",85098:"60d04b47",85217:"f8482b2c",85419:"3743f01c",85455:"7e446cc1",85493:"6d480200",85780:"82033eb7",86009:"c9b79676",86018:"ed809cac",86129:"b47406fa",86150:"1db21d86",86317:"a540f8cd",86333:"08e3aaa9",86356:"4ad39569",86476:"45aa7127",86518:"ce66b6fd",86826:"7dd3be25",86950:"4e1da517",86991:"d1b2a42e",87064:"a6a8af40",87223:"27bd5328",87224:"6fc8d865",87240:"1fdab62e",87304:"9980f90c",87313:"96c0bb00",87316:"a48778d9",87443:"96ec050b",87460:"7668acae",87482:"8faa0fb1",87513:"a291f403",87634:"f807eec9",87667:"c07f2717",87799:"3958a146",87836:"e111f111",87866:"b1998bb1",88068:"8d28c4be",88179:"e5842021",88187:"1e391540",88204:"5304a4a8",88252:"c0074ddd",88295:"21ad5224",88338:"e1b9986a",88380:"ac930f6e",88446:"3c725018",88598:"6827856d",88621:"4499569c",88625:"6f59957c",88821:"41db9914",88831:"1c56d006",88879:"32ea4ecb",89002:"4455e85b",89122:"234a1403",89210:"b4028749",89215:"17114a18",89574:"13b69fa8",89675:"164cd634",89780:"46c600d5",89806:"443045da",89852:"1f79049f",89984:"93d3457d",89986:"fee1f25c",89987:"d25ffd5f",90046:"d043cc46",90075:"390ef088",90085:"d36db526",90185:"41b3e733",90205:"727a1f3c",90333:"f2497893",90342:"2c91f584",90377:"7d607fc0",90392:"f60e43ec",90398:"0b78393d",90431:"dd313590",90451:"0a13c98e",90464:"b2335bc1",90536:"73dfc993",90560:"01fb8b11",90601:"22f40a40",90610:"147b0f6a",90615:"8cd0f4f5",90645:"6601f604",90666:"459a783a",90865:"87e7806e",90896:"4302562a",90976:"6eb0ce42",91178:"9c4bbfc4",91213:"ff0539a2",91231:"28b27838",91274:"19345251",91287:"02ee0502",91304:"872e63de",91316:"b82d5884",91406:"4cd7d8af",91425:"1671b3fa",91523:"d692bb25",91628:"c839a5b0",91753:"2f535455",91782:"304ed800",91810:"0f7553c3",91849:"b2735041",92085:"000c061a",92244:"8c828746",92269:"9c42de85",92393:"db5c8692",92404:"799b872c",92456:"d6360c39",92463:"4d4093bb",92744:"6eebf72d",92775:"7861f6df",92778:"8c31caf6",92786:"ae5bb339",92843:"85c3ba36",92851:"8bfba65b",92900:"e5a16b2e",92964:"14e00221",93023:"b984322c",93071:"61e5c5b8",93151:"e7cbe8da",93176:"f7101d4f",93195:"740eb29c",93308:"b83df1bc",93340:"5d075efb",93346:"f7735fb0",93377:"dd435828",93400:"03e8549c",93590:"dede40b0",93749:"4e6907d6",93832:"917734f8",93837:"cb341380",94114:"c9aea766",94123:"7c8407dd",94136:"91dc98f0",94197:"37aba5d3",94223:"43b891d1",94328:"63f66cb7",94337:"9fdf7324",94401:"6c10648f",94452:"878356ab",94605:"487f7f30",94694:"d3e690ce",9
4696:"376d31f7",94932:"a233fb97",95020:"b8e39b95",95107:"d666ab7e",95171:"3db8c88b",95281:"bc08bf79",95296:"9936b6c8",95317:"cf282674",95327:"1e173bbe",95329:"5b23c695",95364:"41fbfe2f",95418:"7877b0eb",95441:"e9ef6b31",95561:"0e0f5dd2",95696:"8462ad7a",95745:"edf19300",95801:"e490fd18",95816:"9dd89af2",95911:"7e254f9d",95945:"90b0cf6d",96055:"8fa500ae",96078:"d6011437",96082:"a322018d",96135:"3061ad92",96188:"f0129862",96199:"8e2c0739",96361:"ebf2bdda",96426:"64bd79cb",96535:"38e65fdd",96544:"49ea6ca5",96547:"385bc71d",96617:"e23cd647",96684:"a612420b",96768:"2035956b",96772:"b35418cf",96831:"99ba663e",96945:"09e11ac0",96971:"57973c2b",96979:"7f6f8f16",97065:"6816f4c0",97129:"f3034cf4",97334:"9d4bcb9a",97469:"d91e7ab4",97523:"02fbc840",97547:"902fdb3b",97553:"7ea214d5",97557:"c70cb355",97617:"ed97cef0",97648:"6f25dd34",97782:"b094b997",97816:"7513b789",97826:"16cff1eb",97850:"dd6685df",97920:"1a4e3797",97955:"746bf890",98177:"049dc708",98200:"0e7f2915",98218:"1820eb3b",98272:"b7f629d0",98623:"ced65f67",98740:"d1475ab1",98791:"1a6f209f",98868:"6a913ab1",98939:"3ff950a4",99120:"008b0ccc",99184:"8aecb2ef",99266:"ca443c18",99299:"7cc0ca0e",99367:"00125b11",99389:"c2f4aca4",99427:"64758f43",99494:"f2d5637b",99607:"49ea4a42",99669:"32db5af4",99839:"15d4dc80",99871:"5e3def70",99997:"b63b5bb9"}[e]||e)+"."+{19:"73207e89",21:"a3309319",71:"148bf97d",99:"3f2b222d",172:"c102a782",224:"487be67f",250:"e9efb2fd",291:"b2f7c218",467:"3701a1f3",513:"09978e25",596:"7b2cab9f",803:"9d68a6cc",899:"a7eb7364",1116:"56b8b25b",1365:"573cbaf6",1387:"23980d09",1523:"ec2891cc",1647:"63a2d108",1652:"94113f44",1664:"180295d2",1671:"62a8f5fd",1940:"c99c168b",2044:"d2700165",2097:"7188f7ec",2196:"3febcd57",2312:"d00cc62a",2427:"f701c31d",2570:"3ec3deb8",2638:"118a41f9",2656:"37531cd2",2905:"ff69985d",2916:"2ecd567c",2989:"cf806e65",3044:"712f76ea",3102:"bf11a1b5",3145:"b6e9e373",3191:"74dd2862",3197:"9da021ce",3216:"eacb9804",3281:"35926ded",3283:"cd4afe73",3326:"58beb1bf",3397:"b0ae73af",3398:"63275511",3447:"fb8d1ba4",3650:"2e235066",3667:"1f8a4cb5",3914:"54e8dd0f",3919:"41dcf0b8",3942:"21980884",4125:"96f2442f",4151:"1a6afc47",4195:"7402cd75",4244:"8675cdba",4328:"e0ebdf09",4504:"025ef68d",4513:"558b14b1",4585:"05c4d67b",4631:"1d37eb2a",4742:"5447edff",4874:"39750b03",4882:"d87f6d94",4929:"18e132f8",4972:"a7243668",5061:"509f2053",5129:"479f5110",5132:"27ce2caa",5313:"5f8b4c43",5352:"578d6913",5383:"306e0d1d",5512:"8f6a2d54",5714:"1a18df09",5909:"51facc0d",5920:"8c8aae04",5981:"68d16136",6027:"6f5c789e",6151:"81e3b729",6386:"67f2de5f",6443:"14902fad",6517:"34dee69c",6537:"ba99c5c9",6553:"08568a59",6695:"cfee6a07",6734:"6772bb12",6799:"67367b32",6822:"f569301c",6824:"6e7a03e3",6968:"a44bcc6e",6971:"873ad005",6978:"c6ebb24c",7078:"521e19d7",7091:"8c3d2fe2",7092:"cc64c0ff",7108:"d17c6119",7120:"d18d3e66",7155:"f52fceb3",7162:"cf567055",7318:"e3d90eca",7451:"334acc4c",7470:"ae4956f9",7485:"7b41d7f5",7500:"099ef3cd",7874:"7c1cf1bb",8023:"897a9b52",8135:"672f0bde",8145:"5d99a1dd",8188:"3b884b4b",8210:"d7b2d51a",8230:"119f8a4f",8313:"5d208e75",8328:"d30eb04f",8407:"28cf1826",8482:"cceadeb0",8638:"c9ba8e41",8671:"b4177ac9",8809:"959e39c4",8882:"0bf188b0",8906:"cfd44206",9028:"9d1978ac",9119:"fd788116",9225:"a93f8834",9235:"b22f9f6d",9365:"8b245b69",9444:"ca6f47e0",9542:"4f4a1a23",9550:"83ecb96d",9615:"54fc882e",9817:"83bf0cd5",9836:"c72917a2",9907:"676fbeba",9947:"5cfa1c77",10109:"beb060f2",10228:"aa539b8b",10270:"8ae752b8",10436:"1ec22cec",10480:"8d52d22d",10497:"809aee5f",10650:"547a5e75",10756:"2bf7ecc4",10882:"023d9d6a",10
918:"ef85344b",10987:"c12ef65c",11147:"29a02178",11174:"2c33f6da",11203:"9539d62e",11311:"f40cd0a5",11321:"cd4efea5",11326:"6cbc304c",11342:"2bccdb3b",11398:"f8e68ae8",11565:"f2495b87",11646:"3c4058a3",11656:"ac13cb1c",11875:"5b868eb0",12228:"d521a8eb",12442:"25d7e31c",12549:"16f88776",12555:"900d6f87",12558:"0ac14857",12560:"9bb2eb9d",12567:"7d910920",13253:"0ef1443d",13280:"29f73853",13351:"804b2952",13460:"9e91d59d",13588:"d0b6d6aa",13595:"6c1f4a63",13617:"76fcc35a",13718:"94e8a0fc",13896:"76d63f5f",13924:"ba0bd4a2",13972:"302371f0",13979:"aad133c6",13995:"243e677b",14061:"6004bdb7",14088:"6e4abf52",14095:"18927dce",14143:"a3b631ac",14200:"5d9965b6",14299:"f4efc87a",14369:"b25e3d4c",14386:"acd52792",14396:"fd9bfcdc",14549:"607a549f",14610:"a9ad2a64",14670:"bd2bffef",14713:"8c81902c",14775:"35ca632b",14840:"4ae7e4f2",14908:"402f178d",15196:"9dd233ed",15380:"38251be4",15393:"bc061f23",15497:"2c78c17f",15658:"3662f8f0",15888:"8a70bc30",15970:"360efba8",15994:"9794e987",16022:"c86edca7",16038:"8f228690",16058:"f0b25bfc",16071:"0df54331",16153:"4800d10d",16161:"3a69d696",16379:"38089655",16528:"f38ed133",16635:"e969de7c",16672:"38ba7418",16685:"3022befc",16876:"19436136",16891:"303f3d1b",16973:"9fbbd5c9",17009:"80398da5",17275:"06c7edce",17283:"b641bac3",17457:"ab4ccae6",17511:"fdf99bc4",17726:"fc4837d4",17757:"10a7c58d",17785:"172166c7",17883:"3f659ac3",17887:"fbdcba1d",17989:"52e3fc0e",18025:"8e9620ce",18050:"5b8280aa",18084:"fcd7fdb2",18100:"0a6d52c3",18143:"b09cd8a9",18156:"a975996f",18186:"df8b47fc",18318:"edf202fa",18559:"5c93aa35",18598:"cb52400f",18680:"59e00bd8",18706:"15008a37",18734:"5dd15d0b",18746:"f9ac8609",18883:"90cc608f",18892:"6c8911a8",18894:"74b1ce85",18928:"0701e03d",18998:"d6cefe2f",19177:"f4fb3a86",19204:"25b579ad",19212:"c6205a58",19305:"bf07439c",19408:"893bf9b0",19427:"13389c1c",19493:"0990c5c4",19504:"3cbf15b2",19531:"795dc04c",19625:"acfca89a",19661:"8f1f85f8",19671:"e0c673af",19709:"eaec2d23",19733:"3acc99c9",19806:"837a7ae1",19832:"52e2b5cd",19876:"a944f10f",19939:"491b1577",19962:"a3ecf956",20040:"1320b0ce",20061:"fdab8ea6",20169:"f30c5c13",20303:"ac64adc9",20602:"d2e8db2d",20689:"f0ff8154",20707:"9011dfb7",20764:"705b6a69",20841:"ba439ab1",20876:"81d45514",20911:"eb39e4b7",20917:"9c9a3e5c",20983:"f78047ac",21015:"1e986630",21134:"a1896a0f",21143:"16349e64",21190:"580de28d",21207:"8e9578f8",21228:"ad3ad6e5",21379:"80aa9c55",21643:"eb35f457",21688:"795caae7",21823:"35e78f99",21983:"a4e4572b",22030:"c38a2655",22087:"817ffdcc",22129:"267916df",22163:"f5657f46",22238:"bdfbafdb",22456:"62957769",22523:"15391619",22540:"57cb9539",22583:"b8adcfe8",22604:"4c410f1b",22777:"991b45b1",22898:"c8aecb21",22940:"873908b0",22997:"f3a4a591",23064:"03b7ec0b",23228:"c0599384",23231:"295984d8",23252:"94c1e97b",23310:"c407c53a",23320:"3c9b69f0",23343:"08e5a4d6",23435:"59082b53",23522:"dcfb4085",23536:"2a58bbac",23545:"4623b3a1",23663:"33ee14f2",23714:"43716ebe",23804:"66d68fe3",23898:"ec519008",24058:"07462b4e",24066:"9d4d9ce3",24101:"d3e3013e",24109:"795d5349",24158:"4b081433",24266:"0b540723",24282:"d3ef7720",24401:"1ae158f2",24467:"dc4c3279",24501:"178a453f",24946:"8d83115f",24986:"00c01dd4",25079:"1dab7340",25251:"3cee59a7",25283:"06e3d89c",25427:"854a38e7",25433:"22246f6c",25451:"a23a897f",25513:"3f77f081",25547:"284c9b9e",25579:"9e7055ec",25833:"71a40566",25898:"94b4215a",26067:"7b3970ce",26084:"861dcdd5",26086:"12738d95",26201:"75d8825c",26291:"e8f1fa5a",26311:"ce5c5ebb",26361:"5c0647d3",26521:"89b58f07",26654:"4d65993e",26686:"9da74581",26695:"49306f14",26858:"1ce10981",2703
1:"e213c1a5",27109:"fbcb735e",27167:"eb8d133f",27270:"3324e9fc",27276:"2fd74dbb",27303:"46258253",27324:"04a7ca69",27554:"c429cc73",27704:"849e54d9",27918:"ca462563",27982:"0533c8f0",28085:"d5cffe43",28134:"3e2ffbbe",28139:"80df3532",28181:"4f668e37",28219:"51e8e461",28261:"f279f4e5",28367:"dc6ae3d7",28475:"2000a841",28476:"892e8462",28514:"2d31535a",28516:"d3f4d479",28623:"760e1770",28699:"c9753e68",28800:"7ebb42b4",28880:"777c9b40",28882:"79f11e9e",28906:"dd4d9809",28922:"4a03c698",29014:"63a3f3cc",29025:"caedbded",29050:"15e17037",29066:"100d7b9b",29131:"61e3e7a5",29191:"6765a974",29272:"a7da1cef",29514:"902f2c64",29520:"59707016",29698:"4ac96687",29717:"c3facf77",29782:"8ca83049",29818:"be78b6d0",29831:"c421c31a",29864:"7e0679a3",29871:"8a4a1409",29886:"da3cf2c4",29899:"8cb1ad4a",29978:"f29be154",29980:"15805725",30062:"2dbf55d1",30216:"c844cada",30295:"54944412",30419:"be694780",30433:"db6c199c",30454:"5264f7a4",30470:"0ae2450e",30589:"3a397208",30613:"aaec584f",30677:"b72f0627",30678:"2a52c41d",30800:"9375989b",30820:"b28f98b9",30834:"ed2abcff",30837:"e852c553",30865:"8b9d510d",30885:"e871b509",30979:"589a7d3a",31009:"41ecc7f9",31013:"7a5f9581",31023:"970d7bca",31044:"732db84c",31050:"400af1a1",31068:"1f0b2373",31089:"ec193a0e",31116:"bc1bd6c9",31152:"a086d363",31187:"52f3a337",31293:"17ff3286",31294:"3c2a361c",31441:"c659a961",31471:"df9c6253",31512:"69ffbcf7",31516:"6f3edbc7",31570:"a9bef6dd",31671:"4d0bd185",31824:"22f60a1f",31938:"7889faa7",32224:"19c9db8f",32255:"65ece1b5",32308:"df0742eb",32319:"63655ae9",32410:"943f13a7",32446:"8ce02657",32491:"bdd9498f",32567:"277cb195",32652:"72910332",32689:"08107660",32839:"5c79dc2a",32872:"855c14fa",32892:"250cb5b2",32914:"70e38801",32961:"654461de",33023:"2964b050",33076:"ef5beb95",33083:"16e34a1a",33131:"a67a7052",33138:"cbc82116",33178:"555e80f9",33181:"2310c781",33223:"e5f8838d",33260:"5ee9d0c7",33261:"a2e46c4f",33329:"94ab58ef",33407:"3ff66407",33514:"0e769677",33725:"6470c099",33737:"f6fbd704",33889:"1f400f9d",33920:"32836d68",33966:"9d5d17df",34020:"309d55c2",34077:"652a00df",34079:"25c81e5a",34153:"f309a901",34206:"8ef010f8",34293:"a7e2d1af",34294:"21881a35",34323:"82bd13d2",34407:"f08cd09f",34458:"bedb0cad",34460:"51defee4",34475:"994398a4",34552:"278830b5",34590:"e79f245c",34647:"cb920ca6",34656:"53a0d9e7",34748:"c74caba2",34766:"9716c156",34777:"55318a51",34784:"121bb89d",34792:"10db5cbf",34800:"31234350",34882:"53c961aa",34943:"5a2f2d6e",34979:"be9c4116",35038:"852d9cfe",35069:"05c8b29d",35214:"3be2021d",35216:"783d15db",35387:"d6c4b7cd",35466:"b8c66a97",35577:"eef0d34f",35614:"0550e592",35647:"8df42e8b",35768:"fa150a9f",35809:"29c1c1a6",35874:"64598624",35879:"603ed18f",36009:"ed99bcf4",36312:"f6211aac",36415:"7269bd46",36442:"781b7f36",36483:"e4ad43b7",36495:"c4a16cab",36511:"aa66c640",36673:"9639c6ef",36766:"a983063f",36773:"364a602a",36933:"6c6d7692",36935:"232400dd",36983:"87a7744c",37021:"8953fbe7",37055:"8a714c7c",37058:"e2b52ed6",37208:"4babdc40",37257:"7b25eb85",37316:"136d87ad",37362:"af7565f6",37426:"a3fce28a",37894:"1d31c5b3",37918:"944b0fb5",37977:"a605632a",38056:"c9cc2c03",38104:"70b4c07e",38230:"a9d249c1",38333:"6d30319e",38349:"86ac1432",38368:"99e33615",38450:"971f211e",38469:"c905d16c",38504:"b575cc51",38591:"8b436f7f",38679:"2924f701",38741:"056b89be",38768:"e49628aa",38792:"873b0b4e",38819:"d5786d3c",38873:"aa2dff10",38928:"915007c0",39033:"72f28f6d",39177:"e72fee4a",39209:"ab100076",39252:"a8e9c58b",39275:"68258924",39325:"9e574bbc",39368:"9b3b00b6",39605:"87f4261f",39645:"363e983b",39726:"e601e6d1",39820:
"efc15edf",39853:"2eceed3b",39941:"09b3269e",39972:"1af66b9e",39978:"3914825c",40097:"065399f9",40158:"c5447ab8",40176:"a0efce43",40342:"6b02e5f3",40365:"943c9bb3",40665:"43c72f99",40830:"28cfd22e",40930:"0bac8708",40936:"6f445b64",40986:"f6358136",41100:"8b72bd88",41120:"a6c90b9f",41329:"327337f0",41388:"15946aa8",41537:"507f5136",41750:"dfbc322c",41840:"319bb3a8",41863:"397dd98b",41929:"1df8f60e",41954:"7942c49d",41958:"83b83b97",41998:"a08698a0",42051:"c783d817",42054:"9a68e985",42059:"dee04e82",42169:"d7053385",42187:"2cada35c",42226:"85755b59",42263:"1b5d9df4",42288:"5b62f81a",42289:"0c3f570f",42371:"d7fc9caa",42436:"1827f217",42465:"cf7195c1",42551:"e6eb7da2",42609:"0fc5596c",42620:"81528074",42690:"c8225ded",42721:"697fbb16",42728:"2d6aacf6",42757:"a4a33845",42930:"4a3d4ba3",43037:"f8316728",43047:"4998e689",43072:"2e96fcd7",43163:"588bcffe",43294:"0b488eb3",43529:"f1d99b35",43554:"9b15f63b",43635:"382a5fae",43645:"e2041df8",43697:"5f5f48af",43793:"b242fd59",43849:"3d340240",43919:"60913365",43966:"f807cefc",44023:"d8d2c9f3",44029:"a29133e6",44118:"1bc0c1f6",44152:"93b2a9bb",44174:"5bbd8c7c",44393:"619828bb",44523:"c96ddcbc",44592:"64edac99",44765:"404d2c80",44780:"8b703881",44797:"bc343266",44860:"83bf1478",44907:"b7e881e3",45057:"026ec45c",45091:"f47baa55",45114:"c5711e84",45279:"b7f604ea",45287:"7ea5080a",45571:"b3f438b3",45583:"95c5394f",45593:"fa50b001",45732:"646f2c6f",45786:"02f6764b",45809:"a28b7ade",45878:"2405e6ca",46017:"fb5cff5c",46023:"02d71bae",46045:"7c8d179b",46074:"85dc8255",46218:"304359f2",46284:"35783c6a",46328:"e6ad5407",46447:"b4d585fb",46838:"615dfccd",46901:"73c5e427",46945:"aca29914",47062:"ecb4b558",47068:"e5db7558",47082:"87413ddb",47117:"925c7a4a",47276:"414f3077",47287:"17c798ef",47463:"ffbf05d7",47568:"2216e773",47582:"98df77d3",47655:"741c773e",47708:"28dd4c52",47838:"d2dfacb3",47851:"9638e317",47975:"1d9b9deb",47986:"768f43ef",48031:"62017355",48055:"6eda5bd9",48150:"3729eba4",48218:"786de4f3",48320:"0781165c",48349:"1227a5f7",48426:"e645d804",48637:"429d7094",48670:"82bc3aba",48840:"9394dd4f",48998:"217ac770",49096:"bf964682",49169:"7d6d8f24",49241:"a8e10b10",49270:"cdfcac6d",49681:"bf0ec94e",49716:"b198539c",49874:"1c08aa98",50017:"8cb0b147",50052:"32a7f158",50145:"16d798eb",50153:"b103db02",50193:"cd9074e9",50240:"e5ce9a3f",50337:"e1095412",50362:"1653c490",50375:"e03acd0c",50437:"afd1bfa0",50472:"7aad844d",50525:"8ffca8d0",50773:"93035d26",50849:"afc6073d",50999:"7814621e",51532:"4b48e1fb",51555:"8c0ea918",51574:"6cb382dd",51593:"ccb11905",51605:"0604c821",51625:"9af5a24d",51706:"5c9852e1",51768:"4856aecb",51799:"04e48987",51830:"e3f6d03f",51840:"6e2f046c",51881:"6f7b85ae",51945:"084ffea8",52094:"5eeb6372",52126:"5ffae957",52251:"71f8dc72",52286:"dfe5dadf",52491:"3832c13f",52499:"3e054d23",52573:"006068eb",52586:"fc99d991",52593:"93febf0e",52715:"246739de",52789:"4e836d21",52870:"58b181cd",53237:"c3a4514f",53243:"5c3c3aa5",53371:"b01e4c10",53442:"ddb338e3",53675:"bd70cb8a",53823:"6530507c",54125:"2f1c7fe0",54133:"45c8fdf2",54178:"89e6c31a",54200:"8191d4b9",54210:"0ff7d73b",54250:"f3a66d07",54265:"4eb7ccff",54363:"101747c1",54382:"226534bb",54397:"95ca43ad",54487:"96d9e1eb",54513:"952be9ba",54591:"4c7ab366",54741:"8d1a6e43",54756:"354bdb25",54778:"ab81ffc8",54786:"18a4a7ca",54794:"daac75f3",54855:"4f0b894e",55043:"923df72d",55216:"dff3209e",55239:"52636bf0",55273:"27e02213",55335:"40de3b68",55478:"003e195a",55552:"c5b6f07a",55693:"a959df65",55726:"8042dc4a",55745:"17a44615",55799:"92eb75ec",55821:"6a3913bc",55925:"5005937c",55962:"40e7c1dc",56290:"d
30a97c4",56424:"1a06e672",56513:"5f155b72",56541:"ac6ce218",56552:"4d94b2e7",56614:"f2830c22",56750:"186f7120",56795:"017a94ef",56902:"f91db1cd",57121:"e7f833f5",57126:"704067cd",57242:"c1396e1a",57258:"3aaaa2c9",57293:"3721a264",57341:"2aa6199b",57489:"3d3e42fc",57598:"7a15184a",57599:"a8e91558",57699:"77e23c32",57749:"aa83a7f5",57780:"44a93095",57820:"d9fe0245",57995:"11d969fd",58009:"c5f9c347",58042:"9798f0fe",58096:"637b08bd",58182:"b13442de",58197:"1d43d81a",58234:"d3624c41",58247:"47cd75a6",58356:"1aea63a9",58539:"2e4b62b8",58564:"fc408722",58768:"41323c68",58818:"9671e908",58822:"0e081dcc",58824:"3ea8fad3",58914:"33792c32",59051:"ce30c81d",59060:"f4bb9845",59181:"ca99c1d1",59191:"4f95915c",59241:"126c9fc5",59248:"8c5d9aae",59336:"1face31a",59342:"2c3b6d28",59394:"cb1346cd",59427:"9a0b349f",59442:"d5c2c74b",59496:"0c397f5a",59506:"4a81177d",59533:"915071cd",59592:"b7b4e63c",59771:"d25615b8",59900:"1e9304e5",59982:"c299fce0",59992:"114b244b",6e4:"cc9f4527",60185:"babf0d59",60331:"924ba5f7",60434:"52042dd4",60487:"80f0aadf",60518:"c0f1e5e8",60603:"5316c07a",60608:"9430b8fe",60682:"7854ea33",60831:"08063f83",60868:"267fe23b",61007:"dcc4e5af",61200:"6c5f5e8b",61210:"d73ed0a2",61249:"09ef4526",61271:"f20051ab",61361:"636b72a8",61426:"4a4dfe81",61543:"8f27603a",61643:"f9dbfec5",61793:"01718e8f",62117:"a30c0b4b",62235:"58285753",62307:"5a3ba620",62318:"9a886142",62319:"b75eb48b",62474:"dc194af4",62493:"bbbbd1d9",62523:"61046ac8",62547:"fc411513",62655:"f09d9004",62867:"46a98c28",62948:"dbf70f88",62983:"61d5d965",63105:"096987cc",63387:"77f70d47",63604:"66965e35",63663:"c51c3a56",63777:"0219cda6",63801:"70a50fe6",63808:"379d8af0",64e3:"c520b0e8",64032:"b5c596b3",64072:"f3f20053",64100:"7d16a1fd",64122:"1eb9bc55",64164:"e2cc50cc",64485:"772190e7",64525:"fde13b5a",64533:"dbde94ef",64596:"7e071618",64656:"5c661dee",64754:"1962a975",64766:"2e434b9b",64824:"68d4a423",64871:"0c878ea9",64900:"9c15d13e",64967:"8c69c566",65132:"c728a280",65501:"2405319f",65592:"a5b39398",65594:"30989c6f",65641:"3548b448",65758:"9dc5e65e",65883:"8e29ae7f",65965:"fca751f4",66026:"e4206534",66081:"0b35a4d3",66187:"6bc78a22",66238:"c7bfdb48",66256:"441bc284",66303:"7097325b",66336:"7ee25827",66462:"933f4a86",66465:"711a6a15",66597:"abbb3946",66744:"fa50fa5d",66958:"d3834829",67010:"64408058",67021:"b6e03309",67061:"94f1459a",67069:"a14d8963",67132:"34a01d6f",67343:"0929d0ad",67448:"c3192226",67583:"7dea7c29",67597:"c925248d",67638:"4606f8cf",67670:"cb8c9278",67908:"7befb180",67954:"a0ccf3c9",68034:"673d4259",68126:"ab783f9c",68258:"fa76b63f",68689:"3a807ecc",68757:"4b7d9782",68793:"af3c15ab",68823:"5c7b8d2a",68943:"f7a596db",69057:"2a9b8add",69111:"6f76f4e0",69125:"6052e105",69169:"6e79070e",69209:"9e539e38",69234:"6337130f",69254:"270b7bf2",69628:"7988be09",69629:"6d3ab36e",69647:"c5c936f9",69824:"7d0858b3",69843:"a06150bd",69899:"576d10d9",70081:"18e16c97",70130:"be0e7aec",70178:"81e6c33b",70249:"b993646f",70277:"311356a4",70367:"1d4d5424",70497:"557d33ec",70504:"28a803b5",70543:"2fedfbd9",70614:"4f9da953",70706:"73960b42",70964:"bc48267c",71081:"31a6d953",71160:"4dddf998",71169:"adf88673",71287:"885b618b",71476:"19277f18",71516:"b09b1ddd",71544:"27aa44ac",71698:"bc2f9371",71789:"59ccffdc",71811:"264139bc",71918:"b32dc731",72070:"7ed11917",72189:"6dcfecb3",72331:"6d29448b",72500:"f64f66be",72613:"c769eb3e",72638:"fdb543fa",72740:"556dfa23",72887:"4cdbf544",72952:"e3a8eab8",72957:"960770e3",72978:"aa7363f2",73238:"d2a2d1e9",73300:"e20e9d0a",73338:"04155867",73356:"36d8b7e0",73369:"f8653575",73452:"ed4f95c5",73533:"0ca294
c9",73537:"3e09c354",73576:"1077006e",73618:"f611989b",73725:"5a69a073",73745:"593cd9c9",73766:"e1b090b0",73858:"c8d96799",73975:"4a26020a",74019:"eab93d36",74061:"47eacce0",74091:"b78b34ed",74132:"a5970e5e",74136:"f2226b77",74139:"0bcd285c",74332:"0f66d626",74337:"d3114a2c",74362:"8c45079d",74441:"0ef242c1",74465:"3b3b7dbd",74480:"25f1369f",74578:"178c32cd",74794:"8f102688",74839:"577811de",74882:"f0b13ca6",75070:"22d722a3",75118:"bc928be3",75123:"8b8b6076",75203:"8b7d980c",75273:"8dbf257f",75546:"49f94163",75567:"355318aa",75671:"5943eb1e",75702:"dfcf6ff3",75897:"b7955be2",75911:"02fe2b4d",76029:"a35fff25",76180:"059fca9c",76222:"22652783",76240:"6c6980d9",76355:"b62d2caf",76360:"4e523b78",76369:"9e396c17",76527:"50bbd3b6",76668:"1fe0055f",76793:"a673c81a",76878:"575a3510",76895:"bd4b403b",76924:"bc5962fb",77053:"16eba6b8",77170:"e863ae3b",77413:"7e5de04f",77427:"add12e61",77527:"78b8c9e0",77655:"0003883b",77680:"595d9465",77887:"f2e6f18b",77923:"fb735e37",78035:"4ead3dce",78038:"52038e48",78063:"5324c9d3",78072:"d4105f8b",78114:"34cd7f60",78233:"2e83e610",78248:"74069962",78452:"8e3ff138",78863:"c85a658c",78924:"f65e527e",78991:"52cf8db2",79073:"9b6a9356",79164:"2bf3dad1",79239:"6f1637b7",79290:"0c519db4",79298:"4a5cabcd",79315:"4576f615",79356:"37e00d95",79478:"36233afd",79479:"24435308",79543:"a1c76445",79691:"abf6490b",79829:"d0e80522",79895:"85d250b0",79958:"aeb03179",79963:"420ea460",79977:"09932290",80053:"544bc93f",80132:"2e07a3f4",80192:"44c46920",80268:"729162ff",80380:"8a522d91",80709:"fd31ec19",80895:"2b1b02ce",80940:"34e77834",81012:"71e2a477",81135:"a0f32953",81400:"f1a7b723",81477:"81ce2eb6",81667:"b8b8cdb0",81708:"4d7d7762",81835:"efcb7b56",81934:"33bbfee1",82132:"14c8bdbb",82250:"ca3680fe",82342:"74cd685f",82348:"85d844f1",82360:"143d911e",82423:"7b71feb1",82513:"2f0d774a",82614:"2a95d08e",82621:"dfcb91e5",82732:"fd3290c8",82864:"32d19c2f",82982:"a2616da8",82989:"6a4597e6",83069:"47b3ad61",83074:"c97db8cb",83139:"80fa04ea",83147:"888c5d69",83513:"e4ecf5db",83692:"db7a78e4",83698:"eb723d7a",83893:"3d2b86a2",83957:"1b9c82e0",84063:"c8ad2693",84097:"6ef8d9da",84242:"c9097d63",84281:"6a10882e",84362:"c6417e92",84429:"5df8ba6b",84477:"d8bfc341",84500:"431c6477",84513:"b6949bb2",84710:"de128b25",84745:"b52aeec7",84754:"dee9fa1c",84841:"93e39080",84847:"215dbaa3",84854:"84dae178",84888:"22c51fd4",84941:"b4c8a6cc",85011:"38673f2f",85054:"adf0601d",85098:"279d70d3",85217:"f61e93e0",85419:"46bf7559",85455:"83eea1ee",85493:"119281a5",85780:"b9f5f272",86009:"70baefe3",86018:"31bdd1d7",86129:"a2e712a4",86150:"cc7f840f",86317:"fdb12ba6",86333:"ea59a642",86356:"dde2e98e",86476:"197674e4",86518:"9284ae71",86826:"df69b0fb",86950:"e65ab699",86991:"e5fcd808",87064:"3cca1274",87223:"c957930d",87224:"e14869b8",87240:"58b186a9",87304:"886ecc13",87313:"c8286e0c",87316:"b4e193d1",87443:"9aa85498",87460:"e647bbd4",87482:"b9081c96",87513:"b2501ac9",87634:"413486e3",87667:"1173ba50",87799:"ce5bac8d",87836:"b8d9a0fe",87866:"7f1c6977",88068:"6611ef54",88179:"0286c49a",88187:"138206fb",88204:"c6e4cde3",88252:"a2adc43d",88295:"58531032",88338:"44d418f3",88380:"8c2767c8",88446:"bedc1525",88598:"44737f48",88621:"66870438",88625:"fd93b3b7",88821:"a3cc23cf",88831:"f98e1422",88879:"bc39a8d7",89002:"a4594737",89122:"7931db90",89210:"bbe355e5",89215:"6f690905",89574:"a872329f",89675:"2de3dd35",89780:"cb004132",89806:"8290698b",89852:"0bf03005",89984:"c9c8bc7b",89986:"25e5550e",89987:"7ef3f99e",90046:"18bf9d1a",90075:"4c3e9e36",90085:"bcb38f13",90185:"9e4a08ea",90205:"938baf6b",90333:"50f0aaba",90342:"f21131d2
",90377:"d1f93533",90392:"bfd01257",90398:"52310a46",90431:"8afcfb3b",90451:"9f655912",90464:"b5c9216c",90536:"9d746e67",90560:"f21ce84c",90601:"e04386b3",90610:"bf6a18e1",90615:"5eb23ee3",90645:"ec412e06",90666:"67ccf36e",90865:"988a7db1",90896:"464b25cb",90976:"3c307ae3",91178:"ae626034",91213:"1668605a",91231:"20e516c9",91274:"06c24edf",91287:"2cac701a",91304:"51bac317",91316:"2e1f88fa",91406:"2f540e5a",91425:"6224592c",91523:"da34a7fe",91628:"5022e7cf",91753:"c5461936",91782:"d494ecff",91810:"e9392fb4",91849:"c19b88cf",92085:"da4ef155",92244:"b2497d07",92269:"5799f8ca",92393:"63912027",92404:"75dfbd51",92456:"90063d24",92463:"3c0a582c",92744:"d29698c0",92775:"6cc03df9",92778:"871cdeb3",92786:"5e68e413",92843:"ae295bd7",92851:"d6f651a2",92900:"e7dbc981",92964:"b6045191",93023:"67e73b3f",93071:"060ab57a",93151:"ee800142",93176:"a74fd090",93195:"e70b51ce",93308:"1467e450",93340:"11edc4dc",93346:"1b4782c6",93377:"d70f55df",93400:"5c88336c",93590:"c0ae13c5",93749:"b0c0c0b9",93832:"ee4682e2",93837:"e0b7e94d",94114:"c4a618c6",94123:"1a6f14e8",94136:"acfc65cb",94197:"84ad4950",94223:"0ea8a7ad",94328:"3c453dd8",94337:"b1e4d02c",94401:"310ef608",94452:"890d4cf7",94605:"3a9648f7",94694:"efee88a2",94696:"be43fead",94932:"f4de81f7",95020:"b3c692d6",95107:"84d38a95",95171:"b59fa835",95281:"452f0858",95296:"98493597",95317:"8d86f465",95327:"e8f86b92",95329:"b279cc8c",95364:"31dcafc0",95418:"5d25501d",95441:"589b9d96",95561:"fd1ede1f",95696:"cadd461a",95745:"f8e3a2bf",95801:"0a783e08",95816:"df2ce6d0",95911:"b9556956",95945:"f79f370a",96055:"b6642cba",96078:"531d5024",96082:"b914d28e",96135:"a45b92e2",96188:"a987d859",96199:"c561fd38",96361:"900f03a7",96426:"d4071849",96535:"d7e854c6",96544:"f92e8b94",96547:"4ffe2ef3",96617:"95351d4b",96684:"5704d749",96768:"217d8d37",96772:"c1e0dc45",96831:"c60ddcf0",96945:"98d2ab5d",96971:"ba9e1277",96979:"9303b4cf",97065:"947df423",97129:"00e43607",97334:"f8f4fc7e",97469:"5487219f",97523:"a3b57cd6",97547:"9e232073",97553:"2ab99719",97557:"23a5099e",97617:"a15faf99",97648:"57bdbab4",97782:"c09f3975",97816:"3b68fd86",97826:"d0281c69",97850:"77a305aa",97920:"6edfb8bf",97955:"ef56c17f",98177:"9bb8538b",98200:"d224c5fb",98218:"290f20df",98272:"69ca16b2",98623:"4774311b",98740:"5e28c252",98791:"f37abbd5",98868:"7936d28a",98939:"dbffc577",99120:"6a0751c9",99184:"9282b3d6",99266:"cb2b1bcd",99299:"c9478c8c",99367:"c363645f",99389:"b6458274",99427:"dbba9055",99494:"f16b2e8c",99607:"c5ea3f3c",99669:"ce4a8b9f",99839:"19eb08d7",99871:"2f5edb35",99997:"590480c1"}[e]+".js",r.miniCssF=e=>{},r.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||new Function("return this")()}catch(e){if("object"==typeof window)return window}}(),r.o=(e,b)=>Object.prototype.hasOwnProperty.call(e,b),c={},d="@cumulus/website:",r.l=(e,b,f,a)=>{if(c[e])c[e].push(b);else{var t,o;if(void 0!==f)for(var n=document.getElementsByTagName("script"),i=0;i{t.onerror=t.onload=null,clearTimeout(s);var d=c[e];if(delete c[e],t.parentNode&&t.parentNode.removeChild(t),d&&d.forEach((e=>e(f))),b)return b(f)},s=setTimeout(l.bind(null,void 0,{type:"timeout",target:t}),12e4);t.onerror=l.bind(null,t.onerror),t.onload=l.bind(null,t.onload),o&&document.head.appendChild(t)}},r.r=e=>{"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},r.p="/cumulus/",r.gca=function(e){return 
e={17896441:"27918",19345251:"91274",21996883:"26086",24647619:"42288",26134010:"15994",28325500:"49681",34088569:"60487",38341509:"74139",39579801:"6151",46551803:"16685",54230287:"53243",62041344:"65758",62127933:"16038",65360910:"31013",71078103:"74362",78718572:"41750",79617745:"19204",84960677:"60831",91647079:"36773",92307374:"27554",95169675:"21190",95648126:"74882",99496549:"31152","906e49ec":"19",f5e3827c:"21",ab971afc:"71","49c587c2":"99","21730a31":"172",a93c3367:"224","23d30d6b":"250","5da0ca7c":"291",a5bcb3f1:"467","9216ce7b":"513",b564874a:"596",c63e6bd5:"803","54d8bddc":"899","0109100f":"1116","66ffc608":"1365",be2f7876:"1387",ef01e1dd:"1523","7981506b":"1647","902d2d1d":"1652","9ecb4d01":"1664",eee57cd1:"1671",a971b35f:"1940","0149cacd":"2044","40d51a61":"2097",e1e17943:"2196",ac4bed99:"2312",fa423b6e:"2427",b72a3182:"2570","935116ff":"2638","80631bfd":"2656","60b67194":"2905","7bb83d6b":"2916","9ef1e345":"2989","3bedcc76":"3044","7174660f":"3102",d4d22ad8:"3145",d0a0235c:"3191",ec28562d:"3197","92b043a3":"3216",a50b12c0:"3281","0b092b5c":"3283","5e94ba2e":"3326",b7343c9b:"3397","1e070b7c":"3398","4162c6a3":"3447","020a22ba":"3650","6c1d24e1":"3667","7b7fec6b":"3914",c81517c7:"3919","2b024d60":"3942",feba251b:"4125","5017cef7":"4151","3c93ed7e":"4195","21cfb395":"4244","38680a69":"4328",f28093e3:"4504",ffa15017:"4513",f2dc10f7:"4585","9654b394":"4631",c85450ca:"4742",a3db1255:"4874","4482beb5":"4882","75600d79":"4929",e54b1e77:"5061",d87811ce:"5129","23b46f68":"5132",aa4fa4fb:"5313","622596e9":"5352",f2a3bf8e:"5383","631dea17":"5512",d16a2606:"5714","66e9ea68":"5909",d613e1f8:"5920","85e709dc":"5981","391378fa":"6027",a8ef1ed2:"6386","5b4a63ac":"6443","31c3e3d7":"6517","0c99e969":"6537",efc338fe:"6553","111e23e1":"6695","85954f48":"6734",d1284c82:"6799",f49551b9:"6822",cc519fb4:"6824","9e6b2559":"6968",f38fa80d:"6971","7d9c461e":"6978",ff96de6e:"7078",bd1a8573:"7091","30a13577":"7092","365726b0":"7108","7e91f3e1":"7120",bb4987bb:"7155","97ce6959":"7162",b7e69c77:"7318",d8c5fc94:"7451","3e8cde1e":"7470","81f033b8":"7485",bd0e022f:"7500","32d13eb8":"7874","43de05a8":"8023","7d280bdc":"8135","93015a15":"8145","39bddd84":"8188",c565b8da:"8210","3c20ca15":"8230",d163ea32:"8313",fa8af309:"8328",b4473d93:"8407",f983631a:"8482","2b8a5969":"8638","1b93ff3d":"8671",aa01ca6a:"8809","6fdd5bc4":"8882","6d2c1101":"8906",de8a7b18:"9028",debbc0e2:"9119",e5523a26:"9225","407bcc70":"9235","541bc80d":"9365","0ffc31bc":"9444",cf4d312e:"9542","36edbaa2":"9550","9db3bdac":"9615","14eb3368":"9817","13af1bdb":"9836",bfd6b54b:"9907","7097fbbc":"9947","70cd875c":"10109",cdebfca4:"10228",a7da438d:"10270","05fa5837":"10436","8203e9fd":"10480","6773ef05":"10497",e8b1baf4:"10650",c7dacad4:"10756","544bf006":"10882",caf7e36c:"10918","26e1978a":"10987","76ace0dc":"11147",ba17e21b:"11174",c98c0daa:"11203","2913cae6":"11311",a15a0d8e:"11321","65031edd":"11326","4d2bb41f":"11342",ba5e62dd:"11398","9d08c935":"11565",b74e2fe0:"11646","885bf670":"11656","3d446fd0":"11875",b63d08bc:"12228","1fb2401b":"12442","31eb4af1":"12549","885da4ef":"12555","58ac1d26":"12558",a6d8b730:"12560","5aabd190":"12567",b48b6b77:"13253",f65f22ef:"13280","6d905346":"13351","70808dd0":"13460","55920b47":"13588","3be6e3bd":"13595",c642f758:"13617","1ac29206":"13718","8bd7a1a3":"13896",bd0b26a5:"13924",d94f9ca1:"13972","911bbfa4":"13979","0b0df062":"13995","3105dae0":"14061","03902e07":"14088","31585cea":"14095",f71ac404:"14143","1168c96f":"14200",af9acd56:"14299",c099652b:"14369","83cbebfb":"14386",d3fe7aed:"14396","79db63f1":"14549","4
0a26966":"14610","0f188b70":"14670","763f2b13":"14713","931e6ab8":"14775","39ed38cd":"14840",f6310963:"14908","4338ab08":"15196","99a1a3e3":"15380","8cf636cd":"15393",f9c66408:"15497",a97b7821:"15658",a466ebec:"15888","271906a0":"15970","21edad34":"16022","891a9b8f":"16058","6ecc8728":"16071","7d0b3c01":"16153",ff3504dd:"16161","66fe7120":"16379",fdbb9241:"16528",a9347149:"16635",e1bbb98e:"16672",df878b79:"16876","697b1a5f":"16891","251d94d8":"16973","5573e4eb":"17009",fe34d639:"17275","45e19d44":"17283",f77885f5:"17457","528fc62e":"17511",c003d460:"17726","00c88225":"17757","3e697946":"17785",b530e783:"17883","996d98f3":"17887","9a71d807":"17989","7f9f61f2":"18025",abc9098e:"18050","29dde6c8":"18084",d19aead5:"18100","2730c631":"18143","5bebce7d":"18156","074a0372":"18186","4e07c49f":"18318","18ccacf6":"18559","719c4782":"18598",d1faa944:"18680",ccb072c7:"18706","07645771":"18734","8282a203":"18746","31793acc":"18883","4d58aa3f":"18892","6e11cc87":"18928",bc4716d5:"18998","584b298a":"19177","86c7426e":"19212","8a064b88":"19305",b7ec56b9:"19408","126e88af":"19427","6f49328c":"19493","83fe529b":"19504","974829b4":"19531","84eafbb4":"19625","664f913c":"19661","3729e987":"19671",c5593510:"19709","2edcde3e":"19733",dc7ad1ac:"19806","17f9c41b":"19832","760eed0f":"19876","8781c463":"19939","6bf1075e":"19962",b73dbab9:"20040",ef0f9e32:"20061",c9664647:"20169","9b7bae35":"20303","1d014bb1":"20602",cbbe4dac:"20689","120dd2fd":"20707","6d0dfc8d":"20764","67c4e6e9":"20841","55e55873":"20876",be3ddcfb:"20911",e01a2739:"20917","4d0df69e":"20983","653c19c7":"21015",ae5e6a48:"21134","6a89e0dd":"21143","0e728709":"21207","0feddf78":"21228",e22055a4:"21379",f18d5795:"21643",aa282e34:"21688",c0ef9e49:"21823","7f2bec55":"21983","43a49a39":"22030",af199e5b:"22087","06ceb223":"22129","9bc49845":"22163","1ca03b4b":"22238","0b0f030b":"22456",ef2624d8:"22523","0e8c522c":"22540","66b5d69c":"22583",a8f480dd:"22604","4995f874":"22777","6775be7c":"22898",fe423ebe:"22940","9e91305d":"22997","500c9b63":"23064",ede8882f:"23228","909cadf6":"23231","8113fd14":"23252",c1f9ba1e:"23310","89c49d10":"23320","57b7b037":"23343","23896e06":"23435","2c9f485d":"23522",edbf4496:"23536","2457e7c2":"23545",c7599d12:"23663","332c497c":"23714","99961c3d":"23804","9a02f8a7":"23898",f1d5089f:"24058","92ce2bd2":"24066","15d86f95":"24101","0c48ef63":"24109",b7738a69:"24158",a8565f1f:"24266","395508da":"24282",fbfa5dfc:"24401","7fc9e2ed":"24467",e6de5f28:"24501",fd378320:"24946","7d30361b":"24986","10fd89ee":"25079","6c0ce6d0":"25251","8b2f7dd6":"25283","2b1e7b76":"25427",b9b67b35:"25433","59740e69":"25451",f265d6a5:"25513","06673fe1":"25547","75071a94":"25579","9427c683":"25833",bd61737f:"25898",cc7818bb:"26067","85860bdc":"26084",a958884d:"26201","3bed40a0":"26291",f35b8c8b:"26311",aacdb064:"26361",ec205789:"26521","08985b86":"26654","5667bf50":"26686","3be4d1c2":"26695",ceb6bd62:"26858",cba64cb3:"27031",bb1d1845:"27109","6d03c6cb":"27167",e022cd8b:"27270",ec2b56b1:"27276",ec11103c:"27303","552bb95e":"27324","0f6a2fca":"27704","865c04d0":"27982","56405cb8":"28085","916fb87b":"28134","95771e39":"28139","278f3637":"28181","50e78136":"28219","9ce40ebc":"28261","7f6814ed":"28367",fc338eb2:"28475","2f4d1edb":"28476",db082e36:"28514","8af04d56":"28516","018243f8":"28623","4a0c84c3":"28699",f6ca5dc0:"28800","2f74be58":"28880","8d83f575":"28882","48c7b3a1":"28906","3417a016":"28922",da9049b8:"29014","8f32218b":"29025","44573fa4":"29050",e5977951:"29066",a670ed1c:"29131","6fe0ccd0":"29191",f5da8015:"29272","1be78505":"29514","4ef1f024":"29520","949a
554a":"29698",cff5e41a:"29717","16a52e74":"29782","9e530f0a":"29818","3291c538":"29831",b604d5b2:"29864","729f5dd4":"29871","2c86cbaa":"29886","14e9211b":"29899","3d8cf439":"29978",c32e37fe:"29980","26db341a":"30062","04829abe":"30216",ff9e51b7:"30295","26bc6c41":"30419","1dc72111":"30433","019a0579":"30454","73c32a6c":"30470","10b7b761":"30589","47203c86":"30613","4f643cbc":"30677","8dc6ea19":"30678","534db397":"30800","845c1fa7":"30820","8900c226":"30834","6b685afe":"30837","0c9e4d11":"30865","8d5884d6":"30885","683d9354":"30979",c8b95361:"31009",d1036fb2:"31023","9b98b06f":"31044","23c664e3":"31050","928e95c7":"31068","5e56d481":"31089",abe8f5f4:"31116","8793e9e6":"31187",cc976a0e:"31293","212ceae2":"31294","347c8874":"31441","2d7d2510":"31471","8a75859c":"31512",ee2f6eec:"31516","6c6d8053":"31570",ce861b37:"31671","9f850ab3":"31824","4dbdcbee":"31938","87719f86":"32224",bc6bcbdd:"32255",d201558c:"32308","570c64c0":"32319","09e7c68c":"32410","0eb0d7dd":"32446","6167ec10":"32491","5d8d28d6":"32567",f8c45ac9:"32652",a5461ca4:"32689","46d1dc13":"32839",f1c17b7f:"32872",e9268009:"32892","0ef4df13":"32914","1347019b":"32961","9fcb81d2":"33023",a9776c25:"33076","95f7392c":"33083",ad516382:"33131",dd0c884c:"33138",cab767d9:"33178",fa17a3e5:"33181","5af48372":"33223","3deda206":"33260","82dec33c":"33261","9ebfae5b":"33329","765a551b":"33407",b07fb42c:"33514","586fa356":"33725","452104f0":"33737","5b659de8":"33889","1943e34c":"33920",c46ba464:"33966","9b00304e":"34020","3db5eb91":"34077","273b8e1f":"34079","4a797306":"34153","23a156eb":"34206",ff318c38:"34293",f8338e5f:"34294",e48c3912:"34323",c4a71dd9:"34407",f0f4a691:"34458","5c8ad115":"34460","5bea2473":"34475","592e779d":"34552",de061f48:"34590",c93364c6:"34647","813ebe83":"34656","71408d45":"34748","116bb944":"34766","4284f636":"34777","243071a0":"34784","5c392fa5":"34792","99a27b29":"34800","16046cb7":"34882","2c06af7c":"34943","9c12417e":"34979",a2bcabb3:"35038",b269633b:"35069","3576f003":"35214","5334bf47":"35216","907c8c6a":"35387","1cf42300":"35466","09e24d74":"35577","032d72a0":"35614",a2e876c5:"35647","90bfd346":"35768",c30c381e:"35809",df463adb:"35874","41f4b8cc":"35879",b3a22aab:"36009",ade0010f:"36312","8d392edd":"36415",f98b2e13:"36442",caa6bd94:"36483",b3fdbb6a:"36495",f3d03ec8:"36511","8d4185e0":"36673","653cd4ec":"36766",eb87086a:"36933","63849fd3":"36935","7c43c98e":"36983","6d92a4b5":"37021",ac6b62e9:"37055","6e357be7":"37058",c3a94ed1:"37208","4f4166ed":"37257","229edc10":"37316","6dfd1bfa":"37362","3d7b9a1b":"37426","7779798d":"37894","246772cc":"37918",febe4bf0:"37977","5bb043f7":"38056",b34a9ee0:"38104","80b5c97d":"38230","8e018081":"38333","6fc631de":"38349","11414e0b":"38368",a77f15f9:"38450","66cd2d70":"38469","0df0bc38":"38504",e80537c2:"38591","5eece5ec":"38679",cea40137:"38741","38cd2ebb":"38768","2f6d8a46":"38792","0cb88ec0":"38819","179d37d3":"38873","1ec74ed7":"38928","7aabbdee":"39033","3ae213b8":"39177",f55bfda4:"39209",d179e89e:"39252",c0ba661c:"39275",a072c73d:"39325",b7e5badb:"39368","22f9ccca":"39605",e6e9a3aa:"39645",f2c01e3a:"39726","8277cea1":"39820","73c3a5ed":"39853",f8904416:"39941",fa8dc2e8:"39972","5b34f9ea":"39978",f49b74d5:"40097","2d7caf96":"40158","0260d23f":"40176","7ad00ade":"40342",dd8797f2:"40365",c2ef5f99:"40665",eaaaa138:"40830","51cdab7b":"40930",f5d7fbaf:"40936","0cbb6061":"40986",ba1c1ac8:"41100","14f11778":"41120","9444e723":"41329","7945275b":"41388",e6241e03:"41537","5b7c576e":"41840","56181a0b":"41863",c2ae09fd:"41929","85db7b61":"41954",e4176d9e:"41958","81192af7":"41998",a8987ce3
:"42051","10c43c6e":"42054",bfe6bb1f:"42059","672c9486":"42169","4b66f540":"42187","3ebe5c8a":"42226","909a3395":"42263","48e254a2":"42289",ff4be603:"42371","4447d079":"42436",c14e35a5:"42465",a2ff0b9e:"42551",ff7c02a9:"42609",b6cfa9b7:"42620","4b481283":"42690","8e0282b7":"42721","910f748a":"42728","699b0913":"42757","608d6ba6":"42930",e2e305b4:"43037",ea09532f:"43047",dc98fcfb:"43072","26f63738":"43163",f6d93f4d:"43294","87186dce":"43529","39befbbe":"43554","0c94161c":"43635",d7039a99:"43645","4b718ce0":"43697","0682e49e":"43793","8b15c55c":"43849",bfada16a:"43919",b99800de:"43966","0e0b668d":"44023","4b0528ed":"44029","5d86b3d6":"44118",f193e9f7:"44152","5b1c4ba7":"44174",dfca4314:"44393","3d99ef33":"44523","29c565c8":"44592","0a54392a":"44765","51453fb2":"44780","7e328509":"44797","55a23a94":"44860","0f014490":"44907","46f76bef":"45057",dacae080:"45091","593ffe68":"45114","01f7e848":"45279","8b1145e2":"45287",a258685b:"45571","3476fe8e":"45583",d02f7bc4:"45593",f2abaee2:"45732","239111c7":"45786","83d061ac":"45809","9ee45729":"45878",ce3ddafe:"46017","5216f17c":"46023",e0eae934:"46045",fff3ab69:"46074","8d0344ba":"46218",a5b5d55c:"46284","2fc02015":"46328",e7478c24:"46447","46dcda29":"46838","33a34e3b":"46901",cbbdf9a2:"47062",ba73f26c:"47068",cc1f5ce8:"47082","8b6445a0":"47117","524b67e3":"47276",cf1567e8:"47287",e5c3dfde:"47463","3059ed75":"47568","9ee4ebe9":"47582",ee799351:"47655","3ab425d2":"47708","2f0ee63c":"47838",e327333b:"47851","497aa321":"47975","9a7b56f5":"47986","38bd3ddb":"48031","60da83fa":"48055","6f6b3e89":"48150",c81622cc:"48218",bf0d24cf:"48320","3762a996":"48349","3ddb8349":"48426",ba0b40b1:"48637","8fdcea61":"48670",abfd17f9:"48840",e96fdd6c:"48998","70f3cfb0":"49096",d1b82434:"49169",f99bfa77:"49241","8eed67ba":"49270","331a2ebd":"49716","98a6ff5a":"49874",ab2e7268:"50017","8ac39bbe":"50052","4d028f11":"50145",e51da90c:"50153","486e741e":"50193","7b4c719b":"50240",acb04c32:"50337","7ce5ebd9":"50362",e86c0d05:"50375","9f305eae":"50437","40a0c599":"50472",aba6a826:"50525","7bcf009a":"50773",d7e1d518:"50849","6f93a078":"50999","1ed71a7d":"51532",e91074f3:"51555","12e76d03":"51574","6f219482":"51593",ea82a261:"51605","3b5ffa57":"51625",af6e989f:"51706",e6fe050f:"51768","42a4a45b":"51799","2212e80c":"51830",b63fdeeb:"51840",bf3bde03:"51881","86a7da57":"51945","22a76d89":"52094","08ba51c1":"52126","92bceb62":"52251","79bae4c5":"52286",fcb00301:"52491","3b1e54e9":"52499","2006be57":"52573",a18114c4:"52586","28599d52":"52593",c04dcf0d:"52715","0e46f7bf":"52789",f888d9d8:"52870","1df93b7f":"53237",c55f973e:"53371","6cd64148":"53442","3ca132b1":"53675",f20f879f:"53823",b684abf7:"54125","6afbfa44":"54133","1632abda":"54178","39a2751e":"54200","4c8d1cae":"54210",b3c952b5:"54250",d6f7d5e2:"54265",fa5bdf0c:"54363","4bae0029":"54382","3034400c":"54397","612ebb8a":"54487",cca83a59:"54513",e0668c88:"54591","7c417199":"54741","130a23fd":"54756",fd67079f:"54778",ed07f994:"54786",dd8be3b2:"54794","32f0f819":"54855",a463ff81:"55043","5560d84e":"55216","661e4fa4":"55239","08472b2d":"55273","746f419e":"55335","1710d498":"55478","8e23b856":"55552","7ec28fd9":"55693","7f536709":"55726","2e18dbc8":"55745",fe2f6d57:"55799","676a3180":"55821","407fa3a0":"55925",f2d325f1:"55962","640fe435":"56290",ac4fb807:"56424",e8d36425:"56513","7ec7e0b0":"56541","43c3babd":"56552","71f8452f":"56614",f1525ef1:"56750","151869e3":"56795",d4a6dda9:"56902","918ae6ff":"57121","7f039048":"57126","522a40f8":"57242","1b4282d0":"57258",fb218ddd:"57293","3ad7b662":"57341",f251ab77:"57489",e4b4615d:"57598","6e586ee3"
:"57599",d06effa9:"57699","34660ac5":"57749","163044ef":"57780",a3c98c45:"57820","2c5ceec1":"57995","84c320c1":"58009","4893a1cb":"58042",e345afee:"58096",a045168c:"58182","09e9a7df":"58197","6145eda0":"58234","551b313a":"58247","649a76e7":"58356",e5a71ed6:"58539",ea41aad0:"58564","4f9404e5":"58768",cc6053aa:"58818",cf14af90:"58822",bf2622dd:"58824","68709c70":"58914","8938295e":"59051",de11ece8:"59060","010f8398":"59181",af049e12:"59191","07a6f1c2":"59241","8962034b":"59248","902aff6f":"59336",d959d974:"59342",f4be443e:"59394",b43aa387:"59427","0cd38f48":"59442","081ed9af":"59496",c2ed794e:"59506","897798e8":"59533",d243562e:"59592",e56a1a2c:"59771","619d2e79":"59900",f929d4df:"59982","918c9b38":"59992","78f8003c":"60000","7b2e834b":"60185","34d5cc00":"60331","05a720dd":"60434",bb341369:"60518",b8677fbf:"60603",c1b7bc44:"60608","1ebc7fe2":"60682","16bb304a":"60868",a79d55be:"61007","4c13f84f":"61200",f8bc4080:"61210",f497508e:"61249",e50573ba:"61271",a529f863:"61361",e31a63b7:"61543",a882bd74:"61643","5e52bbeb":"61793","83a26c48":"62117","912fcb5a":"62235","5c77ea5f":"62307",bdd03912:"62318","8e993d66":"62319","686c1ad3":"62474","6312a106":"62493","54d1c079":"62523",bc1c33e4:"62547","8fca97e0":"62655","34d502be":"62867",c38f23a9:"62948","6dffe7c4":"62983",f29affbe:"63105","877a3c1c":"63387","7bc70741":"63604",cb582f54:"63663","92264b81":"63777","010118f9":"63801","86c8f7cd":"63808",cacfa11d:"64000",d6990b47:"64032","555f2cec":"64072",e8591f69:"64100","07b92fc6":"64122",aea361f0:"64164",e715560c:"64485","6694e7e9":"64525",fd0f74aa:"64533",eee9e2f1:"64596","300bd484":"64656",e74888b9:"64754","5a7e5a43":"64766",dfd588b8:"64824","42d0afac":"64871","610e19f0":"64900",b81f3fb0:"64967","637ec626":"65132","8d3be60d":"65501",ef0f3981:"65592","88dde0bb":"65594","90dccef4":"65641","540f26e7":"65883","21d2296a":"65965","06b7cd3c":"66026","9d336f66":"66081",cf494ba6:"66187","1fb9ab5c":"66238","5f0246ae":"66256",a5560bad:"66303","87cc8f7c":"66336","1d642165":"66462","9c53d859":"66465",e0052e0c:"66597","2dba4e8b":"66744","12b52520":"66958","49763a0b":"67010","01ba3f79":"67021",d7124adb:"67061",c7b80b67:"67069","15f4efbb":"67132","2d8700b9":"67343","85ac525a":"67448","18caf9ef":"67583","1693c0b8":"67597",a48eac25:"67638",a23744f9:"67670",d29db0e3:"67908","8d96489a":"67954","8858d0ce":"68034",db9653b1:"68126",f301134a:"68258","8751004c":"68689","5335ef4f":"68757","9b89ba00":"68793",c3ca7a6a:"68823",ae5838f0:"68943","21cf1efb":"69057","1a54bfd0":"69111",a5f4c814:"69125","046783c0":"69169","193f200e":"69209","212137e2":"69234","140e6a69":"69254","9bfbb8bc":"69628",d5d7628b:"69629","607292ce":"69647","273a5860":"69824","7facae8f":"69843","6d933e1d":"69899","41e02281":"70081","16b47049":"70130","06b3b671":"70178","7bd49e6c":"70249",d72ada40:"70277","1fc4ed50":"70367","3fb51827":"70497",f53e2381:"70504",ce79b72a:"70543","65306ecf":"70614","21d3c1c7":"70706","06876062":"70964","99c371aa":"71081",dff7b4e8:"71160","5ed92a05":"71169","167f5be9":"71287",c26ab7d5:"71476",dcf1d6e7:"71516",d93a0aad:"71544",ff9d88b6:"71698",fcfa677e:"71789","39afc900":"71811","77ccc938":"71918","59dfcfb5":"72070","7f31124b":"72189","66179fb5":"72331",cf945ce5:"72500","084a18af":"72613","1adeac4a":"72638","150a4d14":"72740","2e2a73ec":"72887",d703ca6f:"72952",d9b3adf3:"72957","2e6d047c":"72978","1eec97be":"73238","4efa0483":"73300",aa02927d:"73338",b26f6fa9:"73356","6594bd70":"73369","63b8176f":"73452",c9f98325:"73533",c7953305:"73537","05a8a78d":"73576","12dcfbad":"73618","8f9c5733":"73725","154cbeb4":"73745",af29c71b:"73766",bce7d46d:"
73858","566ea6d5":"73975",ff35d8ff:"74019",a0541488:"74061","67e63bc0":"74091",ba454016:"74132",ef4e0f5d:"74136","769f97b7":"74332",cb8731ee:"74337","4c8fc79c":"74441",e6a17fa0:"74465","4bbc58d4":"74480","80ea5ae7":"74578","016f0e16":"74794","8b87f6f5":"74839","5b34d9eb":"75070","29ff1658":"75118","8d81369e":"75123","61c61e17":"75203",ca1d44bc:"75273",c176dc4f:"75546","43a232e9":"75567",ff078e30:"75671","1ac49947":"75702","64e30bbc":"75897",b92bff04:"75911",d6f3938e:"76029","172c9869":"76180","99ae7254":"76222","02de7b5a":"76240","2363ed29":"76355","4f8fd4be":"76360",c0f7075f:"76369","359e34b0":"76527","982c02f7":"76668",d12dbf4d:"76793","198182f0":"76878","8d493a07":"76895",cb870251:"76924","9eb4c1b4":"77053","60e8b504":"77170","4deae4de":"77413","78dc40c2":"77427","56c932ee":"77527","274eaedf":"77655","60043c0d":"77680",be698a2c:"77887","971cbe2f":"77923","1ae50e88":"78035","4ab9b114":"78038",c80936bd:"78063",ed51eb7d:"78072","516dec85":"78114","41742cda":"78233",df22f3af:"78248","118229e6":"78452","2b5e4b34":"78863","0fcbeed9":"78924","69f3d9b5":"78991",a17fb62b:"79073","95f18dd4":"79164","3888e873":"79239",f83967c4:"79290","10c28d6f":"79298","46fa8ad3":"79315","73a7bd5f":"79356",df5a3016:"79478","27e1a14b":"79479",fbff3b11:"79543",c733e485:"79691","6b2b8280":"79829","4bedd8c5":"79895",b00a2879:"79958","092519d2":"79963","4925ce85":"79977","935f2afb":"80053","18dd253f":"80132",fc8aebe3:"80192",eac8f2ef:"80268","2d35b91c":"80380","83cd8f20":"80709",f97cc188:"80895",abf6a1f1:"80940","85c9bffb":"81012","0c1ee94a":"81135","29b6c240":"81400","2dd65ece":"81477",a39041db:"81667","82a4f002":"81708","6a0b4355":"81835","35fa8025":"81934","919b108c":"82132","51da09c7":"82250",dc130668:"82342","3f6554cb":"82348",d885d629:"82360",fadcaea6:"82423","7bcf7096":"82513",aa395a59:"82614",ba4efbe0:"82621",ba8527a9:"82732","3ebee193":"82864",bf02c3ce:"82982","39b565ff":"82989","39c8ecdc":"83069","1a42aba3":"83074",fba94ee1:"83139",a26f7afa:"83147",fb0364ff:"83513","779753bc":"83692","39c159d8":"83698","66716ec1":"83893","10f908b7":"83957","363318d5":"84063","737371dd":"84097","6e366b57":"84242",a3286ddf:"84281","8da7304d":"84362","2da29b2a":"84429","3b12bc8a":"84477","6eeb04e2":"84500","6cb122e3":"84513","4cec253a":"84710","42325f5c":"84745","211f58b1":"84754","7e5ee96c":"84841","5c27dd68":"84847","34b19815":"84854",e9d5739e:"84888","21bf64ca":"84941","6e5d074b":"85011","7bc3feb7":"85054","60d04b47":"85098",f8482b2c:"85217","3743f01c":"85419","7e446cc1":"85455","6d480200":"85493","82033eb7":"85780",c9b79676:"86009",ed809cac:"86018",b47406fa:"86129","1db21d86":"86150",a540f8cd:"86317","08e3aaa9":"86333","4ad39569":"86356","45aa7127":"86476",ce66b6fd:"86518","7dd3be25":"86826","4e1da517":"86950",d1b2a42e:"86991",a6a8af40:"87064","27bd5328":"87223","6fc8d865":"87224","1fdab62e":"87240","9980f90c":"87304","96c0bb00":"87313",a48778d9:"87316","96ec050b":"87443","7668acae":"87460","8faa0fb1":"87482",a291f403:"87513",f807eec9:"87634",c07f2717:"87667","3958a146":"87799",e111f111:"87836",b1998bb1:"87866","8d28c4be":"88068",e5842021:"88179","1e391540":"88187","5304a4a8":"88204",c0074ddd:"88252","21ad5224":"88295",e1b9986a:"88338",ac930f6e:"88380","3c725018":"88446","6827856d":"88598","4499569c":"88621","6f59957c":"88625","41db9914":"88821","1c56d006":"88831","32ea4ecb":"88879","4455e85b":"89002","234a1403":"89122",b4028749:"89210","17114a18":"89215","13b69fa8":"89574","164cd634":"89675","46c600d5":"89780","443045da":"89806","1f79049f":"89852","93d3457d":"89984",fee1f25c:"89986",d25ffd5f:"89987",d043cc46:"90046","390ef0
88":"90075",d36db526:"90085","41b3e733":"90185","727a1f3c":"90205",f2497893:"90333","2c91f584":"90342","7d607fc0":"90377",f60e43ec:"90392","0b78393d":"90398",dd313590:"90431","0a13c98e":"90451",b2335bc1:"90464","73dfc993":"90536","01fb8b11":"90560","22f40a40":"90601","147b0f6a":"90610","8cd0f4f5":"90615","6601f604":"90645","459a783a":"90666","87e7806e":"90865","4302562a":"90896","6eb0ce42":"90976","9c4bbfc4":"91178",ff0539a2:"91213","28b27838":"91231","02ee0502":"91287","872e63de":"91304",b82d5884:"91316","4cd7d8af":"91406","1671b3fa":"91425",d692bb25:"91523",c839a5b0:"91628","2f535455":"91753","304ed800":"91782","0f7553c3":"91810",b2735041:"91849","000c061a":"92085","8c828746":"92244","9c42de85":"92269",db5c8692:"92393","799b872c":"92404",d6360c39:"92456","4d4093bb":"92463","6eebf72d":"92744","7861f6df":"92775","8c31caf6":"92778",ae5bb339:"92786","85c3ba36":"92843","8bfba65b":"92851",e5a16b2e:"92900","14e00221":"92964",b984322c:"93023","61e5c5b8":"93071",e7cbe8da:"93151",f7101d4f:"93176","740eb29c":"93195",b83df1bc:"93308","5d075efb":"93340",f7735fb0:"93346",dd435828:"93377","03e8549c":"93400",dede40b0:"93590","4e6907d6":"93749","917734f8":"93832",cb341380:"93837",c9aea766:"94114","7c8407dd":"94123","91dc98f0":"94136","37aba5d3":"94197","43b891d1":"94223","63f66cb7":"94328","9fdf7324":"94337","6c10648f":"94401","878356ab":"94452","487f7f30":"94605",d3e690ce:"94694","376d31f7":"94696",a233fb97:"94932",b8e39b95:"95020",d666ab7e:"95107","3db8c88b":"95171",bc08bf79:"95281","9936b6c8":"95296",cf282674:"95317","1e173bbe":"95327","5b23c695":"95329","41fbfe2f":"95364","7877b0eb":"95418",e9ef6b31:"95441","0e0f5dd2":"95561","8462ad7a":"95696",edf19300:"95745",e490fd18:"95801","9dd89af2":"95816","7e254f9d":"95911","90b0cf6d":"95945","8fa500ae":"96055",d6011437:"96078",a322018d:"96082","3061ad92":"96135",f0129862:"96188","8e2c0739":"96199",ebf2bdda:"96361","64bd79cb":"96426","38e65fdd":"96535","49ea6ca5":"96544","385bc71d":"96547",e23cd647:"96617",a612420b:"96684","2035956b":"96768",b35418cf:"96772","99ba663e":"96831","09e11ac0":"96945","57973c2b":"96971","7f6f8f16":"96979","6816f4c0":"97065",f3034cf4:"97129","9d4bcb9a":"97334",d91e7ab4:"97469","02fbc840":"97523","902fdb3b":"97547","7ea214d5":"97553",c70cb355:"97557",ed97cef0:"97617","6f25dd34":"97648",b094b997:"97782","7513b789":"97816","16cff1eb":"97826",dd6685df:"97850","1a4e3797":"97920","746bf890":"97955","049dc708":"98177","0e7f2915":"98200","1820eb3b":"98218",b7f629d0:"98272",ced65f67:"98623",d1475ab1:"98740","1a6f209f":"98791","6a913ab1":"98868","3ff950a4":"98939","008b0ccc":"99120","8aecb2ef":"99184",ca443c18:"99266","7cc0ca0e":"99299","00125b11":"99367",c2f4aca4:"99389","64758f43":"99427",f2d5637b:"99494","49ea4a42":"99607","32db5af4":"99669","15d4dc80":"99839","5e3def70":"99871",b63b5bb9:"99997"}[e]||e,r.p+r.u(e)},(()=>{var e={51303:0,40532:0};r.f.j=(b,f)=>{var c=r.o(e,b)?e[b]:void 0;if(0!==c)if(c)f.push(c[2]);else if(/^(40532|51303)$/.test(b))e[b]=0;else{var d=new Promise(((f,d)=>c=e[b]=[f,d]));f.push(c[2]=d);var a=r.p+r.u(b),t=new Error;r.l(a,(f=>{if(r.o(e,b)&&(0!==(c=e[b])&&(e[b]=void 0),c)){var d=f&&("load"===f.type?"missing":f.type),a=f&&f.target&&f.target.src;t.message="Loading chunk "+b+" failed.\n("+d+": "+a+")",t.name="ChunkLoadError",t.type=d,t.request=a,c[1](t)}}),"chunk-"+b,b)}},r.O.j=b=>0===e[b];var b=(b,f)=>{var c,d,[a,t,o]=f,n=0;if(a.some((b=>0!==e[b]))){for(c in t)r.o(t,c)&&(r.m[c]=t[c]);if(o)var i=o(r)}for(b&&b(f);n{"use strict";var e,b,f,c,d,a={},t={};function r(e){var b=t[e];if(void 0!==b)return b.exports;var 
f=t[e]={id:e,loaded:!1,exports:{}};return a[e].call(f.exports,f,f.exports,r),f.loaded=!0,f.exports}r.m=a,r.c=t,e=[],r.O=(b,f,c,d)=>{if(!f){var a=1/0;for(i=0;i=d)&&Object.keys(r.O).every((e=>r.O[e](f[o])))?f.splice(o--,1):(t=!1,d0&&e[i-1][2]>d;i--)e[i]=e[i-1];e[i]=[f,c,d]},r.n=e=>{var b=e&&e.__esModule?()=>e.default:()=>e;return r.d(b,{a:b}),b},f=Object.getPrototypeOf?e=>Object.getPrototypeOf(e):e=>e.__proto__,r.t=function(e,c){if(1&c&&(e=this(e)),8&c)return e;if("object"==typeof e&&e){if(4&c&&e.__esModule)return e;if(16&c&&"function"==typeof e.then)return e}var d=Object.create(null);r.r(d);var a={};b=b||[null,f({}),f([]),f(f)];for(var t=2&c&&e;"object"==typeof t&&!~b.indexOf(t);t=f(t))Object.getOwnPropertyNames(t).forEach((b=>a[b]=()=>e[b]));return a.default=()=>e,r.d(d,a),d},r.d=(e,b)=>{for(var f in b)r.o(b,f)&&!r.o(e,f)&&Object.defineProperty(e,f,{enumerable:!0,get:b[f]})},r.f={},r.e=e=>Promise.all(Object.keys(r.f).reduce(((b,f)=>(r.f[f](e,b),b)),[])),r.u=e=>"assets/js/"+({19:"906e49ec",21:"f5e3827c",71:"ab971afc",99:"49c587c2",172:"21730a31",224:"a93c3367",250:"23d30d6b",291:"5da0ca7c",467:"a5bcb3f1",513:"9216ce7b",596:"b564874a",803:"c63e6bd5",899:"54d8bddc",1116:"0109100f",1365:"66ffc608",1387:"be2f7876",1523:"ef01e1dd",1647:"7981506b",1652:"902d2d1d",1664:"9ecb4d01",1671:"eee57cd1",1940:"a971b35f",2044:"0149cacd",2097:"40d51a61",2196:"e1e17943",2312:"ac4bed99",2427:"fa423b6e",2570:"b72a3182",2638:"935116ff",2656:"80631bfd",2905:"60b67194",2916:"7bb83d6b",2989:"9ef1e345",3044:"3bedcc76",3102:"7174660f",3145:"d4d22ad8",3191:"d0a0235c",3197:"ec28562d",3216:"92b043a3",3281:"a50b12c0",3283:"0b092b5c",3326:"5e94ba2e",3397:"b7343c9b",3398:"1e070b7c",3447:"4162c6a3",3650:"020a22ba",3667:"6c1d24e1",3914:"7b7fec6b",3919:"c81517c7",3942:"2b024d60",4125:"feba251b",4151:"5017cef7",4195:"3c93ed7e",4244:"21cfb395",4328:"38680a69",4504:"f28093e3",4513:"ffa15017",4585:"f2dc10f7",4631:"9654b394",4742:"c85450ca",4874:"a3db1255",4882:"4482beb5",4929:"75600d79",5061:"e54b1e77",5129:"d87811ce",5132:"23b46f68",5313:"aa4fa4fb",5352:"622596e9",5383:"f2a3bf8e",5512:"631dea17",5714:"d16a2606",5909:"66e9ea68",5920:"d613e1f8",5981:"85e709dc",6027:"391378fa",6151:"39579801",6386:"a8ef1ed2",6443:"5b4a63ac",6517:"31c3e3d7",6537:"0c99e969",6553:"efc338fe",6695:"111e23e1",6734:"85954f48",6799:"d1284c82",6822:"f49551b9",6824:"cc519fb4",6968:"9e6b2559",6971:"f38fa80d",6978:"7d9c461e",7078:"ff96de6e",7091:"bd1a8573",7092:"30a13577",7108:"365726b0",7120:"7e91f3e1",7155:"bb4987bb",7162:"97ce6959",7318:"b7e69c77",7451:"d8c5fc94",7470:"3e8cde1e",7485:"81f033b8",7500:"bd0e022f",7874:"32d13eb8",8023:"43de05a8",8135:"7d280bdc",8145:"93015a15",8188:"39bddd84",8210:"c565b8da",8230:"3c20ca15",8313:"d163ea32",8328:"fa8af309",8407:"b4473d93",8482:"f983631a",8638:"2b8a5969",8671:"1b93ff3d",8809:"aa01ca6a",8882:"6fdd5bc4",8906:"6d2c1101",9028:"de8a7b18",9119:"debbc0e2",9225:"e5523a26",9235:"407bcc70",9365:"541bc80d",9444:"0ffc31bc",9542:"cf4d312e",9550:"36edbaa2",9615:"9db3bdac",9817:"14eb3368",9836:"13af1bdb",9907:"bfd6b54b",9947:"7097fbbc",10109:"70cd875c",10228:"cdebfca4",10270:"a7da438d",10436:"05fa5837",10480:"8203e9fd",10497:"6773ef05",10650:"e8b1baf4",10756:"c7dacad4",10882:"544bf006",10918:"caf7e36c",10987:"26e1978a",11147:"76ace0dc",11174:"ba17e21b",11203:"c98c0daa",11311:"2913cae6",11321:"a15a0d8e",11326:"65031edd",11342:"4d2bb41f",11398:"ba5e62dd",11565:"9d08c935",11646:"b74e2fe0",11656:"885bf670",11875:"3d446fd0",12228:"b63d08bc",12442:"1fb2401b",12549:"31eb4af1",12555:"885da4ef",12558:"58ac1d26",12560:"a6d8b730",12567:"5a
abd190",13253:"b48b6b77",13280:"f65f22ef",13351:"6d905346",13460:"70808dd0",13588:"55920b47",13595:"3be6e3bd",13617:"c642f758",13718:"1ac29206",13896:"8bd7a1a3",13924:"bd0b26a5",13972:"d94f9ca1",13979:"911bbfa4",13995:"0b0df062",14061:"3105dae0",14088:"03902e07",14095:"31585cea",14143:"f71ac404",14200:"1168c96f",14299:"af9acd56",14369:"c099652b",14386:"83cbebfb",14396:"d3fe7aed",14549:"79db63f1",14610:"40a26966",14670:"0f188b70",14713:"763f2b13",14775:"931e6ab8",14840:"39ed38cd",14908:"f6310963",15196:"4338ab08",15380:"99a1a3e3",15393:"8cf636cd",15497:"f9c66408",15658:"a97b7821",15888:"a466ebec",15970:"271906a0",15994:"26134010",16022:"21edad34",16038:"62127933",16058:"891a9b8f",16071:"6ecc8728",16153:"7d0b3c01",16161:"ff3504dd",16379:"66fe7120",16528:"fdbb9241",16635:"a9347149",16672:"e1bbb98e",16685:"46551803",16876:"df878b79",16891:"697b1a5f",16973:"251d94d8",17009:"5573e4eb",17275:"fe34d639",17283:"45e19d44",17457:"f77885f5",17511:"528fc62e",17726:"c003d460",17757:"00c88225",17785:"3e697946",17883:"b530e783",17887:"996d98f3",17989:"9a71d807",18025:"7f9f61f2",18050:"abc9098e",18084:"29dde6c8",18100:"d19aead5",18143:"2730c631",18156:"5bebce7d",18186:"074a0372",18318:"4e07c49f",18559:"18ccacf6",18598:"719c4782",18680:"d1faa944",18706:"ccb072c7",18734:"07645771",18746:"8282a203",18883:"31793acc",18892:"4d58aa3f",18928:"6e11cc87",18998:"bc4716d5",19177:"584b298a",19204:"79617745",19212:"86c7426e",19305:"8a064b88",19408:"b7ec56b9",19427:"126e88af",19493:"6f49328c",19504:"83fe529b",19531:"974829b4",19625:"84eafbb4",19661:"664f913c",19671:"3729e987",19709:"c5593510",19733:"2edcde3e",19806:"dc7ad1ac",19832:"17f9c41b",19876:"760eed0f",19939:"8781c463",19962:"6bf1075e",20040:"b73dbab9",20061:"ef0f9e32",20169:"c9664647",20303:"9b7bae35",20602:"1d014bb1",20689:"cbbe4dac",20707:"120dd2fd",20764:"6d0dfc8d",20841:"67c4e6e9",20876:"55e55873",20911:"be3ddcfb",20917:"e01a2739",20983:"4d0df69e",21015:"653c19c7",21134:"ae5e6a48",21143:"6a89e0dd",21190:"95169675",21207:"0e728709",21228:"0feddf78",21379:"e22055a4",21643:"f18d5795",21688:"aa282e34",21823:"c0ef9e49",21983:"7f2bec55",22030:"43a49a39",22087:"af199e5b",22129:"06ceb223",22163:"9bc49845",22238:"1ca03b4b",22456:"0b0f030b",22523:"ef2624d8",22540:"0e8c522c",22583:"66b5d69c",22604:"a8f480dd",22777:"4995f874",22898:"6775be7c",22940:"fe423ebe",22997:"9e91305d",23064:"500c9b63",23228:"ede8882f",23231:"909cadf6",23252:"8113fd14",23310:"c1f9ba1e",23320:"89c49d10",23343:"57b7b037",23435:"23896e06",23522:"2c9f485d",23536:"edbf4496",23545:"2457e7c2",23663:"c7599d12",23714:"332c497c",23804:"99961c3d",23898:"9a02f8a7",24058:"f1d5089f",24066:"92ce2bd2",24101:"15d86f95",24109:"0c48ef63",24158:"b7738a69",24266:"a8565f1f",24282:"395508da",24401:"fbfa5dfc",24467:"7fc9e2ed",24501:"e6de5f28",24946:"fd378320",24986:"7d30361b",25079:"10fd89ee",25251:"6c0ce6d0",25283:"8b2f7dd6",25427:"2b1e7b76",25433:"b9b67b35",25451:"59740e69",25513:"f265d6a5",25547:"06673fe1",25579:"75071a94",25833:"9427c683",25898:"bd61737f",26067:"cc7818bb",26084:"85860bdc",26086:"21996883",26201:"a958884d",26291:"3bed40a0",26311:"f35b8c8b",26361:"aacdb064",26521:"ec205789",26654:"08985b86",26686:"5667bf50",26695:"3be4d1c2",26858:"ceb6bd62",27031:"cba64cb3",27109:"bb1d1845",27167:"6d03c6cb",27270:"e022cd8b",27276:"ec2b56b1",27303:"ec11103c",27324:"552bb95e",27554:"92307374",27704:"0f6a2fca",27918:"17896441",27982:"865c04d0",28085:"56405cb8",28134:"916fb87b",28139:"95771e39",28181:"278f3637",28219:"50e78136",28261:"9ce40ebc",28367:"7f6814ed",28475:"fc338eb2",28476:"2f4d1edb",28514:"db082e36",28516:"8af0
4d56",28623:"018243f8",28699:"4a0c84c3",28800:"f6ca5dc0",28880:"2f74be58",28882:"8d83f575",28906:"48c7b3a1",28922:"3417a016",29014:"da9049b8",29025:"8f32218b",29050:"44573fa4",29066:"e5977951",29131:"a670ed1c",29191:"6fe0ccd0",29272:"f5da8015",29514:"1be78505",29520:"4ef1f024",29698:"949a554a",29717:"cff5e41a",29782:"16a52e74",29818:"9e530f0a",29831:"3291c538",29864:"b604d5b2",29871:"729f5dd4",29886:"2c86cbaa",29899:"14e9211b",29978:"3d8cf439",29980:"c32e37fe",30062:"26db341a",30216:"04829abe",30295:"ff9e51b7",30419:"26bc6c41",30433:"1dc72111",30454:"019a0579",30470:"73c32a6c",30589:"10b7b761",30613:"47203c86",30677:"4f643cbc",30678:"8dc6ea19",30800:"534db397",30820:"845c1fa7",30834:"8900c226",30837:"6b685afe",30865:"0c9e4d11",30885:"8d5884d6",30979:"683d9354",31009:"c8b95361",31013:"65360910",31023:"d1036fb2",31044:"9b98b06f",31050:"23c664e3",31068:"928e95c7",31089:"5e56d481",31116:"abe8f5f4",31152:"99496549",31187:"8793e9e6",31293:"cc976a0e",31294:"212ceae2",31441:"347c8874",31471:"2d7d2510",31512:"8a75859c",31516:"ee2f6eec",31570:"6c6d8053",31671:"ce861b37",31824:"9f850ab3",31938:"4dbdcbee",32224:"87719f86",32255:"bc6bcbdd",32308:"d201558c",32319:"570c64c0",32410:"09e7c68c",32446:"0eb0d7dd",32491:"6167ec10",32567:"5d8d28d6",32652:"f8c45ac9",32689:"a5461ca4",32839:"46d1dc13",32872:"f1c17b7f",32892:"e9268009",32914:"0ef4df13",32961:"1347019b",33023:"9fcb81d2",33076:"a9776c25",33083:"95f7392c",33131:"ad516382",33138:"dd0c884c",33178:"cab767d9",33181:"fa17a3e5",33223:"5af48372",33260:"3deda206",33261:"82dec33c",33329:"9ebfae5b",33407:"765a551b",33514:"b07fb42c",33725:"586fa356",33737:"452104f0",33889:"5b659de8",33920:"1943e34c",33966:"c46ba464",34020:"9b00304e",34077:"3db5eb91",34079:"273b8e1f",34153:"4a797306",34206:"23a156eb",34293:"ff318c38",34294:"f8338e5f",34323:"e48c3912",34407:"c4a71dd9",34458:"f0f4a691",34460:"5c8ad115",34475:"5bea2473",34552:"592e779d",34590:"de061f48",34647:"c93364c6",34656:"813ebe83",34748:"71408d45",34766:"116bb944",34777:"4284f636",34784:"243071a0",34792:"5c392fa5",34800:"99a27b29",34882:"16046cb7",34943:"2c06af7c",34979:"9c12417e",35038:"a2bcabb3",35069:"b269633b",35214:"3576f003",35216:"5334bf47",35387:"907c8c6a",35466:"1cf42300",35577:"09e24d74",35614:"032d72a0",35647:"a2e876c5",35768:"90bfd346",35809:"c30c381e",35874:"df463adb",35879:"41f4b8cc",36009:"b3a22aab",36312:"ade0010f",36415:"8d392edd",36442:"f98b2e13",36483:"caa6bd94",36495:"b3fdbb6a",36511:"f3d03ec8",36673:"8d4185e0",36766:"653cd4ec",36773:"91647079",36933:"eb87086a",36935:"63849fd3",36983:"7c43c98e",37021:"6d92a4b5",37055:"ac6b62e9",37058:"6e357be7",37208:"c3a94ed1",37257:"4f4166ed",37316:"229edc10",37362:"6dfd1bfa",37426:"3d7b9a1b",37894:"7779798d",37918:"246772cc",37977:"febe4bf0",38056:"5bb043f7",38104:"b34a9ee0",38230:"80b5c97d",38333:"8e018081",38349:"6fc631de",38368:"11414e0b",38450:"a77f15f9",38469:"66cd2d70",38504:"0df0bc38",38591:"e80537c2",38679:"5eece5ec",38741:"cea40137",38768:"38cd2ebb",38792:"2f6d8a46",38819:"0cb88ec0",38873:"179d37d3",38928:"1ec74ed7",39033:"7aabbdee",39177:"3ae213b8",39209:"f55bfda4",39252:"d179e89e",39275:"c0ba661c",39325:"a072c73d",39368:"b7e5badb",39605:"22f9ccca",39645:"e6e9a3aa",39726:"f2c01e3a",39820:"8277cea1",39853:"73c3a5ed",39941:"f8904416",39972:"fa8dc2e8",39978:"5b34f9ea",40097:"f49b74d5",40158:"2d7caf96",40176:"0260d23f",40342:"7ad00ade",40365:"dd8797f2",40665:"c2ef5f99",40830:"eaaaa138",40930:"51cdab7b",40936:"f5d7fbaf",40986:"0cbb6061",41100:"ba1c1ac8",41120:"14f11778",41329:"9444e723",41388:"7945275b",41537:"e6241e03",41750:"78718572",41840:"5b7c57
6e",41863:"56181a0b",41929:"c2ae09fd",41954:"85db7b61",41958:"e4176d9e",41998:"81192af7",42051:"a8987ce3",42054:"10c43c6e",42059:"bfe6bb1f",42169:"672c9486",42187:"4b66f540",42226:"3ebe5c8a",42263:"909a3395",42288:"24647619",42289:"48e254a2",42371:"ff4be603",42436:"4447d079",42465:"c14e35a5",42551:"a2ff0b9e",42609:"ff7c02a9",42620:"b6cfa9b7",42690:"4b481283",42721:"8e0282b7",42728:"910f748a",42757:"699b0913",42930:"608d6ba6",43037:"e2e305b4",43047:"ea09532f",43072:"dc98fcfb",43163:"26f63738",43294:"f6d93f4d",43529:"87186dce",43554:"39befbbe",43635:"0c94161c",43645:"d7039a99",43697:"4b718ce0",43793:"0682e49e",43849:"8b15c55c",43919:"bfada16a",43966:"b99800de",44023:"0e0b668d",44029:"4b0528ed",44118:"5d86b3d6",44152:"f193e9f7",44174:"5b1c4ba7",44393:"dfca4314",44523:"3d99ef33",44592:"29c565c8",44765:"0a54392a",44780:"51453fb2",44797:"7e328509",44860:"55a23a94",44907:"0f014490",45057:"46f76bef",45091:"dacae080",45114:"593ffe68",45279:"01f7e848",45287:"8b1145e2",45571:"a258685b",45583:"3476fe8e",45593:"d02f7bc4",45732:"f2abaee2",45786:"239111c7",45809:"83d061ac",45878:"9ee45729",46017:"ce3ddafe",46023:"5216f17c",46045:"e0eae934",46074:"fff3ab69",46218:"8d0344ba",46284:"a5b5d55c",46328:"2fc02015",46447:"e7478c24",46838:"46dcda29",46901:"33a34e3b",47062:"cbbdf9a2",47068:"ba73f26c",47082:"cc1f5ce8",47117:"8b6445a0",47276:"524b67e3",47287:"cf1567e8",47463:"e5c3dfde",47568:"3059ed75",47582:"9ee4ebe9",47655:"ee799351",47708:"3ab425d2",47838:"2f0ee63c",47851:"e327333b",47975:"497aa321",47986:"9a7b56f5",48031:"38bd3ddb",48055:"60da83fa",48150:"6f6b3e89",48218:"c81622cc",48320:"bf0d24cf",48349:"3762a996",48426:"3ddb8349",48637:"ba0b40b1",48670:"8fdcea61",48840:"abfd17f9",48998:"e96fdd6c",49096:"70f3cfb0",49169:"d1b82434",49241:"f99bfa77",49270:"8eed67ba",49681:"28325500",49716:"331a2ebd",49874:"98a6ff5a",50017:"ab2e7268",50052:"8ac39bbe",50145:"4d028f11",50153:"e51da90c",50193:"486e741e",50240:"7b4c719b",50337:"acb04c32",50362:"7ce5ebd9",50375:"e86c0d05",50437:"9f305eae",50472:"40a0c599",50525:"aba6a826",50773:"7bcf009a",50849:"d7e1d518",50999:"6f93a078",51532:"1ed71a7d",51555:"e91074f3",51574:"12e76d03",51593:"6f219482",51605:"ea82a261",51625:"3b5ffa57",51706:"af6e989f",51768:"e6fe050f",51799:"42a4a45b",51830:"2212e80c",51840:"b63fdeeb",51881:"bf3bde03",51945:"86a7da57",52094:"22a76d89",52126:"08ba51c1",52251:"92bceb62",52286:"79bae4c5",52491:"fcb00301",52499:"3b1e54e9",52573:"2006be57",52586:"a18114c4",52593:"28599d52",52715:"c04dcf0d",52789:"0e46f7bf",52870:"f888d9d8",53237:"1df93b7f",53243:"54230287",53371:"c55f973e",53442:"6cd64148",53675:"3ca132b1",53823:"f20f879f",54125:"b684abf7",54133:"6afbfa44",54178:"1632abda",54200:"39a2751e",54210:"4c8d1cae",54250:"b3c952b5",54265:"d6f7d5e2",54363:"fa5bdf0c",54382:"4bae0029",54397:"3034400c",54487:"612ebb8a",54513:"cca83a59",54591:"e0668c88",54741:"7c417199",54756:"130a23fd",54778:"fd67079f",54786:"ed07f994",54794:"dd8be3b2",54855:"32f0f819",55043:"a463ff81",55216:"5560d84e",55239:"661e4fa4",55273:"08472b2d",55335:"746f419e",55478:"1710d498",55552:"8e23b856",55693:"7ec28fd9",55726:"7f536709",55745:"2e18dbc8",55799:"fe2f6d57",55821:"676a3180",55925:"407fa3a0",55962:"f2d325f1",56290:"640fe435",56424:"ac4fb807",56513:"e8d36425",56541:"7ec7e0b0",56552:"43c3babd",56614:"71f8452f",56750:"f1525ef1",56795:"151869e3",56902:"d4a6dda9",57121:"918ae6ff",57126:"7f039048",57242:"522a40f8",57258:"1b4282d0",57293:"fb218ddd",57341:"3ad7b662",57489:"f251ab77",57598:"e4b4615d",57599:"6e586ee3",57699:"d06effa9",57749:"34660ac5",57780:"163044ef",57820:"a3c98c45",57995:"2c5ceec1
",58009:"84c320c1",58042:"4893a1cb",58096:"e345afee",58182:"a045168c",58197:"09e9a7df",58234:"6145eda0",58247:"551b313a",58356:"649a76e7",58539:"e5a71ed6",58564:"ea41aad0",58768:"4f9404e5",58818:"cc6053aa",58822:"cf14af90",58824:"bf2622dd",58914:"68709c70",59051:"8938295e",59060:"de11ece8",59181:"010f8398",59191:"af049e12",59241:"07a6f1c2",59248:"8962034b",59336:"902aff6f",59342:"d959d974",59394:"f4be443e",59427:"b43aa387",59442:"0cd38f48",59496:"081ed9af",59506:"c2ed794e",59533:"897798e8",59592:"d243562e",59771:"e56a1a2c",59900:"619d2e79",59982:"f929d4df",59992:"918c9b38",6e4:"78f8003c",60185:"7b2e834b",60331:"34d5cc00",60434:"05a720dd",60487:"34088569",60518:"bb341369",60603:"b8677fbf",60608:"c1b7bc44",60682:"1ebc7fe2",60831:"84960677",60868:"16bb304a",61007:"a79d55be",61200:"4c13f84f",61210:"f8bc4080",61249:"f497508e",61271:"e50573ba",61361:"a529f863",61543:"e31a63b7",61643:"a882bd74",61793:"5e52bbeb",62117:"83a26c48",62235:"912fcb5a",62307:"5c77ea5f",62318:"bdd03912",62319:"8e993d66",62474:"686c1ad3",62493:"6312a106",62523:"54d1c079",62547:"bc1c33e4",62655:"8fca97e0",62867:"34d502be",62948:"c38f23a9",62983:"6dffe7c4",63105:"f29affbe",63387:"877a3c1c",63604:"7bc70741",63663:"cb582f54",63777:"92264b81",63801:"010118f9",63808:"86c8f7cd",64e3:"cacfa11d",64032:"d6990b47",64072:"555f2cec",64100:"e8591f69",64122:"07b92fc6",64164:"aea361f0",64485:"e715560c",64525:"6694e7e9",64533:"fd0f74aa",64596:"eee9e2f1",64656:"300bd484",64754:"e74888b9",64766:"5a7e5a43",64824:"dfd588b8",64871:"42d0afac",64900:"610e19f0",64967:"b81f3fb0",65132:"637ec626",65501:"8d3be60d",65592:"ef0f3981",65594:"88dde0bb",65641:"90dccef4",65758:"62041344",65883:"540f26e7",65965:"21d2296a",66026:"06b7cd3c",66081:"9d336f66",66187:"cf494ba6",66238:"1fb9ab5c",66256:"5f0246ae",66303:"a5560bad",66336:"87cc8f7c",66462:"1d642165",66465:"9c53d859",66597:"e0052e0c",66744:"2dba4e8b",66958:"12b52520",67010:"49763a0b",67021:"01ba3f79",67061:"d7124adb",67069:"c7b80b67",67132:"15f4efbb",67343:"2d8700b9",67448:"85ac525a",67583:"18caf9ef",67597:"1693c0b8",67638:"a48eac25",67670:"a23744f9",67908:"d29db0e3",67954:"8d96489a",68034:"8858d0ce",68126:"db9653b1",68258:"f301134a",68689:"8751004c",68757:"5335ef4f",68793:"9b89ba00",68823:"c3ca7a6a",68943:"ae5838f0",69057:"21cf1efb",69111:"1a54bfd0",69125:"a5f4c814",69169:"046783c0",69209:"193f200e",69234:"212137e2",69254:"140e6a69",69628:"9bfbb8bc",69629:"d5d7628b",69647:"607292ce",69824:"273a5860",69843:"7facae8f",69899:"6d933e1d",70081:"41e02281",70130:"16b47049",70178:"06b3b671",70249:"7bd49e6c",70277:"d72ada40",70367:"1fc4ed50",70497:"3fb51827",70504:"f53e2381",70543:"ce79b72a",70614:"65306ecf",70706:"21d3c1c7",70964:"06876062",71081:"99c371aa",71160:"dff7b4e8",71169:"5ed92a05",71287:"167f5be9",71476:"c26ab7d5",71516:"dcf1d6e7",71544:"d93a0aad",71698:"ff9d88b6",71789:"fcfa677e",71811:"39afc900",71918:"77ccc938",72070:"59dfcfb5",72189:"7f31124b",72331:"66179fb5",72500:"cf945ce5",72613:"084a18af",72638:"1adeac4a",72740:"150a4d14",72887:"2e2a73ec",72952:"d703ca6f",72957:"d9b3adf3",72978:"2e6d047c",73238:"1eec97be",73300:"4efa0483",73338:"aa02927d",73356:"b26f6fa9",73369:"6594bd70",73452:"63b8176f",73533:"c9f98325",73537:"c7953305",73576:"05a8a78d",73618:"12dcfbad",73725:"8f9c5733",73745:"154cbeb4",73766:"af29c71b",73858:"bce7d46d",73975:"566ea6d5",74019:"ff35d8ff",74061:"a0541488",74091:"67e63bc0",74132:"ba454016",74136:"ef4e0f5d",74139:"38341509",74332:"769f97b7",74337:"cb8731ee",74362:"71078103",74441:"4c8fc79c",74465:"e6a17fa0",74480:"4bbc58d4",74578:"80ea5ae7",74794:"016f0e16",74839:"8b87f6f5",748
82:"95648126",75070:"5b34d9eb",75118:"29ff1658",75123:"8d81369e",75203:"61c61e17",75273:"ca1d44bc",75546:"c176dc4f",75567:"43a232e9",75671:"ff078e30",75702:"1ac49947",75897:"64e30bbc",75911:"b92bff04",76029:"d6f3938e",76180:"172c9869",76222:"99ae7254",76240:"02de7b5a",76355:"2363ed29",76360:"4f8fd4be",76369:"c0f7075f",76527:"359e34b0",76668:"982c02f7",76793:"d12dbf4d",76878:"198182f0",76895:"8d493a07",76924:"cb870251",77053:"9eb4c1b4",77170:"60e8b504",77413:"4deae4de",77427:"78dc40c2",77527:"56c932ee",77655:"274eaedf",77680:"60043c0d",77887:"be698a2c",77923:"971cbe2f",78035:"1ae50e88",78038:"4ab9b114",78063:"c80936bd",78072:"ed51eb7d",78114:"516dec85",78233:"41742cda",78248:"df22f3af",78452:"118229e6",78863:"2b5e4b34",78924:"0fcbeed9",78991:"69f3d9b5",79073:"a17fb62b",79164:"95f18dd4",79239:"3888e873",79290:"f83967c4",79298:"10c28d6f",79315:"46fa8ad3",79356:"73a7bd5f",79478:"df5a3016",79479:"27e1a14b",79543:"fbff3b11",79691:"c733e485",79829:"6b2b8280",79895:"4bedd8c5",79958:"b00a2879",79963:"092519d2",79977:"4925ce85",80053:"935f2afb",80132:"18dd253f",80192:"fc8aebe3",80268:"eac8f2ef",80380:"2d35b91c",80709:"83cd8f20",80895:"f97cc188",80940:"abf6a1f1",81012:"85c9bffb",81135:"0c1ee94a",81400:"29b6c240",81477:"2dd65ece",81667:"a39041db",81708:"82a4f002",81835:"6a0b4355",81934:"35fa8025",82132:"919b108c",82250:"51da09c7",82342:"dc130668",82348:"3f6554cb",82360:"d885d629",82423:"fadcaea6",82513:"7bcf7096",82614:"aa395a59",82621:"ba4efbe0",82732:"ba8527a9",82864:"3ebee193",82982:"bf02c3ce",82989:"39b565ff",83069:"39c8ecdc",83074:"1a42aba3",83139:"fba94ee1",83147:"a26f7afa",83513:"fb0364ff",83692:"779753bc",83698:"39c159d8",83893:"66716ec1",83957:"10f908b7",84063:"363318d5",84097:"737371dd",84242:"6e366b57",84281:"a3286ddf",84362:"8da7304d",84429:"2da29b2a",84477:"3b12bc8a",84500:"6eeb04e2",84513:"6cb122e3",84710:"4cec253a",84745:"42325f5c",84754:"211f58b1",84841:"7e5ee96c",84847:"5c27dd68",84854:"34b19815",84888:"e9d5739e",84941:"21bf64ca",85011:"6e5d074b",85054:"7bc3feb7",85098:"60d04b47",85217:"f8482b2c",85419:"3743f01c",85455:"7e446cc1",85493:"6d480200",85780:"82033eb7",86009:"c9b79676",86018:"ed809cac",86129:"b47406fa",86150:"1db21d86",86317:"a540f8cd",86333:"08e3aaa9",86356:"4ad39569",86476:"45aa7127",86518:"ce66b6fd",86826:"7dd3be25",86950:"4e1da517",86991:"d1b2a42e",87064:"a6a8af40",87223:"27bd5328",87224:"6fc8d865",87240:"1fdab62e",87304:"9980f90c",87313:"96c0bb00",87316:"a48778d9",87443:"96ec050b",87460:"7668acae",87482:"8faa0fb1",87513:"a291f403",87634:"f807eec9",87667:"c07f2717",87799:"3958a146",87836:"e111f111",87866:"b1998bb1",88068:"8d28c4be",88179:"e5842021",88187:"1e391540",88204:"5304a4a8",88252:"c0074ddd",88295:"21ad5224",88338:"e1b9986a",88380:"ac930f6e",88446:"3c725018",88598:"6827856d",88621:"4499569c",88625:"6f59957c",88821:"41db9914",88831:"1c56d006",88879:"32ea4ecb",89002:"4455e85b",89122:"234a1403",89210:"b4028749",89215:"17114a18",89574:"13b69fa8",89675:"164cd634",89780:"46c600d5",89806:"443045da",89852:"1f79049f",89984:"93d3457d",89986:"fee1f25c",89987:"d25ffd5f",90046:"d043cc46",90075:"390ef088",90085:"d36db526",90185:"41b3e733",90205:"727a1f3c",90333:"f2497893",90342:"2c91f584",90377:"7d607fc0",90392:"f60e43ec",90398:"0b78393d",90431:"dd313590",90451:"0a13c98e",90464:"b2335bc1",90536:"73dfc993",90560:"01fb8b11",90601:"22f40a40",90610:"147b0f6a",90615:"8cd0f4f5",90645:"6601f604",90666:"459a783a",90865:"87e7806e",90896:"4302562a",90976:"6eb0ce42",91178:"9c4bbfc4",91213:"ff0539a2",91231:"28b27838",91274:"19345251",91287:"02ee0502",91304:"872e63de",91316:"b82d5884",91406
:"4cd7d8af",91425:"1671b3fa",91523:"d692bb25",91628:"c839a5b0",91753:"2f535455",91782:"304ed800",91810:"0f7553c3",91849:"b2735041",92085:"000c061a",92244:"8c828746",92269:"9c42de85",92393:"db5c8692",92404:"799b872c",92456:"d6360c39",92463:"4d4093bb",92744:"6eebf72d",92775:"7861f6df",92778:"8c31caf6",92786:"ae5bb339",92843:"85c3ba36",92851:"8bfba65b",92900:"e5a16b2e",92964:"14e00221",93023:"b984322c",93071:"61e5c5b8",93151:"e7cbe8da",93176:"f7101d4f",93195:"740eb29c",93308:"b83df1bc",93340:"5d075efb",93346:"f7735fb0",93377:"dd435828",93400:"03e8549c",93590:"dede40b0",93749:"4e6907d6",93832:"917734f8",93837:"cb341380",94114:"c9aea766",94123:"7c8407dd",94136:"91dc98f0",94197:"37aba5d3",94223:"43b891d1",94328:"63f66cb7",94337:"9fdf7324",94401:"6c10648f",94452:"878356ab",94605:"487f7f30",94694:"d3e690ce",94696:"376d31f7",94932:"a233fb97",95020:"b8e39b95",95107:"d666ab7e",95171:"3db8c88b",95281:"bc08bf79",95296:"9936b6c8",95317:"cf282674",95327:"1e173bbe",95329:"5b23c695",95364:"41fbfe2f",95418:"7877b0eb",95441:"e9ef6b31",95561:"0e0f5dd2",95696:"8462ad7a",95745:"edf19300",95801:"e490fd18",95816:"9dd89af2",95911:"7e254f9d",95945:"90b0cf6d",96055:"8fa500ae",96078:"d6011437",96082:"a322018d",96135:"3061ad92",96188:"f0129862",96199:"8e2c0739",96361:"ebf2bdda",96426:"64bd79cb",96535:"38e65fdd",96544:"49ea6ca5",96547:"385bc71d",96617:"e23cd647",96684:"a612420b",96768:"2035956b",96772:"b35418cf",96831:"99ba663e",96945:"09e11ac0",96971:"57973c2b",96979:"7f6f8f16",97065:"6816f4c0",97129:"f3034cf4",97334:"9d4bcb9a",97469:"d91e7ab4",97523:"02fbc840",97547:"902fdb3b",97553:"7ea214d5",97557:"c70cb355",97617:"ed97cef0",97648:"6f25dd34",97782:"b094b997",97816:"7513b789",97826:"16cff1eb",97850:"dd6685df",97920:"1a4e3797",97955:"746bf890",98177:"049dc708",98200:"0e7f2915",98218:"1820eb3b",98272:"b7f629d0",98623:"ced65f67",98740:"d1475ab1",98791:"1a6f209f",98868:"6a913ab1",98939:"3ff950a4",99120:"008b0ccc",99184:"8aecb2ef",99266:"ca443c18",99299:"7cc0ca0e",99367:"00125b11",99389:"c2f4aca4",99427:"64758f43",99494:"f2d5637b",99607:"49ea4a42",99669:"32db5af4",99839:"15d4dc80",99871:"5e3def70",99997:"b63b5bb9"}[e]||e)+"."+{19:"73207e89",21:"a3309319",71:"148bf97d",99:"3f2b222d",172:"c102a782",224:"487be67f",250:"e9efb2fd",291:"b2f7c218",467:"3701a1f3",513:"09978e25",596:"7b2cab9f",803:"9d68a6cc",899:"a7eb7364",1116:"56b8b25b",1365:"573cbaf6",1387:"23980d09",1523:"ec2891cc",1647:"63a2d108",1652:"94113f44",1664:"180295d2",1671:"62a8f5fd",1940:"c99c168b",2044:"d2700165",2097:"7188f7ec",2196:"3febcd57",2312:"d00cc62a",2427:"f701c31d",2570:"3ec3deb8",2638:"118a41f9",2656:"37531cd2",2905:"ff69985d",2916:"2ecd567c",2989:"cf806e65",3044:"712f76ea",3102:"bf11a1b5",3145:"b6e9e373",3191:"74dd2862",3197:"9da021ce",3216:"eacb9804",3281:"35926ded",3283:"cd4afe73",3326:"58beb1bf",3397:"b0ae73af",3398:"63275511",3447:"fb8d1ba4",3650:"2e235066",3667:"1f8a4cb5",3914:"54e8dd0f",3919:"41dcf0b8",3942:"21980884",4125:"96f2442f",4151:"1a6afc47",4195:"7402cd75",4244:"8675cdba",4328:"e0ebdf09",4504:"025ef68d",4513:"558b14b1",4585:"05c4d67b",4631:"1d37eb2a",4742:"5447edff",4874:"39750b03",4882:"d87f6d94",4929:"18e132f8",4972:"a7243668",5061:"509f2053",5129:"479f5110",5132:"27ce2caa",5313:"5f8b4c43",5352:"578d6913",5383:"306e0d1d",5512:"8f6a2d54",5714:"1a18df09",5909:"51facc0d",5920:"8c8aae04",5981:"68d16136",6027:"6f5c789e",6151:"81e3b729",6386:"67f2de5f",6443:"14902fad",6517:"34dee69c",6537:"ba99c5c9",6553:"08568a59",6695:"cfee6a07",6734:"6772bb12",6799:"67367b32",6822:"f569301c",6824:"6e7a03e3",6968:"a44bcc6e",6971:"873ad005",6978:"c6ebb24c"
,7078:"521e19d7",7091:"8c3d2fe2",7092:"cc64c0ff",7108:"d17c6119",7120:"d18d3e66",7155:"f52fceb3",7162:"cf567055",7318:"e3d90eca",7451:"334acc4c",7470:"ae4956f9",7485:"7b41d7f5",7500:"099ef3cd",7874:"7c1cf1bb",8023:"897a9b52",8135:"672f0bde",8145:"5d99a1dd",8188:"3b884b4b",8210:"d7b2d51a",8230:"119f8a4f",8313:"5d208e75",8328:"d30eb04f",8407:"28cf1826",8482:"cceadeb0",8638:"c9ba8e41",8671:"b4177ac9",8809:"959e39c4",8882:"0bf188b0",8906:"cfd44206",9028:"9d1978ac",9119:"fd788116",9225:"a93f8834",9235:"b22f9f6d",9365:"8b245b69",9444:"ca6f47e0",9542:"4f4a1a23",9550:"83ecb96d",9615:"54fc882e",9817:"83bf0cd5",9836:"c72917a2",9907:"676fbeba",9947:"5cfa1c77",10109:"beb060f2",10228:"aa539b8b",10270:"8ae752b8",10436:"1ec22cec",10480:"8d52d22d",10497:"809aee5f",10650:"547a5e75",10756:"2bf7ecc4",10882:"023d9d6a",10918:"ef85344b",10987:"c12ef65c",11147:"29a02178",11174:"2c33f6da",11203:"9539d62e",11311:"f40cd0a5",11321:"cd4efea5",11326:"6cbc304c",11342:"2bccdb3b",11398:"f8e68ae8",11565:"f2495b87",11646:"3c4058a3",11656:"ac13cb1c",11875:"5b868eb0",12228:"d521a8eb",12442:"25d7e31c",12549:"16f88776",12555:"900d6f87",12558:"0ac14857",12560:"9bb2eb9d",12567:"7d910920",13253:"0ef1443d",13280:"29f73853",13351:"804b2952",13460:"9e91d59d",13588:"d0b6d6aa",13595:"6c1f4a63",13617:"76fcc35a",13718:"94e8a0fc",13896:"76d63f5f",13924:"ba0bd4a2",13972:"302371f0",13979:"aad133c6",13995:"243e677b",14061:"6004bdb7",14088:"6e4abf52",14095:"18927dce",14143:"a3b631ac",14200:"5d9965b6",14299:"f4efc87a",14369:"b25e3d4c",14386:"acd52792",14396:"fd9bfcdc",14549:"607a549f",14610:"a9ad2a64",14670:"bd2bffef",14713:"8c81902c",14775:"35ca632b",14840:"4ae7e4f2",14908:"402f178d",15196:"9dd233ed",15380:"38251be4",15393:"bc061f23",15497:"2c78c17f",15658:"3662f8f0",15888:"8a70bc30",15970:"360efba8",15994:"9794e987",16022:"c86edca7",16038:"8f228690",16058:"f0b25bfc",16071:"0df54331",16153:"4800d10d",16161:"3a69d696",16379:"38089655",16528:"f38ed133",16635:"e969de7c",16672:"38ba7418",16685:"3022befc",16876:"19436136",16891:"303f3d1b",16973:"9fbbd5c9",17009:"80398da5",17275:"06c7edce",17283:"b641bac3",17457:"ab4ccae6",17511:"fdf99bc4",17726:"fc4837d4",17757:"10a7c58d",17785:"172166c7",17883:"3f659ac3",17887:"fbdcba1d",17989:"52e3fc0e",18025:"8e9620ce",18050:"5b8280aa",18084:"fcd7fdb2",18100:"0a6d52c3",18143:"b09cd8a9",18156:"a975996f",18186:"df8b47fc",18318:"edf202fa",18559:"5c93aa35",18598:"cb52400f",18680:"59e00bd8",18706:"15008a37",18734:"5dd15d0b",18746:"f9ac8609",18883:"90cc608f",18892:"6c8911a8",18894:"74b1ce85",18928:"0701e03d",18998:"d6cefe2f",19177:"f4fb3a86",19204:"25b579ad",19212:"c6205a58",19305:"bf07439c",19408:"893bf9b0",19427:"13389c1c",19493:"0990c5c4",19504:"3cbf15b2",19531:"795dc04c",19625:"acfca89a",19661:"8f1f85f8",19671:"e0c673af",19709:"eaec2d23",19733:"3acc99c9",19806:"837a7ae1",19832:"52e2b5cd",19876:"a944f10f",19939:"491b1577",19962:"a3ecf956",20040:"1320b0ce",20061:"fdab8ea6",20169:"f30c5c13",20303:"ac64adc9",20602:"d2e8db2d",20689:"f0ff8154",20707:"9011dfb7",20764:"705b6a69",20841:"ba439ab1",20876:"81d45514",20911:"eb39e4b7",20917:"9c9a3e5c",20983:"f78047ac",21015:"1e986630",21134:"a1896a0f",21143:"16349e64",21190:"580de28d",21207:"8e9578f8",21228:"ad3ad6e5",21379:"80aa9c55",21643:"eb35f457",21688:"795caae7",21823:"35e78f99",21983:"a4e4572b",22030:"c38a2655",22087:"817ffdcc",22129:"267916df",22163:"f5657f46",22238:"bdfbafdb",22456:"62957769",22523:"15391619",22540:"57cb9539",22583:"b8adcfe8",22604:"4c410f1b",22777:"991b45b1",22898:"c8aecb21",22940:"873908b0",22997:"f3a4a591",23064:"03b7ec0b",23228:"c0599384",23231:"2
95984d8",23252:"94c1e97b",23310:"c407c53a",23320:"3c9b69f0",23343:"08e5a4d6",23435:"59082b53",23522:"dcfb4085",23536:"2a58bbac",23545:"4623b3a1",23663:"33ee14f2",23714:"43716ebe",23804:"66d68fe3",23898:"ec519008",24058:"07462b4e",24066:"9d4d9ce3",24101:"d3e3013e",24109:"795d5349",24158:"4b081433",24266:"0b540723",24282:"d3ef7720",24401:"1ae158f2",24467:"dc4c3279",24501:"178a453f",24946:"8d83115f",24986:"00c01dd4",25079:"1dab7340",25251:"3cee59a7",25283:"06e3d89c",25427:"854a38e7",25433:"22246f6c",25451:"a23a897f",25513:"3f77f081",25547:"284c9b9e",25579:"9e7055ec",25833:"71a40566",25898:"94b4215a",26067:"7b3970ce",26084:"861dcdd5",26086:"12738d95",26201:"75d8825c",26291:"e8f1fa5a",26311:"ce5c5ebb",26361:"5c0647d3",26521:"89b58f07",26654:"4d65993e",26686:"9da74581",26695:"49306f14",26858:"1ce10981",27031:"e213c1a5",27109:"fbcb735e",27167:"eb8d133f",27270:"3324e9fc",27276:"2fd74dbb",27303:"46258253",27324:"04a7ca69",27554:"c429cc73",27704:"849e54d9",27918:"ca462563",27982:"0533c8f0",28085:"d5cffe43",28134:"3e2ffbbe",28139:"80df3532",28181:"4f668e37",28219:"51e8e461",28261:"f279f4e5",28367:"dc6ae3d7",28475:"2000a841",28476:"892e8462",28514:"2d31535a",28516:"d3f4d479",28623:"760e1770",28699:"c9753e68",28800:"7ebb42b4",28880:"777c9b40",28882:"79f11e9e",28906:"dd4d9809",28922:"4a03c698",29014:"63a3f3cc",29025:"caedbded",29050:"15e17037",29066:"100d7b9b",29131:"61e3e7a5",29191:"6765a974",29272:"a7da1cef",29514:"902f2c64",29520:"59707016",29698:"4ac96687",29717:"c3facf77",29782:"8ca83049",29818:"be78b6d0",29831:"c421c31a",29864:"7e0679a3",29871:"8a4a1409",29886:"da3cf2c4",29899:"8cb1ad4a",29978:"f29be154",29980:"15805725",30062:"2dbf55d1",30216:"c844cada",30295:"54944412",30419:"be694780",30433:"db6c199c",30454:"5264f7a4",30470:"0ae2450e",30589:"3a397208",30613:"aaec584f",30677:"b72f0627",30678:"2a52c41d",30800:"9375989b",30820:"b28f98b9",30834:"ed2abcff",30837:"e852c553",30865:"8b9d510d",30885:"e871b509",30979:"589a7d3a",31009:"41ecc7f9",31013:"7a5f9581",31023:"970d7bca",31044:"732db84c",31050:"400af1a1",31068:"1f0b2373",31089:"ec193a0e",31116:"bc1bd6c9",31152:"a086d363",31187:"52f3a337",31293:"17ff3286",31294:"3c2a361c",31441:"c659a961",31471:"df9c6253",31512:"69ffbcf7",31516:"6f3edbc7",31570:"a9bef6dd",31671:"4d0bd185",31824:"22f60a1f",31938:"7889faa7",32224:"19c9db8f",32255:"65ece1b5",32308:"df0742eb",32319:"63655ae9",32410:"943f13a7",32446:"8ce02657",32491:"bdd9498f",32567:"277cb195",32652:"72910332",32689:"08107660",32839:"5c79dc2a",32872:"855c14fa",32892:"250cb5b2",32914:"70e38801",32961:"654461de",33023:"2964b050",33076:"ef5beb95",33083:"16e34a1a",33131:"a67a7052",33138:"cbc82116",33178:"555e80f9",33181:"2310c781",33223:"e5f8838d",33260:"5ee9d0c7",33261:"a2e46c4f",33329:"94ab58ef",33407:"3ff66407",33514:"0e769677",33725:"6470c099",33737:"f6fbd704",33889:"1f400f9d",33920:"32836d68",33966:"9d5d17df",34020:"309d55c2",34077:"652a00df",34079:"25c81e5a",34153:"f309a901",34206:"8ef010f8",34293:"a7e2d1af",34294:"21881a35",34323:"82bd13d2",34407:"f08cd09f",34458:"bedb0cad",34460:"51defee4",34475:"994398a4",34552:"278830b5",34590:"e79f245c",34647:"cb920ca6",34656:"53a0d9e7",34748:"c74caba2",34766:"9716c156",34777:"55318a51",34784:"121bb89d",34792:"10db5cbf",34800:"31234350",34882:"53c961aa",34943:"5a2f2d6e",34979:"be9c4116",35038:"852d9cfe",35069:"05c8b29d",35214:"3be2021d",35216:"783d15db",35387:"d6c4b7cd",35466:"b8c66a97",35577:"eef0d34f",35614:"0550e592",35647:"8df42e8b",35768:"fa150a9f",35809:"29c1c1a6",35874:"64598624",35879:"603ed18f",36009:"ed99bcf4",36312:"f6211aac",36415:"7269bd46",36442:"781
b7f36",36483:"e4ad43b7",36495:"c4a16cab",36511:"aa66c640",36673:"9639c6ef",36766:"a983063f",36773:"364a602a",36933:"6c6d7692",36935:"232400dd",36983:"87a7744c",37021:"8953fbe7",37055:"8a714c7c",37058:"e2b52ed6",37208:"4babdc40",37257:"7b25eb85",37316:"136d87ad",37362:"af7565f6",37426:"a3fce28a",37894:"1d31c5b3",37918:"944b0fb5",37977:"a605632a",38056:"c9cc2c03",38104:"70b4c07e",38230:"a9d249c1",38333:"6d30319e",38349:"86ac1432",38368:"99e33615",38450:"971f211e",38469:"c905d16c",38504:"b575cc51",38591:"8b436f7f",38679:"2924f701",38741:"056b89be",38768:"e49628aa",38792:"873b0b4e",38819:"d5786d3c",38873:"aa2dff10",38928:"915007c0",39033:"72f28f6d",39177:"e72fee4a",39209:"ab100076",39252:"a8e9c58b",39275:"68258924",39325:"9e574bbc",39368:"9b3b00b6",39605:"87f4261f",39645:"363e983b",39726:"e601e6d1",39820:"efc15edf",39853:"2eceed3b",39941:"09b3269e",39972:"1af66b9e",39978:"3914825c",40097:"065399f9",40158:"c5447ab8",40176:"a0efce43",40342:"6b02e5f3",40365:"943c9bb3",40665:"43c72f99",40830:"28cfd22e",40930:"0bac8708",40936:"6f445b64",40986:"f6358136",41100:"8b72bd88",41120:"a6c90b9f",41329:"327337f0",41388:"15946aa8",41537:"507f5136",41750:"dfbc322c",41840:"319bb3a8",41863:"397dd98b",41929:"1df8f60e",41954:"7942c49d",41958:"83b83b97",41998:"a08698a0",42051:"c783d817",42054:"9a68e985",42059:"dee04e82",42169:"d7053385",42187:"2cada35c",42226:"85755b59",42263:"1b5d9df4",42288:"5b62f81a",42289:"0c3f570f",42371:"d7fc9caa",42436:"1827f217",42465:"cf7195c1",42551:"e6eb7da2",42609:"0fc5596c",42620:"81528074",42690:"c8225ded",42721:"697fbb16",42728:"2d6aacf6",42757:"a4a33845",42930:"4a3d4ba3",43037:"f8316728",43047:"4998e689",43072:"2e96fcd7",43163:"588bcffe",43294:"0b488eb3",43529:"f1d99b35",43554:"9b15f63b",43635:"382a5fae",43645:"e2041df8",43697:"5f5f48af",43793:"b242fd59",43849:"3d340240",43919:"60913365",43966:"f807cefc",44023:"d8d2c9f3",44029:"a29133e6",44118:"1bc0c1f6",44152:"93b2a9bb",44174:"5bbd8c7c",44393:"619828bb",44523:"c96ddcbc",44592:"64edac99",44765:"404d2c80",44780:"8b703881",44797:"bc343266",44860:"83bf1478",44907:"b7e881e3",45057:"026ec45c",45091:"f47baa55",45114:"c5711e84",45279:"b7f604ea",45287:"7ea5080a",45571:"b3f438b3",45583:"95c5394f",45593:"fa50b001",45732:"646f2c6f",45786:"02f6764b",45809:"a28b7ade",45878:"2405e6ca",46017:"fb5cff5c",46023:"02d71bae",46045:"7c8d179b",46074:"85dc8255",46218:"304359f2",46284:"35783c6a",46328:"e6ad5407",46447:"b4d585fb",46838:"615dfccd",46901:"73c5e427",46945:"aca29914",47062:"ecb4b558",47068:"e5db7558",47082:"87413ddb",47117:"925c7a4a",47276:"414f3077",47287:"17c798ef",47463:"ffbf05d7",47568:"2216e773",47582:"98df77d3",47655:"741c773e",47708:"28dd4c52",47838:"d2dfacb3",47851:"9638e317",47975:"1d9b9deb",47986:"768f43ef",48031:"62017355",48055:"6eda5bd9",48150:"3729eba4",48218:"786de4f3",48320:"0781165c",48349:"1227a5f7",48426:"e645d804",48637:"429d7094",48670:"82bc3aba",48840:"9394dd4f",48998:"217ac770",49096:"bf964682",49169:"7d6d8f24",49241:"a8e10b10",49270:"cdfcac6d",49681:"bf0ec94e",49716:"b198539c",49874:"1c08aa98",50017:"8cb0b147",50052:"32a7f158",50145:"16d798eb",50153:"b103db02",50193:"cd9074e9",50240:"e5ce9a3f",50337:"e1095412",50362:"1653c490",50375:"e03acd0c",50437:"afd1bfa0",50472:"7aad844d",50525:"8ffca8d0",50773:"93035d26",50849:"afc6073d",50999:"7814621e",51532:"4b48e1fb",51555:"8c0ea918",51574:"6cb382dd",51593:"ccb11905",51605:"0604c821",51625:"9af5a24d",51706:"5c9852e1",51768:"4856aecb",51799:"04e48987",51830:"e3f6d03f",51840:"6e2f046c",51881:"6f7b85ae",51945:"084ffea8",52094:"5eeb6372",52126:"5ffae957",52251:"71f8dc72",52286:"dfe5d
adf",52491:"3832c13f",52499:"3e054d23",52573:"006068eb",52586:"fc99d991",52593:"93febf0e",52715:"246739de",52789:"4e836d21",52870:"58b181cd",53237:"c3a4514f",53243:"5c3c3aa5",53371:"b01e4c10",53442:"ddb338e3",53675:"bd70cb8a",53823:"6530507c",54125:"2f1c7fe0",54133:"45c8fdf2",54178:"89e6c31a",54200:"8191d4b9",54210:"0ff7d73b",54250:"f3a66d07",54265:"4eb7ccff",54363:"101747c1",54382:"226534bb",54397:"95ca43ad",54487:"96d9e1eb",54513:"952be9ba",54591:"4c7ab366",54741:"8d1a6e43",54756:"354bdb25",54778:"ab81ffc8",54786:"18a4a7ca",54794:"daac75f3",54855:"4f0b894e",55043:"923df72d",55216:"dff3209e",55239:"52636bf0",55273:"27e02213",55335:"40de3b68",55478:"003e195a",55552:"c5b6f07a",55693:"a959df65",55726:"8042dc4a",55745:"17a44615",55799:"92eb75ec",55821:"6a3913bc",55925:"5005937c",55962:"40e7c1dc",56290:"d30a97c4",56424:"1a06e672",56513:"5f155b72",56541:"ac6ce218",56552:"4d94b2e7",56614:"f2830c22",56750:"186f7120",56795:"017a94ef",56902:"f91db1cd",57121:"e7f833f5",57126:"704067cd",57242:"c1396e1a",57258:"3aaaa2c9",57293:"3721a264",57341:"2aa6199b",57489:"3d3e42fc",57598:"7a15184a",57599:"a8e91558",57699:"77e23c32",57749:"aa83a7f5",57780:"44a93095",57820:"d9fe0245",57995:"11d969fd",58009:"c5f9c347",58042:"9798f0fe",58096:"637b08bd",58182:"b13442de",58197:"1d43d81a",58234:"d3624c41",58247:"47cd75a6",58356:"1aea63a9",58539:"2e4b62b8",58564:"fc408722",58768:"41323c68",58818:"9671e908",58822:"0e081dcc",58824:"3ea8fad3",58914:"33792c32",59051:"ce30c81d",59060:"f4bb9845",59181:"ca99c1d1",59191:"4f95915c",59241:"126c9fc5",59248:"8c5d9aae",59336:"1face31a",59342:"2c3b6d28",59394:"cb1346cd",59427:"9a0b349f",59442:"d5c2c74b",59496:"0c397f5a",59506:"4a81177d",59533:"915071cd",59592:"b7b4e63c",59771:"d25615b8",59900:"1e9304e5",59982:"c299fce0",59992:"114b244b",6e4:"cc9f4527",60185:"babf0d59",60331:"924ba5f7",60434:"52042dd4",60487:"80f0aadf",60518:"c0f1e5e8",60603:"5316c07a",60608:"9430b8fe",60682:"7854ea33",60831:"08063f83",60868:"267fe23b",61007:"dcc4e5af",61200:"6c5f5e8b",61210:"d73ed0a2",61249:"09ef4526",61271:"f20051ab",61361:"636b72a8",61426:"8bf7a004",61543:"8f27603a",61643:"f9dbfec5",61793:"01718e8f",62117:"a30c0b4b",62235:"58285753",62307:"5a3ba620",62318:"9a886142",62319:"b75eb48b",62474:"dc194af4",62493:"bbbbd1d9",62523:"61046ac8",62547:"fc411513",62655:"f09d9004",62867:"46a98c28",62948:"dbf70f88",62983:"61d5d965",63105:"096987cc",63387:"77f70d47",63604:"66965e35",63663:"c51c3a56",63777:"0219cda6",63801:"70a50fe6",63808:"379d8af0",64e3:"c520b0e8",64032:"b5c596b3",64072:"f3f20053",64100:"7d16a1fd",64122:"1eb9bc55",64164:"e2cc50cc",64485:"772190e7",64525:"fde13b5a",64533:"dbde94ef",64596:"7e071618",64656:"5c661dee",64754:"1962a975",64766:"2e434b9b",64824:"68d4a423",64871:"0c878ea9",64900:"9c15d13e",64967:"8c69c566",65132:"c728a280",65501:"2405319f",65592:"a5b39398",65594:"30989c6f",65641:"3548b448",65758:"9dc5e65e",65883:"8e29ae7f",65965:"fca751f4",66026:"e4206534",66081:"0b35a4d3",66187:"6bc78a22",66238:"c7bfdb48",66256:"441bc284",66303:"7097325b",66336:"7ee25827",66462:"933f4a86",66465:"711a6a15",66597:"abbb3946",66744:"fa50fa5d",66958:"d3834829",67010:"64408058",67021:"b6e03309",67061:"94f1459a",67069:"a14d8963",67132:"34a01d6f",67343:"0929d0ad",67448:"c3192226",67583:"7dea7c29",67597:"c925248d",67638:"4606f8cf",67670:"cb8c9278",67908:"7befb180",67954:"a0ccf3c9",68034:"673d4259",68126:"ab783f9c",68258:"fa76b63f",68689:"3a807ecc",68757:"4b7d9782",68793:"af3c15ab",68823:"5c7b8d2a",68943:"f7a596db",69057:"2a9b8add",69111:"6f76f4e0",69125:"6052e105",69169:"6e79070e",69209:"9e539e38",69234:"6337130f",
69254:"270b7bf2",69628:"7988be09",69629:"6d3ab36e",69647:"c5c936f9",69824:"7d0858b3",69843:"a06150bd",69899:"576d10d9",70081:"18e16c97",70130:"be0e7aec",70178:"81e6c33b",70249:"b993646f",70277:"311356a4",70367:"1d4d5424",70497:"557d33ec",70504:"28a803b5",70543:"2fedfbd9",70614:"4f9da953",70706:"73960b42",70964:"bc48267c",71081:"31a6d953",71160:"4dddf998",71169:"adf88673",71287:"885b618b",71476:"19277f18",71516:"b09b1ddd",71544:"27aa44ac",71698:"bc2f9371",71789:"59ccffdc",71811:"264139bc",71918:"b32dc731",72070:"7ed11917",72189:"6dcfecb3",72331:"6d29448b",72500:"f64f66be",72613:"c769eb3e",72638:"fdb543fa",72740:"556dfa23",72887:"4cdbf544",72952:"e3a8eab8",72957:"960770e3",72978:"aa7363f2",73238:"d2a2d1e9",73300:"e20e9d0a",73338:"04155867",73356:"36d8b7e0",73369:"f8653575",73452:"ed4f95c5",73533:"0ca294c9",73537:"3e09c354",73576:"1077006e",73618:"f611989b",73725:"5a69a073",73745:"593cd9c9",73766:"e1b090b0",73858:"c8d96799",73975:"4a26020a",74019:"eab93d36",74061:"47eacce0",74091:"b78b34ed",74132:"a5970e5e",74136:"f2226b77",74139:"0bcd285c",74332:"0f66d626",74337:"d3114a2c",74362:"8c45079d",74441:"0ef242c1",74465:"3b3b7dbd",74480:"25f1369f",74578:"178c32cd",74794:"8f102688",74839:"577811de",74882:"f0b13ca6",75070:"22d722a3",75118:"bc928be3",75123:"8b8b6076",75203:"8b7d980c",75273:"8dbf257f",75546:"49f94163",75567:"355318aa",75671:"5943eb1e",75702:"dfcf6ff3",75897:"b7955be2",75911:"02fe2b4d",76029:"a35fff25",76180:"059fca9c",76222:"22652783",76240:"6c6980d9",76355:"b62d2caf",76360:"4e523b78",76369:"9e396c17",76527:"50bbd3b6",76668:"1fe0055f",76793:"a673c81a",76878:"575a3510",76895:"bd4b403b",76924:"bc5962fb",77053:"16eba6b8",77170:"e863ae3b",77413:"7e5de04f",77427:"add12e61",77527:"78b8c9e0",77655:"0003883b",77680:"595d9465",77887:"f2e6f18b",77923:"fb735e37",78035:"4ead3dce",78038:"52038e48",78063:"5324c9d3",78072:"d4105f8b",78114:"34cd7f60",78233:"2e83e610",78248:"74069962",78452:"8e3ff138",78863:"c85a658c",78924:"f65e527e",78991:"52cf8db2",79073:"9b6a9356",79164:"2bf3dad1",79239:"6f1637b7",79290:"0c519db4",79298:"4a5cabcd",79315:"4576f615",79356:"37e00d95",79478:"36233afd",79479:"24435308",79543:"a1c76445",79691:"abf6490b",79829:"d0e80522",79895:"85d250b0",79958:"aeb03179",79963:"420ea460",79977:"09932290",80053:"544bc93f",80132:"2e07a3f4",80192:"44c46920",80268:"729162ff",80380:"8a522d91",80709:"fd31ec19",80895:"2b1b02ce",80940:"34e77834",81012:"71e2a477",81135:"a0f32953",81400:"f1a7b723",81477:"81ce2eb6",81667:"b8b8cdb0",81708:"4d7d7762",81835:"efcb7b56",81934:"33bbfee1",82132:"14c8bdbb",82250:"ca3680fe",82342:"74cd685f",82348:"85d844f1",82360:"143d911e",82423:"7b71feb1",82513:"2f0d774a",82614:"2a95d08e",82621:"dfcb91e5",82732:"fd3290c8",82864:"32d19c2f",82982:"a2616da8",82989:"6a4597e6",83069:"47b3ad61",83074:"c97db8cb",83139:"80fa04ea",83147:"888c5d69",83513:"e4ecf5db",83692:"db7a78e4",83698:"eb723d7a",83893:"3d2b86a2",83957:"1b9c82e0",84063:"c8ad2693",84097:"6ef8d9da",84242:"c9097d63",84281:"6a10882e",84362:"c6417e92",84429:"5df8ba6b",84477:"d8bfc341",84500:"431c6477",84513:"b6949bb2",84710:"de128b25",84745:"b52aeec7",84754:"dee9fa1c",84841:"93e39080",84847:"215dbaa3",84854:"84dae178",84888:"22c51fd4",84941:"b4c8a6cc",85011:"38673f2f",85054:"adf0601d",85098:"279d70d3",85217:"f61e93e0",85419:"46bf7559",85455:"83eea1ee",85493:"119281a5",85780:"b9f5f272",86009:"70baefe3",86018:"31bdd1d7",86129:"a2e712a4",86150:"cc7f840f",86317:"fdb12ba6",86333:"ea59a642",86356:"dde2e98e",86476:"197674e4",86518:"9284ae71",86826:"df69b0fb",86950:"e65ab699",86991:"e5fcd808",87064:"3cca1274",87223:"c957930d",87
224:"e14869b8",87240:"58b186a9",87304:"886ecc13",87313:"c8286e0c",87316:"b4e193d1",87443:"9aa85498",87460:"e647bbd4",87482:"b9081c96",87513:"b2501ac9",87634:"413486e3",87667:"1173ba50",87799:"ce5bac8d",87836:"b8d9a0fe",87866:"7f1c6977",88068:"6611ef54",88179:"0286c49a",88187:"138206fb",88204:"c6e4cde3",88252:"a2adc43d",88295:"58531032",88338:"44d418f3",88380:"8c2767c8",88446:"bedc1525",88598:"44737f48",88621:"66870438",88625:"fd93b3b7",88821:"a3cc23cf",88831:"f98e1422",88879:"bc39a8d7",89002:"a4594737",89122:"7931db90",89210:"bbe355e5",89215:"6f690905",89574:"a872329f",89675:"2de3dd35",89780:"cb004132",89806:"8290698b",89852:"0bf03005",89984:"c9c8bc7b",89986:"25e5550e",89987:"7ef3f99e",90046:"18bf9d1a",90075:"4c3e9e36",90085:"bcb38f13",90185:"9e4a08ea",90205:"938baf6b",90333:"50f0aaba",90342:"f21131d2",90377:"d1f93533",90392:"bfd01257",90398:"52310a46",90431:"8afcfb3b",90451:"9f655912",90464:"b5c9216c",90536:"9d746e67",90560:"f21ce84c",90601:"e04386b3",90610:"bf6a18e1",90615:"5eb23ee3",90645:"ec412e06",90666:"67ccf36e",90865:"988a7db1",90896:"464b25cb",90976:"3c307ae3",91178:"ae626034",91213:"1668605a",91231:"20e516c9",91274:"06c24edf",91287:"2cac701a",91304:"51bac317",91316:"2e1f88fa",91406:"2f540e5a",91425:"6224592c",91523:"da34a7fe",91628:"5022e7cf",91753:"c5461936",91782:"d494ecff",91810:"e9392fb4",91849:"c19b88cf",92085:"da4ef155",92244:"b2497d07",92269:"5799f8ca",92393:"63912027",92404:"75dfbd51",92456:"90063d24",92463:"3c0a582c",92744:"d29698c0",92775:"6cc03df9",92778:"871cdeb3",92786:"5e68e413",92843:"ae295bd7",92851:"d6f651a2",92900:"e7dbc981",92964:"b6045191",93023:"67e73b3f",93071:"060ab57a",93151:"ee800142",93176:"a74fd090",93195:"e70b51ce",93308:"1467e450",93340:"11edc4dc",93346:"1b4782c6",93377:"d70f55df",93400:"5c88336c",93590:"c0ae13c5",93749:"b0c0c0b9",93832:"ee4682e2",93837:"e0b7e94d",94114:"c4a618c6",94123:"1a6f14e8",94136:"acfc65cb",94197:"84ad4950",94223:"0ea8a7ad",94328:"3c453dd8",94337:"b1e4d02c",94401:"310ef608",94452:"890d4cf7",94605:"3a9648f7",94694:"efee88a2",94696:"be43fead",94932:"f4de81f7",95020:"b3c692d6",95107:"84d38a95",95171:"b59fa835",95281:"452f0858",95296:"98493597",95317:"8d86f465",95327:"e8f86b92",95329:"b279cc8c",95364:"31dcafc0",95418:"5d25501d",95441:"589b9d96",95561:"fd1ede1f",95696:"cadd461a",95745:"f8e3a2bf",95801:"0a783e08",95816:"df2ce6d0",95911:"b9556956",95945:"f79f370a",96055:"b6642cba",96078:"531d5024",96082:"b914d28e",96135:"a45b92e2",96188:"a987d859",96199:"c561fd38",96361:"900f03a7",96426:"d4071849",96535:"d7e854c6",96544:"f92e8b94",96547:"4ffe2ef3",96617:"95351d4b",96684:"5704d749",96768:"217d8d37",96772:"c1e0dc45",96831:"c60ddcf0",96945:"98d2ab5d",96971:"ba9e1277",96979:"9303b4cf",97065:"947df423",97129:"00e43607",97334:"f8f4fc7e",97469:"5487219f",97523:"a3b57cd6",97547:"9e232073",97553:"2ab99719",97557:"23a5099e",97617:"a15faf99",97648:"57bdbab4",97782:"c09f3975",97816:"3b68fd86",97826:"d0281c69",97850:"77a305aa",97920:"1ecd994c",97955:"ef56c17f",98177:"9bb8538b",98200:"d224c5fb",98218:"290f20df",98272:"69ca16b2",98623:"4774311b",98740:"5e28c252",98791:"f37abbd5",98868:"7936d28a",98939:"dbffc577",99120:"6a0751c9",99184:"9282b3d6",99266:"cb2b1bcd",99299:"c9478c8c",99367:"c363645f",99389:"b6458274",99427:"dbba9055",99494:"f16b2e8c",99607:"c5ea3f3c",99669:"ce4a8b9f",99839:"19eb08d7",99871:"2f5edb35",99997:"590480c1"}[e]+".js",r.miniCssF=e=>{},r.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||new Function("return this")()}catch(e){if("object"==typeof window)return 
window}}(),r.o=(e,b)=>Object.prototype.hasOwnProperty.call(e,b),c={},d="@cumulus/website:",r.l=(e,b,f,a)=>{if(c[e])c[e].push(b);else{var t,o;if(void 0!==f)for(var n=document.getElementsByTagName("script"),i=0;i{t.onerror=t.onload=null,clearTimeout(s);var d=c[e];if(delete c[e],t.parentNode&&t.parentNode.removeChild(t),d&&d.forEach((e=>e(f))),b)return b(f)},s=setTimeout(l.bind(null,void 0,{type:"timeout",target:t}),12e4);t.onerror=l.bind(null,t.onerror),t.onload=l.bind(null,t.onload),o&&document.head.appendChild(t)}},r.r=e=>{"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},r.p="/cumulus/",r.gca=function(e){return e={17896441:"27918",19345251:"91274",21996883:"26086",24647619:"42288",26134010:"15994",28325500:"49681",34088569:"60487",38341509:"74139",39579801:"6151",46551803:"16685",54230287:"53243",62041344:"65758",62127933:"16038",65360910:"31013",71078103:"74362",78718572:"41750",79617745:"19204",84960677:"60831",91647079:"36773",92307374:"27554",95169675:"21190",95648126:"74882",99496549:"31152","906e49ec":"19",f5e3827c:"21",ab971afc:"71","49c587c2":"99","21730a31":"172",a93c3367:"224","23d30d6b":"250","5da0ca7c":"291",a5bcb3f1:"467","9216ce7b":"513",b564874a:"596",c63e6bd5:"803","54d8bddc":"899","0109100f":"1116","66ffc608":"1365",be2f7876:"1387",ef01e1dd:"1523","7981506b":"1647","902d2d1d":"1652","9ecb4d01":"1664",eee57cd1:"1671",a971b35f:"1940","0149cacd":"2044","40d51a61":"2097",e1e17943:"2196",ac4bed99:"2312",fa423b6e:"2427",b72a3182:"2570","935116ff":"2638","80631bfd":"2656","60b67194":"2905","7bb83d6b":"2916","9ef1e345":"2989","3bedcc76":"3044","7174660f":"3102",d4d22ad8:"3145",d0a0235c:"3191",ec28562d:"3197","92b043a3":"3216",a50b12c0:"3281","0b092b5c":"3283","5e94ba2e":"3326",b7343c9b:"3397","1e070b7c":"3398","4162c6a3":"3447","020a22ba":"3650","6c1d24e1":"3667","7b7fec6b":"3914",c81517c7:"3919","2b024d60":"3942",feba251b:"4125","5017cef7":"4151","3c93ed7e":"4195","21cfb395":"4244","38680a69":"4328",f28093e3:"4504",ffa15017:"4513",f2dc10f7:"4585","9654b394":"4631",c85450ca:"4742",a3db1255:"4874","4482beb5":"4882","75600d79":"4929",e54b1e77:"5061",d87811ce:"5129","23b46f68":"5132",aa4fa4fb:"5313","622596e9":"5352",f2a3bf8e:"5383","631dea17":"5512",d16a2606:"5714","66e9ea68":"5909",d613e1f8:"5920","85e709dc":"5981","391378fa":"6027",a8ef1ed2:"6386","5b4a63ac":"6443","31c3e3d7":"6517","0c99e969":"6537",efc338fe:"6553","111e23e1":"6695","85954f48":"6734",d1284c82:"6799",f49551b9:"6822",cc519fb4:"6824","9e6b2559":"6968",f38fa80d:"6971","7d9c461e":"6978",ff96de6e:"7078",bd1a8573:"7091","30a13577":"7092","365726b0":"7108","7e91f3e1":"7120",bb4987bb:"7155","97ce6959":"7162",b7e69c77:"7318",d8c5fc94:"7451","3e8cde1e":"7470","81f033b8":"7485",bd0e022f:"7500","32d13eb8":"7874","43de05a8":"8023","7d280bdc":"8135","93015a15":"8145","39bddd84":"8188",c565b8da:"8210","3c20ca15":"8230",d163ea32:"8313",fa8af309:"8328",b4473d93:"8407",f983631a:"8482","2b8a5969":"8638","1b93ff3d":"8671",aa01ca6a:"8809","6fdd5bc4":"8882","6d2c1101":"8906",de8a7b18:"9028",debbc0e2:"9119",e5523a26:"9225","407bcc70":"9235","541bc80d":"9365","0ffc31bc":"9444",cf4d312e:"9542","36edbaa2":"9550","9db3bdac":"9615","14eb3368":"9817","13af1bdb":"9836",bfd6b54b:"9907","7097fbbc":"9947","70cd875c":"10109",cdebfca4:"10228",a7da438d:"10270","05fa5837":"10436","8203e9fd":"10480","6773ef05":"10497",e8b1baf4:"10650",c7dacad4:"10756","544bf006":"10882",caf7e36c:"10918","26e1978a":"10987","76ace0dc":"11147",ba17e21b:"11174",c98
c0daa:"11203","2913cae6":"11311",a15a0d8e:"11321","65031edd":"11326","4d2bb41f":"11342",ba5e62dd:"11398","9d08c935":"11565",b74e2fe0:"11646","885bf670":"11656","3d446fd0":"11875",b63d08bc:"12228","1fb2401b":"12442","31eb4af1":"12549","885da4ef":"12555","58ac1d26":"12558",a6d8b730:"12560","5aabd190":"12567",b48b6b77:"13253",f65f22ef:"13280","6d905346":"13351","70808dd0":"13460","55920b47":"13588","3be6e3bd":"13595",c642f758:"13617","1ac29206":"13718","8bd7a1a3":"13896",bd0b26a5:"13924",d94f9ca1:"13972","911bbfa4":"13979","0b0df062":"13995","3105dae0":"14061","03902e07":"14088","31585cea":"14095",f71ac404:"14143","1168c96f":"14200",af9acd56:"14299",c099652b:"14369","83cbebfb":"14386",d3fe7aed:"14396","79db63f1":"14549","40a26966":"14610","0f188b70":"14670","763f2b13":"14713","931e6ab8":"14775","39ed38cd":"14840",f6310963:"14908","4338ab08":"15196","99a1a3e3":"15380","8cf636cd":"15393",f9c66408:"15497",a97b7821:"15658",a466ebec:"15888","271906a0":"15970","21edad34":"16022","891a9b8f":"16058","6ecc8728":"16071","7d0b3c01":"16153",ff3504dd:"16161","66fe7120":"16379",fdbb9241:"16528",a9347149:"16635",e1bbb98e:"16672",df878b79:"16876","697b1a5f":"16891","251d94d8":"16973","5573e4eb":"17009",fe34d639:"17275","45e19d44":"17283",f77885f5:"17457","528fc62e":"17511",c003d460:"17726","00c88225":"17757","3e697946":"17785",b530e783:"17883","996d98f3":"17887","9a71d807":"17989","7f9f61f2":"18025",abc9098e:"18050","29dde6c8":"18084",d19aead5:"18100","2730c631":"18143","5bebce7d":"18156","074a0372":"18186","4e07c49f":"18318","18ccacf6":"18559","719c4782":"18598",d1faa944:"18680",ccb072c7:"18706","07645771":"18734","8282a203":"18746","31793acc":"18883","4d58aa3f":"18892","6e11cc87":"18928",bc4716d5:"18998","584b298a":"19177","86c7426e":"19212","8a064b88":"19305",b7ec56b9:"19408","126e88af":"19427","6f49328c":"19493","83fe529b":"19504","974829b4":"19531","84eafbb4":"19625","664f913c":"19661","3729e987":"19671",c5593510:"19709","2edcde3e":"19733",dc7ad1ac:"19806","17f9c41b":"19832","760eed0f":"19876","8781c463":"19939","6bf1075e":"19962",b73dbab9:"20040",ef0f9e32:"20061",c9664647:"20169","9b7bae35":"20303","1d014bb1":"20602",cbbe4dac:"20689","120dd2fd":"20707","6d0dfc8d":"20764","67c4e6e9":"20841","55e55873":"20876",be3ddcfb:"20911",e01a2739:"20917","4d0df69e":"20983","653c19c7":"21015",ae5e6a48:"21134","6a89e0dd":"21143","0e728709":"21207","0feddf78":"21228",e22055a4:"21379",f18d5795:"21643",aa282e34:"21688",c0ef9e49:"21823","7f2bec55":"21983","43a49a39":"22030",af199e5b:"22087","06ceb223":"22129","9bc49845":"22163","1ca03b4b":"22238","0b0f030b":"22456",ef2624d8:"22523","0e8c522c":"22540","66b5d69c":"22583",a8f480dd:"22604","4995f874":"22777","6775be7c":"22898",fe423ebe:"22940","9e91305d":"22997","500c9b63":"23064",ede8882f:"23228","909cadf6":"23231","8113fd14":"23252",c1f9ba1e:"23310","89c49d10":"23320","57b7b037":"23343","23896e06":"23435","2c9f485d":"23522",edbf4496:"23536","2457e7c2":"23545",c7599d12:"23663","332c497c":"23714","99961c3d":"23804","9a02f8a7":"23898",f1d5089f:"24058","92ce2bd2":"24066","15d86f95":"24101","0c48ef63":"24109",b7738a69:"24158",a8565f1f:"24266","395508da":"24282",fbfa5dfc:"24401","7fc9e2ed":"24467",e6de5f28:"24501",fd378320:"24946","7d30361b":"24986","10fd89ee":"25079","6c0ce6d0":"25251","8b2f7dd6":"25283","2b1e7b76":"25427",b9b67b35:"25433","59740e69":"25451",f265d6a5:"25513","06673fe1":"25547","75071a94":"25579","9427c683":"25833",bd61737f:"25898",cc7818bb:"26067","85860bdc":"26084",a958884d:"26201","3bed40a0":"26291",f35b8c8b:"26311",aacdb064:"26361",ec205789:"26521","08985b86"
:"26654","5667bf50":"26686","3be4d1c2":"26695",ceb6bd62:"26858",cba64cb3:"27031",bb1d1845:"27109","6d03c6cb":"27167",e022cd8b:"27270",ec2b56b1:"27276",ec11103c:"27303","552bb95e":"27324","0f6a2fca":"27704","865c04d0":"27982","56405cb8":"28085","916fb87b":"28134","95771e39":"28139","278f3637":"28181","50e78136":"28219","9ce40ebc":"28261","7f6814ed":"28367",fc338eb2:"28475","2f4d1edb":"28476",db082e36:"28514","8af04d56":"28516","018243f8":"28623","4a0c84c3":"28699",f6ca5dc0:"28800","2f74be58":"28880","8d83f575":"28882","48c7b3a1":"28906","3417a016":"28922",da9049b8:"29014","8f32218b":"29025","44573fa4":"29050",e5977951:"29066",a670ed1c:"29131","6fe0ccd0":"29191",f5da8015:"29272","1be78505":"29514","4ef1f024":"29520","949a554a":"29698",cff5e41a:"29717","16a52e74":"29782","9e530f0a":"29818","3291c538":"29831",b604d5b2:"29864","729f5dd4":"29871","2c86cbaa":"29886","14e9211b":"29899","3d8cf439":"29978",c32e37fe:"29980","26db341a":"30062","04829abe":"30216",ff9e51b7:"30295","26bc6c41":"30419","1dc72111":"30433","019a0579":"30454","73c32a6c":"30470","10b7b761":"30589","47203c86":"30613","4f643cbc":"30677","8dc6ea19":"30678","534db397":"30800","845c1fa7":"30820","8900c226":"30834","6b685afe":"30837","0c9e4d11":"30865","8d5884d6":"30885","683d9354":"30979",c8b95361:"31009",d1036fb2:"31023","9b98b06f":"31044","23c664e3":"31050","928e95c7":"31068","5e56d481":"31089",abe8f5f4:"31116","8793e9e6":"31187",cc976a0e:"31293","212ceae2":"31294","347c8874":"31441","2d7d2510":"31471","8a75859c":"31512",ee2f6eec:"31516","6c6d8053":"31570",ce861b37:"31671","9f850ab3":"31824","4dbdcbee":"31938","87719f86":"32224",bc6bcbdd:"32255",d201558c:"32308","570c64c0":"32319","09e7c68c":"32410","0eb0d7dd":"32446","6167ec10":"32491","5d8d28d6":"32567",f8c45ac9:"32652",a5461ca4:"32689","46d1dc13":"32839",f1c17b7f:"32872",e9268009:"32892","0ef4df13":"32914","1347019b":"32961","9fcb81d2":"33023",a9776c25:"33076","95f7392c":"33083",ad516382:"33131",dd0c884c:"33138",cab767d9:"33178",fa17a3e5:"33181","5af48372":"33223","3deda206":"33260","82dec33c":"33261","9ebfae5b":"33329","765a551b":"33407",b07fb42c:"33514","586fa356":"33725","452104f0":"33737","5b659de8":"33889","1943e34c":"33920",c46ba464:"33966","9b00304e":"34020","3db5eb91":"34077","273b8e1f":"34079","4a797306":"34153","23a156eb":"34206",ff318c38:"34293",f8338e5f:"34294",e48c3912:"34323",c4a71dd9:"34407",f0f4a691:"34458","5c8ad115":"34460","5bea2473":"34475","592e779d":"34552",de061f48:"34590",c93364c6:"34647","813ebe83":"34656","71408d45":"34748","116bb944":"34766","4284f636":"34777","243071a0":"34784","5c392fa5":"34792","99a27b29":"34800","16046cb7":"34882","2c06af7c":"34943","9c12417e":"34979",a2bcabb3:"35038",b269633b:"35069","3576f003":"35214","5334bf47":"35216","907c8c6a":"35387","1cf42300":"35466","09e24d74":"35577","032d72a0":"35614",a2e876c5:"35647","90bfd346":"35768",c30c381e:"35809",df463adb:"35874","41f4b8cc":"35879",b3a22aab:"36009",ade0010f:"36312","8d392edd":"36415",f98b2e13:"36442",caa6bd94:"36483",b3fdbb6a:"36495",f3d03ec8:"36511","8d4185e0":"36673","653cd4ec":"36766",eb87086a:"36933","63849fd3":"36935","7c43c98e":"36983","6d92a4b5":"37021",ac6b62e9:"37055","6e357be7":"37058",c3a94ed1:"37208","4f4166ed":"37257","229edc10":"37316","6dfd1bfa":"37362","3d7b9a1b":"37426","7779798d":"37894","246772cc":"37918",febe4bf0:"37977","5bb043f7":"38056",b34a9ee0:"38104","80b5c97d":"38230","8e018081":"38333","6fc631de":"38349","11414e0b":"38368",a77f15f9:"38450","66cd2d70":"38469","0df0bc38":"38504",e80537c2:"38591","5eece5ec":"38679",cea40137:"38741","38cd2ebb":"38768","2f
6d8a46":"38792","0cb88ec0":"38819","179d37d3":"38873","1ec74ed7":"38928","7aabbdee":"39033","3ae213b8":"39177",f55bfda4:"39209",d179e89e:"39252",c0ba661c:"39275",a072c73d:"39325",b7e5badb:"39368","22f9ccca":"39605",e6e9a3aa:"39645",f2c01e3a:"39726","8277cea1":"39820","73c3a5ed":"39853",f8904416:"39941",fa8dc2e8:"39972","5b34f9ea":"39978",f49b74d5:"40097","2d7caf96":"40158","0260d23f":"40176","7ad00ade":"40342",dd8797f2:"40365",c2ef5f99:"40665",eaaaa138:"40830","51cdab7b":"40930",f5d7fbaf:"40936","0cbb6061":"40986",ba1c1ac8:"41100","14f11778":"41120","9444e723":"41329","7945275b":"41388",e6241e03:"41537","5b7c576e":"41840","56181a0b":"41863",c2ae09fd:"41929","85db7b61":"41954",e4176d9e:"41958","81192af7":"41998",a8987ce3:"42051","10c43c6e":"42054",bfe6bb1f:"42059","672c9486":"42169","4b66f540":"42187","3ebe5c8a":"42226","909a3395":"42263","48e254a2":"42289",ff4be603:"42371","4447d079":"42436",c14e35a5:"42465",a2ff0b9e:"42551",ff7c02a9:"42609",b6cfa9b7:"42620","4b481283":"42690","8e0282b7":"42721","910f748a":"42728","699b0913":"42757","608d6ba6":"42930",e2e305b4:"43037",ea09532f:"43047",dc98fcfb:"43072","26f63738":"43163",f6d93f4d:"43294","87186dce":"43529","39befbbe":"43554","0c94161c":"43635",d7039a99:"43645","4b718ce0":"43697","0682e49e":"43793","8b15c55c":"43849",bfada16a:"43919",b99800de:"43966","0e0b668d":"44023","4b0528ed":"44029","5d86b3d6":"44118",f193e9f7:"44152","5b1c4ba7":"44174",dfca4314:"44393","3d99ef33":"44523","29c565c8":"44592","0a54392a":"44765","51453fb2":"44780","7e328509":"44797","55a23a94":"44860","0f014490":"44907","46f76bef":"45057",dacae080:"45091","593ffe68":"45114","01f7e848":"45279","8b1145e2":"45287",a258685b:"45571","3476fe8e":"45583",d02f7bc4:"45593",f2abaee2:"45732","239111c7":"45786","83d061ac":"45809","9ee45729":"45878",ce3ddafe:"46017","5216f17c":"46023",e0eae934:"46045",fff3ab69:"46074","8d0344ba":"46218",a5b5d55c:"46284","2fc02015":"46328",e7478c24:"46447","46dcda29":"46838","33a34e3b":"46901",cbbdf9a2:"47062",ba73f26c:"47068",cc1f5ce8:"47082","8b6445a0":"47117","524b67e3":"47276",cf1567e8:"47287",e5c3dfde:"47463","3059ed75":"47568","9ee4ebe9":"47582",ee799351:"47655","3ab425d2":"47708","2f0ee63c":"47838",e327333b:"47851","497aa321":"47975","9a7b56f5":"47986","38bd3ddb":"48031","60da83fa":"48055","6f6b3e89":"48150",c81622cc:"48218",bf0d24cf:"48320","3762a996":"48349","3ddb8349":"48426",ba0b40b1:"48637","8fdcea61":"48670",abfd17f9:"48840",e96fdd6c:"48998","70f3cfb0":"49096",d1b82434:"49169",f99bfa77:"49241","8eed67ba":"49270","331a2ebd":"49716","98a6ff5a":"49874",ab2e7268:"50017","8ac39bbe":"50052","4d028f11":"50145",e51da90c:"50153","486e741e":"50193","7b4c719b":"50240",acb04c32:"50337","7ce5ebd9":"50362",e86c0d05:"50375","9f305eae":"50437","40a0c599":"50472",aba6a826:"50525","7bcf009a":"50773",d7e1d518:"50849","6f93a078":"50999","1ed71a7d":"51532",e91074f3:"51555","12e76d03":"51574","6f219482":"51593",ea82a261:"51605","3b5ffa57":"51625",af6e989f:"51706",e6fe050f:"51768","42a4a45b":"51799","2212e80c":"51830",b63fdeeb:"51840",bf3bde03:"51881","86a7da57":"51945","22a76d89":"52094","08ba51c1":"52126","92bceb62":"52251","79bae4c5":"52286",fcb00301:"52491","3b1e54e9":"52499","2006be57":"52573",a18114c4:"52586","28599d52":"52593",c04dcf0d:"52715","0e46f7bf":"52789",f888d9d8:"52870","1df93b7f":"53237",c55f973e:"53371","6cd64148":"53442","3ca132b1":"53675",f20f879f:"53823",b684abf7:"54125","6afbfa44":"54133","1632abda":"54178","39a2751e":"54200","4c8d1cae":"54210",b3c952b5:"54250",d6f7d5e2:"54265",fa5bdf0c:"54363","4bae0029":"54382","3034400c":"54397","612ebb8a":
"54487",cca83a59:"54513",e0668c88:"54591","7c417199":"54741","130a23fd":"54756",fd67079f:"54778",ed07f994:"54786",dd8be3b2:"54794","32f0f819":"54855",a463ff81:"55043","5560d84e":"55216","661e4fa4":"55239","08472b2d":"55273","746f419e":"55335","1710d498":"55478","8e23b856":"55552","7ec28fd9":"55693","7f536709":"55726","2e18dbc8":"55745",fe2f6d57:"55799","676a3180":"55821","407fa3a0":"55925",f2d325f1:"55962","640fe435":"56290",ac4fb807:"56424",e8d36425:"56513","7ec7e0b0":"56541","43c3babd":"56552","71f8452f":"56614",f1525ef1:"56750","151869e3":"56795",d4a6dda9:"56902","918ae6ff":"57121","7f039048":"57126","522a40f8":"57242","1b4282d0":"57258",fb218ddd:"57293","3ad7b662":"57341",f251ab77:"57489",e4b4615d:"57598","6e586ee3":"57599",d06effa9:"57699","34660ac5":"57749","163044ef":"57780",a3c98c45:"57820","2c5ceec1":"57995","84c320c1":"58009","4893a1cb":"58042",e345afee:"58096",a045168c:"58182","09e9a7df":"58197","6145eda0":"58234","551b313a":"58247","649a76e7":"58356",e5a71ed6:"58539",ea41aad0:"58564","4f9404e5":"58768",cc6053aa:"58818",cf14af90:"58822",bf2622dd:"58824","68709c70":"58914","8938295e":"59051",de11ece8:"59060","010f8398":"59181",af049e12:"59191","07a6f1c2":"59241","8962034b":"59248","902aff6f":"59336",d959d974:"59342",f4be443e:"59394",b43aa387:"59427","0cd38f48":"59442","081ed9af":"59496",c2ed794e:"59506","897798e8":"59533",d243562e:"59592",e56a1a2c:"59771","619d2e79":"59900",f929d4df:"59982","918c9b38":"59992","78f8003c":"60000","7b2e834b":"60185","34d5cc00":"60331","05a720dd":"60434",bb341369:"60518",b8677fbf:"60603",c1b7bc44:"60608","1ebc7fe2":"60682","16bb304a":"60868",a79d55be:"61007","4c13f84f":"61200",f8bc4080:"61210",f497508e:"61249",e50573ba:"61271",a529f863:"61361",e31a63b7:"61543",a882bd74:"61643","5e52bbeb":"61793","83a26c48":"62117","912fcb5a":"62235","5c77ea5f":"62307",bdd03912:"62318","8e993d66":"62319","686c1ad3":"62474","6312a106":"62493","54d1c079":"62523",bc1c33e4:"62547","8fca97e0":"62655","34d502be":"62867",c38f23a9:"62948","6dffe7c4":"62983",f29affbe:"63105","877a3c1c":"63387","7bc70741":"63604",cb582f54:"63663","92264b81":"63777","010118f9":"63801","86c8f7cd":"63808",cacfa11d:"64000",d6990b47:"64032","555f2cec":"64072",e8591f69:"64100","07b92fc6":"64122",aea361f0:"64164",e715560c:"64485","6694e7e9":"64525",fd0f74aa:"64533",eee9e2f1:"64596","300bd484":"64656",e74888b9:"64754","5a7e5a43":"64766",dfd588b8:"64824","42d0afac":"64871","610e19f0":"64900",b81f3fb0:"64967","637ec626":"65132","8d3be60d":"65501",ef0f3981:"65592","88dde0bb":"65594","90dccef4":"65641","540f26e7":"65883","21d2296a":"65965","06b7cd3c":"66026","9d336f66":"66081",cf494ba6:"66187","1fb9ab5c":"66238","5f0246ae":"66256",a5560bad:"66303","87cc8f7c":"66336","1d642165":"66462","9c53d859":"66465",e0052e0c:"66597","2dba4e8b":"66744","12b52520":"66958","49763a0b":"67010","01ba3f79":"67021",d7124adb:"67061",c7b80b67:"67069","15f4efbb":"67132","2d8700b9":"67343","85ac525a":"67448","18caf9ef":"67583","1693c0b8":"67597",a48eac25:"67638",a23744f9:"67670",d29db0e3:"67908","8d96489a":"67954","8858d0ce":"68034",db9653b1:"68126",f301134a:"68258","8751004c":"68689","5335ef4f":"68757","9b89ba00":"68793",c3ca7a6a:"68823",ae5838f0:"68943","21cf1efb":"69057","1a54bfd0":"69111",a5f4c814:"69125","046783c0":"69169","193f200e":"69209","212137e2":"69234","140e6a69":"69254","9bfbb8bc":"69628",d5d7628b:"69629","607292ce":"69647","273a5860":"69824","7facae8f":"69843","6d933e1d":"69899","41e02281":"70081","16b47049":"70130","06b3b671":"70178","7bd49e6c":"70249",d72ada40:"70277","1fc4ed50":"70367","3fb51827":"70497",f53e2381:
"70504",ce79b72a:"70543","65306ecf":"70614","21d3c1c7":"70706","06876062":"70964","99c371aa":"71081",dff7b4e8:"71160","5ed92a05":"71169","167f5be9":"71287",c26ab7d5:"71476",dcf1d6e7:"71516",d93a0aad:"71544",ff9d88b6:"71698",fcfa677e:"71789","39afc900":"71811","77ccc938":"71918","59dfcfb5":"72070","7f31124b":"72189","66179fb5":"72331",cf945ce5:"72500","084a18af":"72613","1adeac4a":"72638","150a4d14":"72740","2e2a73ec":"72887",d703ca6f:"72952",d9b3adf3:"72957","2e6d047c":"72978","1eec97be":"73238","4efa0483":"73300",aa02927d:"73338",b26f6fa9:"73356","6594bd70":"73369","63b8176f":"73452",c9f98325:"73533",c7953305:"73537","05a8a78d":"73576","12dcfbad":"73618","8f9c5733":"73725","154cbeb4":"73745",af29c71b:"73766",bce7d46d:"73858","566ea6d5":"73975",ff35d8ff:"74019",a0541488:"74061","67e63bc0":"74091",ba454016:"74132",ef4e0f5d:"74136","769f97b7":"74332",cb8731ee:"74337","4c8fc79c":"74441",e6a17fa0:"74465","4bbc58d4":"74480","80ea5ae7":"74578","016f0e16":"74794","8b87f6f5":"74839","5b34d9eb":"75070","29ff1658":"75118","8d81369e":"75123","61c61e17":"75203",ca1d44bc:"75273",c176dc4f:"75546","43a232e9":"75567",ff078e30:"75671","1ac49947":"75702","64e30bbc":"75897",b92bff04:"75911",d6f3938e:"76029","172c9869":"76180","99ae7254":"76222","02de7b5a":"76240","2363ed29":"76355","4f8fd4be":"76360",c0f7075f:"76369","359e34b0":"76527","982c02f7":"76668",d12dbf4d:"76793","198182f0":"76878","8d493a07":"76895",cb870251:"76924","9eb4c1b4":"77053","60e8b504":"77170","4deae4de":"77413","78dc40c2":"77427","56c932ee":"77527","274eaedf":"77655","60043c0d":"77680",be698a2c:"77887","971cbe2f":"77923","1ae50e88":"78035","4ab9b114":"78038",c80936bd:"78063",ed51eb7d:"78072","516dec85":"78114","41742cda":"78233",df22f3af:"78248","118229e6":"78452","2b5e4b34":"78863","0fcbeed9":"78924","69f3d9b5":"78991",a17fb62b:"79073","95f18dd4":"79164","3888e873":"79239",f83967c4:"79290","10c28d6f":"79298","46fa8ad3":"79315","73a7bd5f":"79356",df5a3016:"79478","27e1a14b":"79479",fbff3b11:"79543",c733e485:"79691","6b2b8280":"79829","4bedd8c5":"79895",b00a2879:"79958","092519d2":"79963","4925ce85":"79977","935f2afb":"80053","18dd253f":"80132",fc8aebe3:"80192",eac8f2ef:"80268","2d35b91c":"80380","83cd8f20":"80709",f97cc188:"80895",abf6a1f1:"80940","85c9bffb":"81012","0c1ee94a":"81135","29b6c240":"81400","2dd65ece":"81477",a39041db:"81667","82a4f002":"81708","6a0b4355":"81835","35fa8025":"81934","919b108c":"82132","51da09c7":"82250",dc130668:"82342","3f6554cb":"82348",d885d629:"82360",fadcaea6:"82423","7bcf7096":"82513",aa395a59:"82614",ba4efbe0:"82621",ba8527a9:"82732","3ebee193":"82864",bf02c3ce:"82982","39b565ff":"82989","39c8ecdc":"83069","1a42aba3":"83074",fba94ee1:"83139",a26f7afa:"83147",fb0364ff:"83513","779753bc":"83692","39c159d8":"83698","66716ec1":"83893","10f908b7":"83957","363318d5":"84063","737371dd":"84097","6e366b57":"84242",a3286ddf:"84281","8da7304d":"84362","2da29b2a":"84429","3b12bc8a":"84477","6eeb04e2":"84500","6cb122e3":"84513","4cec253a":"84710","42325f5c":"84745","211f58b1":"84754","7e5ee96c":"84841","5c27dd68":"84847","34b19815":"84854",e9d5739e:"84888","21bf64ca":"84941","6e5d074b":"85011","7bc3feb7":"85054","60d04b47":"85098",f8482b2c:"85217","3743f01c":"85419","7e446cc1":"85455","6d480200":"85493","82033eb7":"85780",c9b79676:"86009",ed809cac:"86018",b47406fa:"86129","1db21d86":"86150",a540f8cd:"86317","08e3aaa9":"86333","4ad39569":"86356","45aa7127":"86476",ce66b6fd:"86518","7dd3be25":"86826","4e1da517":"86950",d1b2a42e:"86991",a6a8af40:"87064","27bd5328":"87223","6fc8d865":"87224","1fdab62e":"87240","9980f90c"
:"87304","96c0bb00":"87313",a48778d9:"87316","96ec050b":"87443","7668acae":"87460","8faa0fb1":"87482",a291f403:"87513",f807eec9:"87634",c07f2717:"87667","3958a146":"87799",e111f111:"87836",b1998bb1:"87866","8d28c4be":"88068",e5842021:"88179","1e391540":"88187","5304a4a8":"88204",c0074ddd:"88252","21ad5224":"88295",e1b9986a:"88338",ac930f6e:"88380","3c725018":"88446","6827856d":"88598","4499569c":"88621","6f59957c":"88625","41db9914":"88821","1c56d006":"88831","32ea4ecb":"88879","4455e85b":"89002","234a1403":"89122",b4028749:"89210","17114a18":"89215","13b69fa8":"89574","164cd634":"89675","46c600d5":"89780","443045da":"89806","1f79049f":"89852","93d3457d":"89984",fee1f25c:"89986",d25ffd5f:"89987",d043cc46:"90046","390ef088":"90075",d36db526:"90085","41b3e733":"90185","727a1f3c":"90205",f2497893:"90333","2c91f584":"90342","7d607fc0":"90377",f60e43ec:"90392","0b78393d":"90398",dd313590:"90431","0a13c98e":"90451",b2335bc1:"90464","73dfc993":"90536","01fb8b11":"90560","22f40a40":"90601","147b0f6a":"90610","8cd0f4f5":"90615","6601f604":"90645","459a783a":"90666","87e7806e":"90865","4302562a":"90896","6eb0ce42":"90976","9c4bbfc4":"91178",ff0539a2:"91213","28b27838":"91231","02ee0502":"91287","872e63de":"91304",b82d5884:"91316","4cd7d8af":"91406","1671b3fa":"91425",d692bb25:"91523",c839a5b0:"91628","2f535455":"91753","304ed800":"91782","0f7553c3":"91810",b2735041:"91849","000c061a":"92085","8c828746":"92244","9c42de85":"92269",db5c8692:"92393","799b872c":"92404",d6360c39:"92456","4d4093bb":"92463","6eebf72d":"92744","7861f6df":"92775","8c31caf6":"92778",ae5bb339:"92786","85c3ba36":"92843","8bfba65b":"92851",e5a16b2e:"92900","14e00221":"92964",b984322c:"93023","61e5c5b8":"93071",e7cbe8da:"93151",f7101d4f:"93176","740eb29c":"93195",b83df1bc:"93308","5d075efb":"93340",f7735fb0:"93346",dd435828:"93377","03e8549c":"93400",dede40b0:"93590","4e6907d6":"93749","917734f8":"93832",cb341380:"93837",c9aea766:"94114","7c8407dd":"94123","91dc98f0":"94136","37aba5d3":"94197","43b891d1":"94223","63f66cb7":"94328","9fdf7324":"94337","6c10648f":"94401","878356ab":"94452","487f7f30":"94605",d3e690ce:"94694","376d31f7":"94696",a233fb97:"94932",b8e39b95:"95020",d666ab7e:"95107","3db8c88b":"95171",bc08bf79:"95281","9936b6c8":"95296",cf282674:"95317","1e173bbe":"95327","5b23c695":"95329","41fbfe2f":"95364","7877b0eb":"95418",e9ef6b31:"95441","0e0f5dd2":"95561","8462ad7a":"95696",edf19300:"95745",e490fd18:"95801","9dd89af2":"95816","7e254f9d":"95911","90b0cf6d":"95945","8fa500ae":"96055",d6011437:"96078",a322018d:"96082","3061ad92":"96135",f0129862:"96188","8e2c0739":"96199",ebf2bdda:"96361","64bd79cb":"96426","38e65fdd":"96535","49ea6ca5":"96544","385bc71d":"96547",e23cd647:"96617",a612420b:"96684","2035956b":"96768",b35418cf:"96772","99ba663e":"96831","09e11ac0":"96945","57973c2b":"96971","7f6f8f16":"96979","6816f4c0":"97065",f3034cf4:"97129","9d4bcb9a":"97334",d91e7ab4:"97469","02fbc840":"97523","902fdb3b":"97547","7ea214d5":"97553",c70cb355:"97557",ed97cef0:"97617","6f25dd34":"97648",b094b997:"97782","7513b789":"97816","16cff1eb":"97826",dd6685df:"97850","1a4e3797":"97920","746bf890":"97955","049dc708":"98177","0e7f2915":"98200","1820eb3b":"98218",b7f629d0:"98272",ced65f67:"98623",d1475ab1:"98740","1a6f209f":"98791","6a913ab1":"98868","3ff950a4":"98939","008b0ccc":"99120","8aecb2ef":"99184",ca443c18:"99266","7cc0ca0e":"99299","00125b11":"99367",c2f4aca4:"99389","64758f43":"99427",f2d5637b:"99494","49ea4a42":"99607","32db5af4":"99669","15d4dc80":"99839","5e3def70":"99871",b63b5bb9:"99997"}[e]||e,r.p+r.u(e)},(()=>{var 
e={51303:0,40532:0};r.f.j=(b,f)=>{var c=r.o(e,b)?e[b]:void 0;if(0!==c)if(c)f.push(c[2]);else if(/^(40532|51303)$/.test(b))e[b]=0;else{var d=new Promise(((f,d)=>c=e[b]=[f,d]));f.push(c[2]=d);var a=r.p+r.u(b),t=new Error;r.l(a,(f=>{if(r.o(e,b)&&(0!==(c=e[b])&&(e[b]=void 0),c)){var d=f&&("load"===f.type?"missing":f.type),a=f&&f.target&&f.target.src;t.message="Loading chunk "+b+" failed.\n("+d+": "+a+")",t.name="ChunkLoadError",t.type=d,t.request=a,c[1](t)}}),"chunk-"+b,b)}},r.O.j=b=>0===e[b];var b=(b,f)=>{var c,d,[a,t,o]=f,n=0;if(a.some((b=>0!==e[b]))){for(c in t)r.o(t,c)&&(r.m[c]=t[c]);if(o)var i=o(r)}for(b&&b(f);n Contributing a Task | Cumulus Documentation - +
Version: v16.0.0

Contributing a Task

We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

Right now we're focused on tasks distributed via npm, but we're open to including others. For now, the script that pulls the data for each package supports only npm.

The tasks.md file is generated during the build process.

The tasks list in docs/tasks.md is generated from the task package names in the tasks folder.

Do not edit the docs/tasks.md file directly.
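
To make the generation step concrete, the sketch below shows one way such a script could collect task package names and descriptions from a tasks folder and emit a markdown list. This is a minimal illustration only; the paths, output format, and the assumption that each task directory contains a package.json are ours, and the actual generator used by the Cumulus build may work differently.

```js
// Minimal sketch of a tasks.md generator (assumed layout: one npm package per
// directory under ./tasks, each with a package.json). Not the actual Cumulus script.
const fs = require('fs');
const path = require('path');

const tasksDir = path.join(__dirname, 'tasks');            // assumed location of task packages
const outFile = path.join(__dirname, 'docs', 'tasks.md');  // generated file; do not edit by hand

const entries = fs
  .readdirSync(tasksDir, { withFileTypes: true })
  .filter((entry) => entry.isDirectory())
  .map((entry) => {
    const pkgPath = path.join(tasksDir, entry.name, 'package.json');
    const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
    // Link each task to its npm page and include its short description.
    return `- [${pkg.name}](https://www.npmjs.com/package/${pkg.name}): ${pkg.description || ''}`;
  });

fs.writeFileSync(outFile, ['# Cumulus Tasks', '', ...entries, ''].join('\n'));
console.log(`Wrote ${entries.length} task entries to ${outFile}`);
```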

- + \ No newline at end of file diff --git a/docs/api/index.html b/docs/api/index.html index a8345137eb7..d38a26b655a 100644 --- a/docs/api/index.html +++ b/docs/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/architecture/index.html b/docs/architecture/index.html index ee1d4488ce7..512be817d30 100644 --- a/docs/architecture/index.html +++ b/docs/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
Version: v16.0.0

Architecture

Below, find a diagram with the components that comprise an instance of Cumulus.

Architecture diagram of a Cumulus deployment

This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it breaks down into several major components:

Data Distribution

End Users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application. This provides authenticated data egress, temporary S3 links, and other statistics features.

End user exposure of Cumulus's holdings is expected to be provided by an external service.

For NASA use, this is assumed to be CMR in this diagram.

Data ingest

Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics to a shared SNS topic, which can be subscribed to for near-real-time granule, execution, and PDR status. This could be used for metrics reporting via an external ELK stack, for example.
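
As a rough illustration of how a rule ties a trigger to a workflow, here is a hypothetical rule definition expressed as a JavaScript object. The field names follow common Cumulus rule configurations but are assumptions for this sketch, not a verbatim schema; see the data management types documentation for the authoritative shape.

```js
// Hypothetical rule definition: an SQS-triggered rule that starts an ingest workflow.
// Field names and values are illustrative assumptions, not a copy of the real schema.
const exampleRule = {
  name: 'example_sqs_rule',
  workflow: 'IngestGranuleWorkflow',       // Step Function workflow started by this rule
  provider: 'example_provider',
  collection: { name: 'EXAMPLE_COLLECTION', version: '001' },
  rule: {
    type: 'sqs',                           // other trigger types include 'kinesis' and 'sns'
    value: 'https://sqs.us-east-1.amazonaws.com/111111111111/example-ingest-queue'
  },
  state: 'ENABLED'
};

module.exports = exampleRule;
```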

Data persistence

Cumulus entity state data is stored in a PostgreSQL-compatible database and is replicated to an Elasticsearch instance, which provides non-authoritative querying for the API and other applications that require more complex queries.

Data discovery

Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS, and more.
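
For example, a provider definition tells Cumulus where and how to discover data. The object below is a hypothetical S3 provider; the field names mirror typical Cumulus provider configuration but are assumptions for illustration, not the authoritative schema.

```js
// Hypothetical provider definition for discovery from an S3 bucket.
// Field names are illustrative assumptions; consult the data management
// types documentation for the real provider schema.
const exampleProvider = {
  id: 'example_s3_provider',
  protocol: 's3',                       // other protocols such as 'ftp' or 'https' can be used
  host: 'example-ingest-source-bucket'  // for S3 providers this is the bucket name
};

module.exports = exampleProvider;
```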

Database

Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency, Cumulus replicates data to an Elasticsearch instance.

PostgreSQL Database Schema Diagram

ERD of the Cumulus Database

Maintenance

System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

Deployment Structure

Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

Cumulus

The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

Data persistence

The data persistence module provides the Data Persistence portion of the diagram.

Other modules

Other modules are provided as artifacts on the release page for users configuring their own deployments; they contain extracted subcomponents of the cumulus module. For more on these components, see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.
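
As a rough sketch of this layout (module sources, version, and inputs are placeholders; the deployment docs and cumulus-template-deploy remain the authoritative reference), a deployment typically instantiates the two top-level modules like so:

# Sketch only: replace <version> and fill in real inputs per the deployment docs.
module "data_persistence" {
  source = "https://github.com/nasa/cumulus/releases/download/<version>/terraform-aws-cumulus.zip//tf-modules/data-persistence"
  # ... database, Elasticsearch, and subnet configuration
}

module "cumulus" {
  source = "https://github.com/nasa/cumulus/releases/download/<version>/terraform-aws-cumulus.zip//tf-modules/cumulus"
  # ... buckets, workflow, and ingest configuration,
  # wired to outputs from module.data_persistence
}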

- + \ No newline at end of file diff --git a/docs/category/about-cumulus/index.html b/docs/category/about-cumulus/index.html index 36c3b8d6878..4e97c65939f 100644 --- a/docs/category/about-cumulus/index.html +++ b/docs/category/about-cumulus/index.html @@ -5,13 +5,13 @@ About Cumulus | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/common-use-cases/index.html b/docs/category/common-use-cases/index.html index e0d410ee931..467bc1c7085 100644 --- a/docs/category/common-use-cases/index.html +++ b/docs/category/common-use-cases/index.html @@ -5,13 +5,13 @@ Common Use Cases | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/configuration-1/index.html b/docs/category/configuration-1/index.html index 34628345376..bb221d465c1 100644 --- a/docs/category/configuration-1/index.html +++ b/docs/category/configuration-1/index.html @@ -5,13 +5,13 @@ Configuration | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/configuration/index.html b/docs/category/configuration/index.html index a7e8b57f870..3822a330038 100644 --- a/docs/category/configuration/index.html +++ b/docs/category/configuration/index.html @@ -5,13 +5,13 @@ Configuration | Cumulus Documentation - +
Version: v16.0.0

Configuration

- + \ No newline at end of file diff --git a/docs/category/cookbooks/index.html b/docs/category/cookbooks/index.html index 65d960bd38e..192c71c13b7 100644 --- a/docs/category/cookbooks/index.html +++ b/docs/category/cookbooks/index.html @@ -5,13 +5,13 @@ Cookbooks | Cumulus Documentation - +
Version: v16.0.0

Cookbooks

- + \ No newline at end of file diff --git a/docs/category/cumulus-development/index.html b/docs/category/cumulus-development/index.html index e04c24f17ba..a8b64189d1f 100644 --- a/docs/category/cumulus-development/index.html +++ b/docs/category/cumulus-development/index.html @@ -5,13 +5,13 @@ Cumulus Development | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/deployment/index.html b/docs/category/deployment/index.html index f200e71c6df..3b2b4d04556 100644 --- a/docs/category/deployment/index.html +++ b/docs/category/deployment/index.html @@ -5,13 +5,13 @@ Cumulus Deployment | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/development/index.html b/docs/category/development/index.html index 03f5f415607..d0533ed70db 100644 --- a/docs/category/development/index.html +++ b/docs/category/development/index.html @@ -5,13 +5,13 @@ Development | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/external-contributions/index.html b/docs/category/external-contributions/index.html index 45f9efb2cb3..70a6312e68c 100644 --- a/docs/category/external-contributions/index.html +++ b/docs/category/external-contributions/index.html @@ -5,13 +5,13 @@ External Contributions | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/features/index.html b/docs/category/features/index.html index 60b11939180..116c17ad671 100644 --- a/docs/category/features/index.html +++ b/docs/category/features/index.html @@ -5,13 +5,13 @@ Features | Cumulus Documentation - +
Version: v16.0.0

Features

📄️ How to replay Kinesis messages after an outage

After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

- + \ No newline at end of file diff --git a/docs/category/getting-started/index.html b/docs/category/getting-started/index.html index bf63d17f9bd..60fbdcef547 100644 --- a/docs/category/getting-started/index.html +++ b/docs/category/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/integrator-guide/index.html b/docs/category/integrator-guide/index.html index fab33df1764..f79eb9551b9 100644 --- a/docs/category/integrator-guide/index.html +++ b/docs/category/integrator-guide/index.html @@ -5,13 +5,13 @@ Integrator Guide | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/logs/index.html b/docs/category/logs/index.html index 5676b0ed907..ac039e4da27 100644 --- a/docs/category/logs/index.html +++ b/docs/category/logs/index.html @@ -5,13 +5,13 @@ Logs | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/operations/index.html b/docs/category/operations/index.html index 8b45409e093..db16e25bf0b 100644 --- a/docs/category/operations/index.html +++ b/docs/category/operations/index.html @@ -5,13 +5,13 @@ Operations | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/troubleshooting/index.html b/docs/category/troubleshooting/index.html index 17a902c9e75..990a8caba97 100644 --- a/docs/category/troubleshooting/index.html +++ b/docs/category/troubleshooting/index.html @@ -5,13 +5,13 @@ Troubleshooting | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/upgrade-notes/index.html b/docs/category/upgrade-notes/index.html index 220127b31db..4230adeed3c 100644 --- a/docs/category/upgrade-notes/index.html +++ b/docs/category/upgrade-notes/index.html @@ -5,13 +5,13 @@ Upgrade Notes | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/workflow-tasks/index.html b/docs/category/workflow-tasks/index.html index 15ef6c48e88..0ed0a82d356 100644 --- a/docs/category/workflow-tasks/index.html +++ b/docs/category/workflow-tasks/index.html @@ -5,13 +5,13 @@ Workflow Tasks | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/category/workflows/index.html b/docs/category/workflows/index.html index 798134c4638..10a1d684385 100644 --- a/docs/category/workflows/index.html +++ b/docs/category/workflows/index.html @@ -5,13 +5,13 @@ Workflows | Cumulus Documentation - +
- + \ No newline at end of file diff --git a/docs/configuration/cloudwatch-retention/index.html b/docs/configuration/cloudwatch-retention/index.html index a387ff8d858..06411ae15b2 100644 --- a/docs/configuration/cloudwatch-retention/index.html +++ b/docs/configuration/cloudwatch-retention/index.html @@ -5,7 +5,7 @@ Cloudwatch Retention | Cumulus Documentation - + @@ -14,7 +14,7 @@ the retention period (in days) of cloudwatch log groups for lambdas and tasks which the cumulus, cumulus_distribution, and cumulus_ecs_service modules supports (using the cumulus module as an example):

module "cumulus" {
  # ... other variables
  default_log_retention_days = var.default_log_retention_days
  cloudwatch_log_retention_periods = var.cloudwatch_log_retention_periods
}

Set the variables below in terraform.tfvars and deploy; the CloudWatch log groups will then be created or updated with the new retention value.

default_log_retention_days

The default_log_retention_days variable sets the default log retention for all CloudWatch log groups managed by Cumulus when a group-specific value isn't provided. If this variable is not set either, retention defaults to 30 days. For example, to give the Cumulus module's log groups a retention period of one year, deploy the modules with the variable set as in the example below.

Example

default_log_retention_days = 365

cloudwatch_log_retention_periods

The retention period (in days) of CloudWatch log groups for specific lambdas and tasks can be set during deployment using the cloudwatch_log_retention_periods Terraform map variable. To configure these values, uncomment the cloudwatch_log_retention_periods variable and add an entry for each log group whose retention you want to change. The following keys are supported, each correlating to its lambda/task name (e.g. "/aws/lambda/prefix-DiscoverPdrs" corresponds to the retention key "DiscoverPdrs"):

  • ApiEndpoints
  • AsyncOperationEcsLogs
  • DiscoverPdrs
  • DistributionApiEndpoints
  • EcsLogs
  • granuleFilesCacheUpdater
  • HyraxMetadataUpdates
  • ParsePdr
  • PostToCmr
  • PrivateApiLambda
  • publishExecutions
  • publishGranules
  • publishPdrs
  • QueuePdrs
  • QueueWorkflow
  • replaySqsMessages
  • SyncGranule
  • UpdateCmrAccessConstraints
note

EcsLogs is used for the CloudWatch log groups of all cumulus_ecs_service tasks

Example

cloudwatch_log_retention_periods = {
  ParsePdr = 365
}

The retention period value is the number of days you'd like to retain logs in the specified log group. A list of allowed values is available in the AWS CloudWatch Logs documentation.
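
Putting the two variables together, a terraform.tfvars entry might look like the following sketch (the values and the SyncGranule entry are illustrative only):

# Illustrative values only
default_log_retention_days = 365

cloudwatch_log_retention_periods = {
  ParsePdr    = 365
  SyncGranule = 90
}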

- + \ No newline at end of file diff --git a/docs/configuration/collection-storage-best-practices/index.html b/docs/configuration/collection-storage-best-practices/index.html index fc568f5f2da..d6bfe9f2f82 100644 --- a/docs/configuration/collection-storage-best-practices/index.html +++ b/docs/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
Version: v16.0.0

Collection Cost Tracking and Storage Best Practices

Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

Calculating Storage By Collection

By bucket

Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

Alternatively you can query CloudWatch using the CLI.

This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

The result looks like:

{
  "Datapoints": [
    {
      "Timestamp": "2019-07-31T00:00:00Z",
      "Average": 150996467959.0,
      "Unit": "Bytes"
    }
  ],
  "Label": "BucketSizeBytes"
}

By key prefix

AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

Note that this can be a long-running operation for large buckets.
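
If you need totals for several prefixes, you can wrap the same command in a small loop; a rough sketch (the bucket and prefix names are placeholders, and this can take a long time on large buckets):

#!/bin/bash
bucket="sample-bucket"
# Placeholder prefixes; one summary is printed per prefix.
for prefix in "MOD09GQ___006" "MYD13Q1___006"; do
  echo "${prefix}:"
  aws s3 ls --summarize --human-readable --recursive "s3://${bucket}/${prefix}" | grep 'Total'
done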

Calculating Cost By Collection

NASA NGAP Environment

If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

Storage Configuration

Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.

Organizing By Bucket

You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

Additional collections would go to different buckets.

Organizing by Key Prefix

Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly in the file configuration. The example below produces the same result.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "protected-2",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
  ]
}
- + \ No newline at end of file diff --git a/docs/configuration/data-management-types/index.html b/docs/configuration/data-management-types/index.html index 57235e21ec7..a5c207c7e66 100644 --- a/docs/configuration/data-management-types/index.html +++ b/docs/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
Version: v16.0.0

Cumulus Data Management Types

What Are The Cumulus Data Management Types

  • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
  • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
  • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
  • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
  • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
  • Executions: Executions are records of a workflow.
  • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

Interaction

  • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
  • Collections tell Cumulus where to store the data files
  • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

Managing Data Management Types

The following are created via the dashboard or API:

  • Providers
  • Collections
  • Rules
  • Reconciliation reports

Granules are created by workflow executions and then can be managed via the dashboard or API.

An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

Workflows are created and managed via the Cumulus deployment.

Configuration Fields

Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

Providers

Please note:

  • While connection configuration is defined here, settings that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
  • Some provider behavior is controlled by task-specific configuration rather than the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

Provider Configuration

The Provider configuration is defined by a JSON object whose configuration keys depend on the provider type. The following are definitions of the typical configuration values relevant for each provider type; a sample S3 provider is sketched after the tables.

Configuration by provider type
S3

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
host | string | Yes | S3 Bucket to pull data from

http

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be http for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 80
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate

https

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be https for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 443
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate

ftp

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
port | integer | No | Port to connect to the provider on. Defaults to 21

sftp

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
host | string | Yes | The sftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the sftp server.
password | string | No | Password to use to connect to the sftp server.
port | integer | No | Port to connect to the provider on. Defaults to 22
privateKey | string | No | filename assumed to be in s3://bucketInternal/stackName/crypto
cmKeyId | string | No | AWS KMS Customer Master Key arn or alias
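
For illustration only, a minimal S3 provider record might look like the following (the id and host values are placeholders):

{
  "id": "EXAMPLE_S3_PROVIDER",
  "protocol": "s3",
  "host": "example-staging-bucket",
  "globalConnectionLimit": 10
}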

Collections

Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)

Key | Value | Required | Description
name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
version | "006" | Yes | A version tag for the collection
granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

files-object

Key | Value | Required | Description
regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
bucket | "internal" | Yes | Name of the bucket where the file will be stored
url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file.

Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, run on a schedule, or be triggered by Kinesis events, SNS messages, or SQS messages.

Rule configuration
Key | Value | Required | Description
name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
payload | <JSON Object or Array> | No | The payload to be passed to the workflow
meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

collection-object

Key | Value | Required | Description
name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

meta-object

Key | Value | Required | Description
retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3.
visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

rule-object

Key | Value | Required | Description
type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
value | <String> Object | Depends | Discussion of valid values is below

rule-value

The rule.value entry depends on the rule type (a sample sqs rule is sketched after this list):

  • If this is a onetime rule this can be left blank. Example
  • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
  • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
  • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
  • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
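
For illustration only, an sqs rule combining the fields above might look like the following (the workflow, provider, collection, and queue URL are placeholders):

{
  "name": "example_sqs_rule",
  "workflow": "IngestGranule",
  "provider": "EXAMPLE_PROVIDER",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "sqs",
    "value": "https://sqs.us-east-1.amazonaws.com/1234567890/example-ingest-queue"
  },
  "meta": {
    "retries": 1,
    "visibilityTimeout": 1800
  },
  "state": "ENABLED"
}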

sqs-type rule features

  • When an SQS rule is triggered, the SQS message remains on the queue.
  • The SQS message is not processed multiple times in parallel as long as the visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow, with padding; longer is better to avoid parallel processing.
  • The SQS message visibility timeout can be overridden by the rule.
  • Upon successful workflow execution, the SQS message is removed from the queue.
  • Upon failed execution(s), the workflow is retried 3 times by default, or the configured number of times.
  • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
  • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.

Configuration Via Cumulus Dashboard

Create A Provider

  • In the Cumulus dashboard, go to the Provider page.

Screenshot of Create Provider form

  • Click on Add Provider.
  • Fill in the form and then submit it.

Screenshot of Create Provider form

Create A Collection

  • Go to the Collections page.

Screenshot of the Collections page

  • Click on Add Collection.
  • Copy and paste or fill in the collection JSON object form.

Screenshot of Add Collection form

  • Once you submit the form, you should be able to verify that your new collection is in the list.

Create A Rule

  1. Go To Rules Page
  • Go to the Cumulus dashboard, click on Rules in the navigation.
  • Click Add Rule.

Screenshot of Rules page

  1. Complete Form
  • Fill out the template form.

Screenshot of a Rules template for adding a new rule

For more details regarding the field definitions and required information go to Data Cookbooks.

Note: If the state field is left blank, it defaults to false.

Rule Examples

  • A rule form with completed required fields:

Screenshot of a completed rule form

  • A successfully added Rule:

Screenshot of created rule

- + \ No newline at end of file diff --git a/docs/configuration/lifecycle-policies/index.html b/docs/configuration/lifecycle-policies/index.html index 767450df0db..98e90f9e793 100644 --- a/docs/configuration/lifecycle-policies/index.html +++ b/docs/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
Version: v16.0.0

Setting S3 Lifecycle Policies

This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

Requirements

  • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
  • You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without large amounts of data in it for experimenting/learning.
  • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

Examples

Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

Command Line

Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

Set policy

On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

Verify policy has been set

To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

$ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

Management Console

Create Policy

To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

You should see a screen similar to:

Screenshot of AWS console for an S3 bucket

Click the "Management" tab, then the "Lifecycle" button, and press "+ Add lifecycle rule":

Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

You should now see you have a rule configured for your bucket:

Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

Additional Information

This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

Strategy Overview

For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

AWS Documentation

The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

- + \ No newline at end of file diff --git a/docs/configuration/monitoring-readme/index.html b/docs/configuration/monitoring-readme/index.html index bb75513f7fb..4ec867e0134 100644 --- a/docs/configuration/monitoring-readme/index.html +++ b/docs/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
Version: v16.0.0

Monitoring Best Practices

This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

Cumulus Dashboard

The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on GitHub and includes instructions on how to deploy it and link it into your core Cumulus deployment.

The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

Cumulus-provided AWS resources

Cumulus sets up CloudWatch log groups for all Core-provided tasks.

Monitoring Lambda Functions

Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

Monitoring ECS services

Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

Monitoring workflows

For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

AWS recommendations

AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

Example: Setting up email notifications for CloudWatch logs

Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda function subscribed to the log group, that will receive, filter, and forward these messages to the SNS topic.

As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

// Forward error/fatal @cumulus/logger events from a CloudWatch Logs subscription to SNS
const zlib = require('zlib');
const aws = require('aws-sdk');
const { promisify } = require('util');

const gunzip = promisify(zlib.gunzip);
const sns = new aws.SNS();

exports.handler = async (event) => {
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const decompressedData = await gunzip(payload);
  const logData = JSON.parse(decompressedData.toString('ascii'));
  return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
      return sns.publish({
        TopicArn: process.env.EmailReportingTopicArn,
        Message: logEvent.message
      }).promise();
    }
    return Promise.resolve();
  }));
};

After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function.

You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
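
If you manage this with Terraform, a subscription filter plus the permission for CloudWatch Logs to invoke the lambda could be sketched roughly as below. The resource names, log group, and filter pattern are placeholders for your own deployment, not a prescribed configuration:

# Sketch only: allow CloudWatch Logs to invoke the forwarding lambda for one log group
resource "aws_lambda_permission" "allow_cloudwatch_logs" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.log_email_forwarder.function_name
  principal     = "logs.amazonaws.com"
  source_arn    = "${aws_cloudwatch_log_group.example_task.arn}:*"
}

resource "aws_cloudwatch_log_subscription_filter" "example_task_errors" {
  name            = "prefix-example-task-error-filter"
  log_group_name  = aws_cloudwatch_log_group.example_task.name
  # Forward only JSON log events whose level is error or fatal
  filter_pattern  = "{ ($.level = \"error\") || ($.level = \"fatal\") }"
  destination_arn = aws_lambda_function.log_email_forwarder.arn
  depends_on      = [aws_lambda_permission.allow_cloudwatch_logs]
}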

- + \ No newline at end of file diff --git a/docs/configuration/server_access_logging/index.html b/docs/configuration/server_access_logging/index.html index 5c24fb692b0..615d60cdf49 100644 --- a/docs/configuration/server_access_logging/index.html +++ b/docs/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
Version: v16.0.0

S3 Server Access Logging

Via AWS Console

Enable server access logging for an S3 bucket

Via AWS Command Line Interface

  1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

    {
      "LoggingEnabled": {
        "TargetBucket": "<stack-internal-bucket>",
        "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
    }
  2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

    aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
  3. Verify the logging policy exists on your buckets.

    aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
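
If you have several protected and public buckets, steps 2 and 3 can be applied in a loop; a rough sketch (bucket names are placeholders):

# Placeholder bucket names; applies and then verifies the same logging policy on each bucket
for bucket in my-stack-protected my-stack-public; do
  aws s3api put-bucket-logging --bucket "$bucket" --bucket-logging-status file://logging.json
  aws s3api get-bucket-logging --bucket "$bucket"
done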
- + \ No newline at end of file diff --git a/docs/configuration/task-configuration/index.html b/docs/configuration/task-configuration/index.html index 0446f1ea551..4a867a210c6 100644 --- a/docs/configuration/task-configuration/index.html +++ b/docs/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
Version: v16.0.0

Configuration of Tasks

The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

cmr_search_client_config

Configuration parameters for CMR search client for cumulus archive module tasks in the form:

<lambda_identifier>_report_cmr_limit = <maximum number records can be returned from cmr-client search, this should be greater than cmr_page_size>
<lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
type = map(string)

More information about the CMR limit and CMR page_size can be found in @cumulus/cmr-client and the CMR Search API documentation.

Currently the following values are supported:

  • create_reconciliation_report_cmr_limit
  • create_reconciliation_report_cmr_page_size

Example

cmr_search_client_config = {
  create_reconciliation_report_cmr_limit = 2500
  create_reconciliation_report_cmr_page_size = 250
}

elasticsearch_client_config

Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

<lambda_identifier>_es_scroll_duration = <duration>
<lambda_identifier>_es_scroll_size = <size>
type = map(string)

Currently the following values are supported:

  • create_reconciliation_report_es_scroll_duration
  • create_reconciliation_report_es_scroll_size

Example

elasticsearch_client_config = {
  create_reconciliation_report_es_scroll_duration = "15m"
  create_reconciliation_report_es_scroll_size = 2000
}

lambda_timeouts

A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

<lambda_identifier>_timeout: <timeout>
type = map(string)

Currently the following values are supported:

  • add_missing_file_checksums_task_timeout
  • discover_granules_task_timeout
  • discover_pdrs_task_timeout
  • fake_processing_task_timeout
  • files_to_granules_task_timeout
  • hello_world_task_timeout
  • hyrax_metadata_update_tasks_timeout
  • lzards_backup_task_timeout
  • move_granules_task_timeout
  • parse_pdr_task_timeout
  • pdr_status_check_task_timeout
  • post_to_cmr_task_timeout
  • queue_granules_task_timeout
  • queue_pdrs_task_timeout
  • queue_workflow_task_timeout
  • sf_sqs_report_task_timeout
  • sync_granule_task_timeout
  • update_granules_cmr_metadata_file_links_task_timeout

Example

lambda_timeouts = {
  discover_granules_task_timeout = 300
}

lambda_memory_sizes

A configurable map of memory sizes (in MBs) for cumulus ingest module task lambdas in the form:

<lambda_identifier>_memory_size: <memory_size>
type = map(string)

Currently the following values are supported:

  • add_missing_file_checksums_task_memory_size
  • discover_granules_task_memory_size
  • discover_pdrs_task_memory_size
  • fake_processing_task_memory_size
  • hyrax_metadata_updates_task_memory_size
  • lzards_backup_task_memory_size
  • move_granules_task_memory_size
  • parse_pdr_task_memory_size
  • pdr_status_check_task_memory_size
  • post_to_cmr_task_memory_size
  • queue_granules_task_memory_size
  • queue_pdrs_task_memory_size
  • queue_workflow_task_memory_size
  • sf_sqs_report_task_memory_size
  • sync_granule_task_memory_size
  • update_cmr_acess_constraints_task_memory_size
  • update_granules_cmr_metadata_file_links_task_memory_size

Example

lambda_memory_sizes = {
  queue_granules_task_memory_size = 1036
}
- + \ No newline at end of file diff --git a/docs/data-cookbooks/about-cookbooks/index.html b/docs/data-cookbooks/about-cookbooks/index.html index 2a1475e06e2..bd34c278cac 100644 --- a/docs/data-cookbooks/about-cookbooks/index.html +++ b/docs/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
Version: v16.0.0

About Cookbooks

Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should help unify an institution/user group on a set of terms.

Setup

The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

Adding a page

As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

More about workflows

Workflow general information

Input & Output

Developing Workflow Tasks

Workflow Configuration How-to's

- + \ No newline at end of file diff --git a/docs/data-cookbooks/browse-generation/index.html b/docs/data-cookbooks/browse-generation/index.html index 9627d79ea9f..d3fb91369fc 100644 --- a/docs/data-cookbooks/browse-generation/index.html +++ b/docs/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

{
  "name": "TestBrowseGeneration",
  "workflow": "DiscoverGranulesBrowseExample",
  "provider": "{{provider_from_previous_step}}",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "meta": {
    "provider_path": "{{path_to_data}}"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED",
  "updatedAt": 1553053438767
}

Run Workflows

Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

Go to the Rules tab, click the rule you just created:

Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

Then click the gear in the upper right corner and click "Rerun":

Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

Screenshot of page listing executions in the Cumulus dashboard

Results

You can verify your data has ingested by clicking the successful workflow entry:

Screenshot of individual entry from table listing executions in the Cumulus dashboard

Select "Show Output" on the next page

Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

and you should see in the payload from the workflow something similar to:

"payload": {
  "process": "modis",
  "granules": [
    {
      "files": [
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
          "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
          "type": "data",
          "bucket": "cumulus-test-sandbox-protected",
          "path": "data",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
          "size": 1908635
        },
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
          "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
          "type": "metadata",
          "bucket": "cumulus-test-sandbox-private",
          "path": "data",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
          "size": 21708
        },
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
          "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
          "type": "browse",
          "bucket": "cumulus-test-sandbox-protected",
          "path": "data",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
          "size": 1908635
        },
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
          "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
          "type": "metadata",
          "bucket": "cumulus-test-sandbox-protected-2",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
        }
      ],
      "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
      "cmrConceptId": "G1222231611-CUMULUS",
      "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
      "cmrMetadataFormat": "echo10",
      "dataType": "MOD09GQ",
      "version": "006",
      "published": true
    }
  ]
}

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate the CMR entry shown above.


Build Processing Lambda

This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as a aws_lambda_function Terraform resource.

The discussion below outlines requirements for this lambda.

Inputs

The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

Configuration

  • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

  • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

Payload

The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

"payload": {
  "process": "modis",
  "granules": [
    {
      "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
      "dataType": "MOD09GQ",
      "version": "006",
      "files": [
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
          "bucket": "cumulus-test-sandbox-internal",
          "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
          "size": 1908635
        },
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
          "bucket": "cumulus-test-sandbox-internal",
          "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
          "size": 21708
        }
      ]
    }
  ]
}

Generating Browse Imagery

The provided example script goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

The processing lambda you construct will need to do the following (a minimal skeleton is sketched after this list):

  • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
  • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
  • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
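
A minimal skeleton for such a processing lambda, assuming the @cumulus/cumulus-message-adapter-js wrapper and the @cumulus/aws-client S3 helper, might look like the sketch below. This is illustrative only: the browse-generation logic is faked, the staging key layout is a placeholder, and the real CookbookBrowseExample code in the Cumulus repository remains the reference.

'use strict';

const { runCumulusTask } = require('@cumulus/cumulus-message-adapter-js');
const { s3PutObject } = require('@cumulus/aws-client/S3');

// Sketch: add a browse file per granule and return { files, granules } so the
// cumulus_message configuration can map them to the payload and meta.input_granules.
async function generateBrowse(event) {
  const granules = event.input.granules;
  const stagedFiles = [];

  for (const granule of granules) {
    const browseBucket = event.config.bucket; // internal bucket from task_config
    const browseKey = `file-staging/${granule.granuleId}.jpg`; // placeholder key layout

    // Placeholder: real code would render browse imagery from the granule's data file.
    await s3PutObject({ Bucket: browseBucket, Key: browseKey, Body: 'fake browse image' });

    granule.files.push({
      fileName: `${granule.granuleId}.jpg`,
      bucket: browseBucket,
      key: browseKey,
      type: 'browse',
    });

    granule.files.forEach((file) => stagedFiles.push(`s3://${file.bucket}/${file.key}`));
  }

  return { files: stagedFiles, granules };
}

exports.handler = async (event, context) => runCumulusTask(generateBrowse, event, context);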

Generating/updating CMR metadata

If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

Expected Outputs for processing task/tasks

In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is setup to return an object with the keys "files" and "granules". In the cumulus_message configuration, the outputs are mapped in the configuration to the payload, granules to meta.input_granules:

          "task_config": {
"inputGranules": "{$.meta.input_granules}",
"granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
}
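The snippet above is only an excerpt of the step configuration. As a sketch (not the exact example code), the cumulus_message outputs mapping described here might look like the following, assuming the processing task returns an object with granules and files keys:

"cumulus_message": {
  "outputs": [
    {
      "source": "{$.granules}",
      "destination": "{$.meta.input_granules}"
    },
    {
      "source": "{$.files}",
      "destination": "{$.payload}"
    }
  ]
}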

The expected values of payload and meta.input_granules from the example above may be useful in constructing a processing task:

payload

The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags.

In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

  "payload": [
"s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
"s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
"s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
]

This is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output from the processing task in the example would look like:

"input_granules": [
{
"granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
"dataType": "MOD09GQ",
"version": "006",
"files": [
{
"fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"bucket": "cumulus-test-sandbox-internal",
"key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"size": 1908635
},
{
"fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
"bucket": "cumulus-test-sandbox-internal",
"key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
"size": 21708
},
{
"fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
"bucket": "cumulus-test-sandbox-internal",
"key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
}
]
}
],
Version: v16.0.0

Choice States

Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

If the comparison evaluates to true, the Next state is followed.

Example

In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

The CheckAgainChoice state definition requires an input object of the following structure:

{
  "meta": {
    "isPdrFinished": false
  }
}

Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

"CheckAgainChoice": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.meta.isPdrFinished",
"BooleanEquals": false,
"Next": "PdrStatusReport"
},
{
"Variable": "$.meta.isPdrFinished",
"BooleanEquals": true,
"Next": "WorkflowSucceeded"
}
],
"Default": "WorkflowSucceeded"
}

Advanced: Loops in Cumulus Workflows

Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: as long as CheckStatus returns meta.isPdrFinished: false, it is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

Further documentation

For complete details on Choice state configuration options, see the Choice state documentation.

Version: v16.0.0

CNM Workflow

This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

Sections


Prerequisites

Cumulus

This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

AWS CLI

This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

Kinesis

This entry assumes you already have two Kinesis data streams created for use as the CNM notification and response data streams.

If you do not have two streams set up, please take a moment to review the Kinesis documentation and set up two basic single-shard streams for this example:

Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

Screenshot of AWS console page for creating a Kinesis stream
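If you prefer the AWS CLI, you could alternatively create two basic single-shard streams with commands along these lines (the stream names are placeholders; use whatever names fit your deployment):

  aws kinesis create-stream --stream-name <prefix>-cnm-notification-stream --shard-count 1
  aws kinesis create-stream --stream-name <prefix>-cnm-response-stream --shard-count 1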

Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.
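For example, one way to attach that managed policy from the AWS CLI is sketched below; the role name is an assumption based on the {{prefix}}-lambda-processing naming above, so substitute your deployment's actual role:

  aws iam attach-role-policy \
    --role-name <prefix>-lambda-processing \
    --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFullAccess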

In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

Source Data

This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

Collection and Provider

Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Source Data section.

This can be done via the Cumulus dashboard, if installed, or via the API. It is strongly recommended to use the dashboard if possible.


Configure the Workflow

Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

The following are steps that are required to set up your Cumulus instance to run the example workflow:

Example CNM Workflow

In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/cnm_workflow.tf.

Add the following to the new terraform file in your deployment directory, updating the following:

  • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
  • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
module "cnm_workflow" {
source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

prefix = var.prefix
name = "CNMExampleWorkflow"
workflow_config = module.cumulus.workflow_config
system_bucket = var.system_bucket

{
state_machine_definition = <<JSON
"CNMExampleWorkflow": {
"Comment": "CNMExampleWorkflow",
"StartAt": "TranslateMessage",
"States": {
"TranslateMessage": {
"Type": "Task",
"Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"collection": "{$.meta.collection}",
"cumulus_message": {
"outputs": [
{
"source": "{$.cnm}",
"destination": "{$.meta.cnm}"
},
{
"source": "{$}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.exception",
"Next": "CnmResponse"
}
],
"Next": "SyncGranule"
},
"SyncGranule": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"provider": "{$.meta.provider}",
"buckets": "{$.meta.buckets}",
"collection": "{$.meta.collection}",
"downloadBucket": "{$.meta.buckets.private.name}",
"stack": "{$.meta.stack}",
"cumulus_message": {
"outputs": [
{
"source": "{$.granules}",
"destination": "{$.meta.input_granules}"
},
{
"source": "{$}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Type": "Task",
"Resource": "${module.cumulus.sync_granule_task.task_arn}",
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"IntervalSeconds": 10,
"MaxAttempts": 3
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.exception",
"Next": "CnmResponse"
}
],
"Next": "CnmResponse"
},
"CnmResponse": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"OriginalCNM": "{$.meta.cnm}",
"distribution_endpoint": "{$.meta.distribution_endpoint}",
"response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
"region": "us-east-1",
"type": "kinesis",
"WorkflowException": "{$.exception}",
"cumulus_message": {
"outputs": [
{
"source": "{$.cnm}",
"destination": "{$.meta.cnmResponse}"
},
{
"source": "{$.input.input}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Type": "Task",
"Resource": "${aws_lambda_function.cnm_response_task.arn}",
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"IntervalSeconds": 5,
"MaxAttempts": 3
}
],
"End": true
}
}
}
JSON
}

Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

Lambda Configuration

To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

Below is a description of each of these tasks:

CNMToCMA

CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

CnmResponse

The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

Additional Tasks

Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

Redeploy

Once the above configuration changes have been made, redeploy your stack.

Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

Rule Configuration

Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

{
"collection": {
"name": "L2_HR_PIXC",
"version": "000"
},
"name": "L2_HR_PIXC_kinesisRule",
"provider": "PODAAC_SWOT",
"rule": {
"type": "kinesis",
"value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
},
"state": "ENABLED",
"workflow": "CNMExampleWorkflow"
}

Please Note:

  • The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
  • The collection and provider should match the collection and provider you set up in the Prerequisites section.

Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


Execute the Workflow

Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

How to Trigger the Workflow

To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

Create Record JSON

Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

  • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
  • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
  • COLLECTION: The collection name defined in the prerequisites for this product
{
"product": {
"files": [
{
"checksumType": "md5",
"name": "${TEST_DATA_FILE_NAME}",
"checksum": "bogus_checksum_value",
"uri": "${TEST_DATA_URI}",
"type": "data",
"size": 12345678
}
],
"name": "${TEST_DATA_FILE_NAME}",
"dataVersion": "006"
},
"identifier ": "testIdentifier123456",
"collection": "${COLLECTION}",
"provider": "TestProvider",
"version": "001",
"submissionTime": "2017-09-30T03:42:29.791198"
}

Add Record to Kinesis Data Stream

Using the JSON file you created, push it to the Kinesis notification stream:

aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

Please note: The above command uses the stream name, not the ARN.

The command should return output similar to:

{
"ShardId": "shardId-000000000000",
"SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
}

This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

TranslateMessage

TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

For more on the Message Adapter, please see the Message Flow documentation.

An example of what is happening in the CNMToCMA Lambda is as follows:

Example Input Payload:

"payload": {
"identifier ": "testIdentifier123456",
"product": {
"files": [
{
"checksumType": "md5",
"name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"checksum": "bogus_checksum_value",
"uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"type": "data",
"size": 12345678
}
],
"name": "TestGranuleUR",
"dataVersion": "006"
},
"version": "123456",
"collection": "MOD09GQ",
"provider": "TestProvider",
"submissionTime": "2017-09-30T03:42:29.791198"
}

Example Output Payload:

  "payload": {
"cnm": {
"identifier ": "testIdentifier123456",
"product": {
"files": [
{
"checksumType": "md5",
"name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"checksum": "bogus_checksum_value",
"uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"type": "data",
"size": 12345678
}
],
"name": "TestGranuleUR",
"dataVersion": "006"
},
"version": "123456",
"collection": "MOD09GQ",
"provider": "TestProvider",
"submissionTime": "2017-09-30T03:42:29.791198",
"receivedTime": "2017-09-30T03:42:31.634552"
},
"output": {
"granules": [
{
"granuleId": "TestGranuleUR",
"files": [
{
"path": "some-bucket/data",
"url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"bucket": "some-bucket",
"name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
"size": 12345678
}
]
}
]
}
}

SyncGranules

This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

CnmResponse

Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the notification Kinesis stream.

If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

The data written to the response-endpoint should adhere to the Response Message Fields schema.

Example CNM Success Response:

{
"provider": "PODAAC_SWOT",
"collection": "SWOT_Prod_l2:1",
"processCompleteTime": "2017-09-30T03:45:29.791198",
"submissionTime": "2017-09-30T03:42:29.791198",
"receivedTime": "2017-09-30T03:42:31.634552",
"identifier": "1234-abcd-efg0-9876",
"response": {
"status": "SUCCESS"
}
}

Example CNM Error Response:

{
"provider": "PODAAC_SWOT",
"collection": "SWOT_Prod_l2:1",
"processCompleteTime": "2017-09-30T03:45:29.791198",
"submissionTime": "2017-09-30T03:42:29.791198",
"receivedTime": "2017-09-30T03:42:31.634552",
"identifier": "1234-abcd-efg0-9876",
"response": {
"status": "FAILURE",
"errorCode": "PROCESSING_ERROR",
"errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
}
}

Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

To test the failure scenario, send a record missing the product.name key.


Verify results

Check for successful execution on the dashboard

Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

Check the test granule has been delivered to S3 staging

The test granule identified in the Kinesis record should be moved to the deployment's private staging area.
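Assuming the staging path layout described in the SyncGranules section above (s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/), one way to spot-check this from the AWS CLI is:

  aws s3 ls s3://<deployment-private-bucket>/file-staging/<deployment-name>/<COLLECTION>/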

Check for Kinesis records

A SUCCESS notification should be present on the response-endpoint Kinesis stream.

You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

Get a shard iterator (substituting your stream name as appropriate):

aws kinesis get-shard-iterator \
--shard-id shardId-000000000000 \
--shard-iterator-type LATEST \
--stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

{
"ShardIterator": "VeryLongString=="
}
  • Re-trigger the workflow by using the put-record command from the Add Record to Kinesis Data Stream section above.
  • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

This should result in output similar to:

{
"Records": [
{
"SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
"ApproximateArrivalTimestamp": 1532664689.128,
"Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
"PartitionKey": "1"
},
{
"SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
"ApproximateArrivalTimestamp": 1532664707.149,
"Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
"PartitionKey": "1"
}
],
"NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
"MillisBehindLatest": 0
}

Note the Data field is base64-encoded and would need to be decoded and parsed to be interpretable. There are many options to build a Kinesis consumer, such as the KCL.
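For example, a quick way to inspect a single record locally is to decode its Data field (substitute the value from your get-records output):

  echo "<Data-field-value-from-get-records>" | base64 --decode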

For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

Successful CNM Response Object Example:

{
"cnmResponse": {
"provider": "TestProvider",
"collection": "MOD09GQ",
"version": "123456",
"processCompleteTime": "2017-09-30T03:45:29.791198",
"submissionTime": "2017-09-30T03:42:29.791198",
"receivedTime": "2017-09-30T03:42:31.634552",
"identifier ": "testIdentifier123456",
"response": {
"status": "SUCCESS"
}
}
}

Kinesis Record Error Handling

messageConsumer

The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures when messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - come with the API package and do not need to be configured by the operator.

To examine records that could not be processed at any step, inspect the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue and, under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
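If you prefer the AWS CLI over the console, the same queue can be inspected with commands along these lines (region, account, and prefix are placeholders for your deployment's values):

  aws sqs get-queue-url --queue-name <prefix>-kinesisFailure
  aws sqs receive-message \
    --queue-url https://sqs.<region>.amazonaws.com/<account-id>/<prefix>-kinesisFailure \
    --max-number-of-messages 10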

Note, these are only records that occurred when processing records from Kinesis streams. Workflow failures are handled differently.

Kinesis Stream logging

Notification Stream messages

Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

Response Stream messages

Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
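As a sketch, the same event source mapping could also be created with the AWS CLI instead of the Lambda console; the function name, stream ARN, and starting position below are assumptions to adapt to your deployment:

  aws lambda create-event-source-mapping \
    --function-name <prefix>-KinesisOutboundEventLogger \
    --event-source-arn arn:aws:kinesis:<region>:<account-id>:stream/<your-response-stream> \
    --starting-position LATEST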

Error Handling in Workflows

See this documentation on configuring your workflow to handle transient Lambda errors (such as Lambda.ServiceException).

Example state machine definition:

{
"Comment": "Tests Workflow from Kinesis Stream",
"StartAt": "TranslateMessage",
"States": {
"TranslateMessage": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"cumulus_message": {
"outputs": [
{
"source": "{$.cnm}",
"destination": "{$.meta.cnm}"
},
{
"source": "{$}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Type": "Task",
"Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.exception",
"Next": "CnmResponseFail"
}
],
"Next": "SyncGranule"
},
"SyncGranule": {
"Parameters": {
"cma": {
"event.$": "$",
"ReplaceConfig": {
"Path": "$.payload",
"TargetPath": "$.payload"
},
"task_config": {
"provider": "{$.meta.provider}",
"buckets": "{$.meta.buckets}",
"collection": "{$.meta.collection}",
"downloadBucket": "{$.meta.buckets.private.name}",
"stack": "{$.meta.stack}",
"cumulus_message": {
"outputs": [
{
"source": "{$.granules}",
"destination": "{$.meta.input_granules}"
},
{
"source": "{$}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Type": "Task",
"Resource": "${module.cumulus.sync_granule_task.task_arn}",
"Retry": [
{
"ErrorEquals": ["States.ALL"],
"IntervalSeconds": 10,
"MaxAttempts": 3
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.exception",
"Next": "CnmResponseFail"
}
],
"Next": "CnmResponse"
},
"CnmResponse": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"OriginalCNM": "{$.meta.cnm}",
"CNMResponseStream": "{$.meta.cnmResponseStream}",
"region": "us-east-1",
"WorkflowException": "{$.exception}",
"cumulus_message": {
"outputs": [
{
"source": "{$}",
"destination": "{$.meta.cnmResponse}"
},
{
"source": "{$}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Type": "Task",
"Resource": "${aws_lambda_function.cnm_response_task.arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.exception",
"Next": "WorkflowFailed"
}
],
"Next": "WorkflowSucceeded"
},
"CnmResponseFail": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"OriginalCNM": "{$.meta.cnm}",
"CNMResponseStream": "{$.meta.cnmResponseStream}",
"region": "us-east-1",
"WorkflowException": "{$.exception}",
"cumulus_message": {
"outputs": [
{
"source": "{$}",
"destination": "{$.meta.cnmResponse}"
},
{
"source": "{$}",
"destination": "{$.payload}"
}
]
}
}
}
},
"Type": "Task",
"Resource": "${aws_lambda_function.cnm_response_task.arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.exception",
"Next": "WorkflowFailed"
}
],
"Next": "WorkflowFailed"
},
"WorkflowSucceeded": {
"Type": "Succeed"
},
"WorkflowFailed": {
"Type": "Fail",
"Cause": "Workflow failed"
}
}
}

The above results in a workflow which is visualized in the diagram below:

Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

Summary

Error handling should (mostly) be the domain of workflow configuration.

Version: v16.0.0

HelloWorld Workflow

Example task meant to be a sanity check/introduction to the Cumulus workflows.

Pre-Deployment Configuration

Workflow Configuration

A workflow definition can be found in the template repository hello_world_workflow module.

{
"Comment": "Returns Hello World",
"StartAt": "HelloWorld",
"States": {
"HelloWorld": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"buckets": "{$.meta.buckets}",
"provider": "{$.meta.provider}",
"collection": "{$.meta.collection}"
}
}
},
"Type": "Task",
"Resource": "${module.cumulus.hello_world_task.task_arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"End": true
}
}
}

Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module, so no configuration is needed.

If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

Execution

We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

{
"collection": { # collection values can be configured and found on the Collections page
"name": "${collection_name}",
"version": "${collection_version}"
},
"name": "helloworld_rule",
"provider": "${provider}", # found on the Providers page
"rule": {
"type": "onetime"
},
"state": "ENABLED",
"workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
}

Screenshot of AWS Step Function execution graph for the HelloWorld workflow Executed workflow as seen in AWS Console

Output/Results

The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.
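If you'd rather check execution status from the AWS CLI than the dashboard, something like the following lists recent executions; the state machine ARN is a placeholder you would look up for your deployment (for example, from the Step Functions console):

  aws stepfunctions list-executions \
    --state-machine-arn arn:aws:states:<region>:<account-id>:stateMachine:<prefix>-HelloWorldWorkflow \
    --max-results 5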

Summary

Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

Version: v16.0.0

Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to the workflow.

The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

Sending SQS messages to report status

Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and in the status of those granules/PDRs being visible on the Cumulus dashboard. The queue does have certain expectations: it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the SNS topic ARNs) by adding them as outputs in outputs.tf for your Terraform deployment, as in our example deployment:

output "stepfunction_event_reporter_queue_url" {
value = module.cumulus.stepfunction_event_reporter_queue_url
}

output "report_executions_sns_topic_arn" {
value = module.cumulus.report_executions_sns_topic_arn
}
output "report_granules_sns_topic_arn" {
value = module.cumulus.report_executions_sns_topic_arn
}
output "report_pdrs_sns_topic_arn" {
value = module.cumulus.report_pdrs_sns_topic_arn
}

Then, when you run terraform apply, you should see these outputs printed to your console:

Outputs:
...
stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
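As a rough sketch (not the authoritative schema), a message for a non-SUCCEEDED status might be shaped like the following, with a schema-conforming Cumulus message stringified under detail.input:

  {
    "source": "aws.states",
    "detail": {
      "status": "RUNNING",
      "input": "<JSON-stringified Cumulus message conforming to the Cumulus message schema>"
    }
  }

Assuming that JSON is saved as status-message.json (a hypothetical file name), it could be sent with the AWS CLI:

  aws sqs send-message \
    --queue-url "<stepfunction_event_reporter_queue_url>" \
    --message-body file://status-message.json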

If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

  • /aws/lambda/<prefix>-sfEventSqsToDbRecords

In a workflow

As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

  "PdrStatusReport": {
"Parameters": {
"cma": {
"event.$": "$",
"ReplaceConfig": {
"FullMessage": true
},
"task_config": {
"cumulus_message": {
"input": "{$}"
}
}
}
},
"ResultPath": null,
"Type": "Task",
"Resource": "${sf_sqs_report_task_arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.exception",
"Next": "WorkflowFailed"
}
],
"Next": "WaitForSomeTime"
},

Subscribing additional listeners to SNS topics

Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

resource "aws_lambda_function" "test_lambda" {
function_name = "${var.prefix}-testLambda"
filename = "./testLambda.zip"
source_code_hash = filebase64sha256("./testLambda.zip")
handler = "index.handler"
role = module.cumulus.lambda_processing_role_arn
runtime = "nodejs10.x"
}

resource "aws_sns_topic_subscription" "test_lambda" {
topic_arn = module.cumulus.report_executions_sns_topic_arn
protocol = "lambda"
endpoint = aws_lambda_function.test_lambda.arn
}

resource "aws_lambda_permission" "test_lambda" {
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.test_lambda.arn
principal = "sns.amazonaws.com"
source_arn = module.cumulus.report_executions_sns_topic_arn
}

SNS message format

Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.
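For illustration, a granule notification delivered to a subscriber might look roughly like the sketch below; the event value and record fields are illustrative assumptions, with the actual shape governed by the data model schemas mentioned below:

  {
    "Records": [
      {
        "Sns": {
          "Message": "{\"event\": \"Update\", \"record\": {\"granuleId\": \"MOD09GQ.A2016358.h13v04.006.2016360104606\", \"status\": \"completed\"}}"
        }
      }
    ]
  }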

The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.

Summary

Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

Version: v16.0.0

Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

Overview

The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

Diagram of workflow queueing

Ingest Workflow

The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

  "QueuePublishWorkflowStep": {
"Parameters": {
"cma": {
"event.$": "$",
"ReplaceConfig": {
"FullMessage": true
},
"task_config": {
"internalBucket": "{$.meta.buckets.internal.name}",
"stackName": "{$.meta.stack}",
"workflow": "{$.meta.workflow}",
"queueUrl": "${start_sf_queue_url}",
"provider": "{$.meta.provider}",
"collection": "{$.meta.collection}"
}
}
},
"Type": "Task",
"Resource": "${queue_workflow_task_arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.exception",
"Next": "WorkflowFailed"
}
],
"End": true
},

Publish Workflow

Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

  "Catch": [
{
"ErrorEquals": [
"CMRInternalError"
],
"Next": "RequeueWorkflow"
},
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.exception"
}
],

Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

{
"RequeueWorkflow": {
"Parameters": {
"cma": {
"event.$": "$",
"task_config": {
"buckets": "{$.meta.buckets}",
"distribution_endpoint": "{$.meta.distribution_endpoint}",
"workflow": "PublishGranuleQueue",
"queueUrl": "${start_sf_queue_url}",
"provider": "{$.meta.provider}",
"collection": "{$.meta.collection}"
}
}
},
"Type": "Task",
"Resource": "${queue_workflow_task_arn}",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "WorkflowFailed",
"ResultPath": "$.exception"
}
],
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"End": true
}
}
Version: v16.0.0

Run Step Function Tasks in AWS Lambda or Docker

Overview

AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type.

When to use Lambda

You should use AWS Lambda whenever all of the following are true:

  • The task runs on one of the supported Lambda runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
  • The lambda package is less than 50 MB in size, zipped.
  • The task consumes less than each of the following resources:
    • 3008 MB memory allocation
    • 512 MB disk storage (must be written to /tmp)
    • 15 minutes of execution time

See this page for a complete and up-to-date list of AWS Lambda limits.

If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

Step Function Activities and cumulus-ecs-task

Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

Example: Replacing AWS Lambda with a Docker container run on ECS

This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

  "QueueGranules": {
"Parameters": {
"cma": {
"event.$": "$",
"ReplaceConfig": {
"FullMessage": true
},
"task_config": {
"provider": "{$.meta.provider}",
"internalBucket": "{$.meta.buckets.internal.name}",
"stackName": "{$.meta.stack}",
"granuleIngestWorkflow": "${ingest_granule_workflow_name}",
"queueUrl": "{$.meta.queues.startSF}"
}
}
},
"Type": "Task",
"Resource": "${queue_granules_task_arn}",
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.exception",
"Next": "WorkflowFailed"
}
],
"End": true
},

If it has been discovered that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your terraform deployment (by either adding a new .tf file or updating an existing one):

  • A aws_sfn_activity resource:
resource "aws_sfn_activity" "queue_granules" {
name = "${var.prefix}-QueueGranules"
}
  • An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

module "queue_granules_service" {
source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

prefix = var.prefix
name = "QueueGranules"

cluster_arn = module.cumulus.ecs_cluster_arn
desired_count = 1
image = "cumuluss/cumulus-ecs-task:1.9.0"

cpu = 400
memory_reservation = 700

environment = {
AWS_DEFAULT_REGION = data.aws_region.current.name
}
command = [
"cumulus-ecs-task",
"--activityArn",
aws_sfn_activity.queue_granules.id,
"--lambdaArn",
module.cumulus.queue_granules_task.task_arn,
"--lastModified",
module.cumulus.queue_granules_task.last_modified_date
]
alarms = {
MemoryUtilizationHigh = {
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 1
metric_name = "MemoryUtilization"
statistic = "SampleCount"
threshold = 75
}
}
}

Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

  • An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}")`

If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step will run as an ECS task instead of a Lambda function.
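As the note above mentions, after updating the Lambda code you must restart the running ECS tasks before the activity uses the new code. One way to do that from the AWS CLI is to force a new deployment of the service; the cluster and service names below are assumptions to replace with your deployment's values:

  aws ecs update-service \
    --cluster <your-cumulus-ecs-cluster-name> \
    --service <prefix>-QueueGranules-service \
    --force-new-deployment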

Final note

Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Science Investigator-led Processing Systems (SIPS)

We're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

Screenshot of a Cumulus rule configuration

Note: A list of configured workflows exists under "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

DiscoverAndQueuePdrs Workflow

This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

  1. DiscoverPdrs - source
  2. QueuePdrs - source

Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

ParsePdr Workflow

The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

The lambdas below are included in the cumulus terraform module for use in your workflows:

  1. ParsePdr - source
  2. QueueGranules - source
  3. CheckStatus - source

Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

IngestGranule Workflow

The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

The lambdas below are included in the cumulus terraform module for use in your workflows:

  1. SyncGranule - source.
  2. CmrStep - source

Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

Summary

In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

Version: v16.0.0

Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and we will see how to configure our Cumulus workflows/rules to use this queue.

We will also review the architecture of this feature and highlight some implementation notes.

Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

Implementing the queue

Create and deploy the queue

Add a new queue

In a .tf file for your Cumulus deployment, add a new SQS queue:

resource "aws_sqs_queue" "background_job_queue" {
name = "${var.prefix}-backgroundJobQueue"
receive_wait_time_seconds = 20
visibility_timeout_seconds = 60
}

Set maximum executions for the queue

Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

module "cumulus" {
# ... other variables

throttled_queues = [{
url = aws_sqs_queue.background_job_queue.id,
execution_limit = 5
}]
}

Set up a consumer for the queue

Add the sqs2sfThrottle Lambda as the consumer for the queue and add a CloudWatch event rule/target to read from the queue on a scheduled basis.

Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
schedule_expression = "rate(1 minute)"
}

resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
input = jsonencode({
messageLimit = 500
queueUrl = aws_sqs_queue.background_job_queue.id
timeLimit = 60
})
}

resource "aws_lambda_permission" "background_job_queue_watcher" {
action = "lambda:InvokeFunction"
function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
}

Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

{
  "cumulus_meta": {
    "queueExecutionLimits": {
      "<backgroundJobQueue_SQS_URL>": 5
    }
  }
}

Integrate your queue with workflows and/or rules

Integrate queue with queuing steps in workflows

For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

{
  "QueueGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "FullMessage": true
        },
        "task_config": {
          "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
          "provider": "{$.meta.provider}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "stackName": "{$.meta.stack}",
          "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
        }
      }
    }
  }
}

Similarly, for a QueuePdrs step:

Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

{
  "QueuePdrs": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "FullMessage": true
        },
        "task_config": {
          "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "stackName": "{$.meta.stack}",
          "parsePdrWorkflow": "${parse_pdr_workflow_name}"
        }
      }
    }
  }
}

After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

Create/update a rule to use your new queue

Create or update a rule definition to include a queueUrl property that refers to your new queue:

{
  "name": "s3_provider_rule",
  "workflow": "DiscoverAndQueuePdrs",
  "provider": "s3_provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED",
  "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
}

After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

Architecture

Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

  • If the increment operation is successful, then the count was not at the maximum and an execution is started
  • If the increment operation fails, then the count was already at the maximum so no execution is started

Final notes

Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

  • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
  • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
  • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
- + \ No newline at end of file diff --git a/docs/data-cookbooks/tracking-files/index.html b/docs/data-cookbooks/tracking-files/index.html index 0ca2e4c735e..47031f13b29 100644 --- a/docs/data-cookbooks/tracking-files/index.html +++ b/docs/data-cookbooks/tracking-files/index.html @@ -5,7 +5,7 @@ Tracking Ancillary Files | Cumulus Documentation - + @@ -19,7 +19,7 @@ The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type   UMM-G RelatedUrl.Type                                             ECHO10 Location
ancillary  'VIEW RELATED INFORMATION'                                        OnlineResource
data       'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)   OnlineAccessURL
browse     'GET RELATED VISUALIZATION'                                       AssociatedBrowseImage
linkage    'EXTENDED METADATA'                                               OnlineResource
metadata   'EXTENDED METADATA'                                               OnlineResource
qa         'EXTENDED METADATA'                                               OnlineResource

Common Use Cases

This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

Configuring browse imagery:

{
  "bucket": "public",
  "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
  "type": "browse"
}

Configuring a documentation entry:

{
  "bucket": "protected",
  "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
  "type": "metadata"
}

Configuring other associated files (use types metadata or qa as appropriate):

{
  "bucket": "protected",
  "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
  "type": "qa"
}
- + \ No newline at end of file diff --git a/docs/deployment/api-gateway-logging/index.html b/docs/deployment/api-gateway-logging/index.html index 0594ba333b6..16081399bbe 100644 --- a/docs/deployment/api-gateway-logging/index.html +++ b/docs/deployment/api-gateway-logging/index.html @@ -5,13 +5,13 @@ API Gateway Logging | Cumulus Documentation - +
Version: v16.0.0

API Gateway Logging

Enabling API Gateway Logging

In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

log_api_gateway_to_cloudwatch = true

This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
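
For example, assuming your TEA module block is named thin_egress_app as in the Cumulus Deployment Template, the setting is added alongside the module's other variables:

module "thin_egress_app" {
  # ... existing TEA configuration ...

  log_api_gateway_to_cloudwatch = true
}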

Configure Permissions for API Gateway Logging to CloudWatch

Instructions: Enabling Account Level Logging from API Gateway to CloudWatch

This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

  1. Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "apigateway.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
  2. Create an account role to act as ApiGateway and write to CloudWatchLogs

    in NGAP

    NASA users in NGAP: Be sure to use your account's permission boundary.

    aws iam create-role \
      --role-name ApiGatewayToCloudWatchLogs \
      [--permissions-boundary <permissionBoundaryArn>] \
      --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

  3. Attach correct permissions to role

    Next, attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
  4. Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

Configure API Gateway CloudWatch Logs Delivery

For details about configuring the API Gateway CloudWatch Logs delivery, see Configure Cloudwatch Logs Delivery.

- + \ No newline at end of file diff --git a/docs/deployment/apis-introduction/index.html b/docs/deployment/apis-introduction/index.html index 248d5dbf3f8..2852f28b9f0 100644 --- a/docs/deployment/apis-introduction/index.html +++ b/docs/deployment/apis-introduction/index.html @@ -5,13 +5,13 @@ APIs | Cumulus Documentation - +
Version: v16.0.0

APIs

Common Distribution APIs

When deploying from the Cumulus Deployment Template or a configuration based on that repo, the Thin Egress App (TEA) distribution app will be used by default. However, you have the choice to use the Cumulus Distribution API as well.

Cumulus API Customization Use Cases

Our Cumulus API offers you the flexibility to customize for your DAAC/organization. Below is a list of use cases that may help you with options:

Types of APIs

- + \ No newline at end of file diff --git a/docs/deployment/choosing_configuring_rds/index.html b/docs/deployment/choosing_configuring_rds/index.html index d82d5126d41..dd0c5bf96ec 100644 --- a/docs/deployment/choosing_configuring_rds/index.html +++ b/docs/deployment/choosing_configuring_rds/index.html @@ -5,7 +5,7 @@ RDS: Choosing and Configuring Your Database Type | Cumulus Documentation - + @@ -36,7 +36,7 @@ using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more as seen in the supported variables for the module.

Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

Optional: Manage RDS Database with pgAdmin

Set Up SSM Port Forwarding

note

In order to perform this action, you will need to have deployed within a VPC and have the credentials for access via NGAP protocols.

For a walkthrough guide on how to utilize AWS's Session Manager for port forwarding to access the Cumulus RDS database go to the Accessing Cumulus RDS database via SSM Port Forwarding article.

- + \ No newline at end of file diff --git a/docs/deployment/cloudwatch-logs-delivery/index.html b/docs/deployment/cloudwatch-logs-delivery/index.html index 47f6697ad73..f77bcd49ea9 100644 --- a/docs/deployment/cloudwatch-logs-delivery/index.html +++ b/docs/deployment/cloudwatch-logs-delivery/index.html @@ -5,13 +5,13 @@ Configure Cloudwatch Logs Delivery | Cumulus Documentation - +
Version: v16.0.0

Configure Cloudwatch Logs Delivery

As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]
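
For orientation, a minimal sketch of where this variable sits (the ARN below is illustrative only):

module "cumulus" {
  # ... other variables

  log_destination_arn = "arn:aws:logs:us-east-1:123456789012:destination:logDestinationName"
}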

Logs Sent

By default, the following logs will be sent to the destination when one is given.

  • Ingest logs
  • Async Operation logs
  • Thin Egress App API Gateway logs (if configured)

Additional Logs

If additional logs are needed, you can configure additional_log_groups_to_elk with the CloudWatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with a descriptor as the key and the CloudWatch log group name as the value.

additional_log_groups_to_elk = {
  "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
  "MyCustomTask"   = "my-custom-task-log-group"
}
- + \ No newline at end of file diff --git a/docs/deployment/components/index.html b/docs/deployment/components/index.html index 74033c9440e..b228956e57b 100644 --- a/docs/deployment/components/index.html +++ b/docs/deployment/components/index.html @@ -5,7 +5,7 @@ Component-based Cumulus Deployment | Cumulus Documentation - + @@ -39,7 +39,7 @@ Terraform at the same time.

With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.
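
A minimal sketch of such a backend configuration (bucket, key, and table names are placeholders you would replace with your own remote state resources):

terraform {
  backend "s3" {
    region         = "us-east-1"
    bucket         = "PREFIX-tf-state"
    key            = "cumulus/terraform.tfstate"
    dynamodb_table = "PREFIX-tf-locks"
  }
}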

- + \ No newline at end of file diff --git a/docs/deployment/create_bucket/index.html b/docs/deployment/create_bucket/index.html index 3d6019023b8..ada848ba72c 100644 --- a/docs/deployment/create_bucket/index.html +++ b/docs/deployment/create_bucket/index.html @@ -5,13 +5,13 @@ Creating an S3 Bucket | Cumulus Documentation - +
Version: v16.0.0

Creating an S3 Bucket

Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

Command Line

Using the AWS CLI s3api create-bucket subcommand:

$ aws s3api create-bucket \
--bucket foobar-internal \
--region us-west-2 \
--create-bucket-configuration LocationConstraint=us-west-2
{
"Location": "/foobar-internal"
}
info

The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

Please note that security settings and other bucket options can be set via the options listed in the s3api documentation.

Repeat the above step for each bucket to be created.

Web Interface

If you prefer to use the AWS web interface instead of the command line, see AWS "Creating a Bucket" documentation.

- + \ No newline at end of file diff --git a/docs/deployment/cumulus_distribution/index.html b/docs/deployment/cumulus_distribution/index.html index 2c212558fae..9b7f224f70f 100644 --- a/docs/deployment/cumulus_distribution/index.html +++ b/docs/deployment/cumulus_distribution/index.html @@ -5,14 +5,14 @@ Using the Cumulus Distribution API | Cumulus Documentation - +
Version: v16.0.0

Using the Cumulus Distribution API

The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

tip

If you need to access our quick reference materials while setting up or continuing to manage your API access, go to the Cumulus Distribution API Docs.

Configuring a Cumulus Distribution Deployment

The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

These steps assume you're using the Cumulus Deployment Template but they can also be used for custom deployments.

To configure a deployment to use Cumulus Distribution:

  1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the "Cumulus Distribution Settings".
  2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
  3. Uncomment the Cumulus Distribution outputs in outputs.tf.
  4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.
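
For example, step 4 above is a simple file rename, which could be done with:

cd cumulus-template-deploy/cumulus-tf
mv cumulus_distribution.tf.example cumulus_distribution.tf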

Cognito Application and User Credentials

The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

  • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
  • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
  • csdap_client_password: The client password for the Cumulus application registered within the Cognito service
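
Once you have those values, the additions to your cumulus-tf/terraform.tfvars file are plain variable assignments; the values below are placeholders only:

csdap_host_url        = "<csdap host url>"
csdap_client_id       = "<csdap client id>"
csdap_client_password = "<csdap client password>"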

Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

Cumulus Distribution URL

Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file and setting it to one of the following values (both are explained below):

  1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
  2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

The following subsections explain these approaches in turn.

Using Your Cumulus Distribution API Gateway URL as Your Distribution URL

Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development). Here is an outline of the required steps with details provided further below:

  1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
  2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
  3. Choose an open local port on your machine (we'll use 9000 in the following example)
  4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
  5. Redeploy Cumulus
  6. Add an entry to your /etc/hosts file
  7. Add a redirect URI to Cognito via the Cognito API
  8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
  9. Add a sample file to S3 to test downloading via Cognito

To create or import an existing key pair, you can use the AWS CLI (see AWS ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

key_name = "<name>"
cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

where:

  • <name> is the name of the key pair you just added to AWS
  • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
  • <port> is your open local port of choice (9000 is typically a good choice)

Once you save your variable changes, redeploy your cumulus-tf module.

While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

localhost <hostname>

Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

  • method: POST
  • base URL: the value of your csdap_host_url Terraform variable
  • path: /authclient/updateRedirectUri
  • username: the value of your csdap_client_id Terraform variable
  • password: the value of your csdap_client_password Terraform variable
  • headers: Content-Type='application/x-www-form-urlencoded'
  • body: redirect_uri=<cumulus_distribution_url>/login

where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.
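
As a sketch, using curl that request might look like the following, where the angle-bracketed values are the Terraform variable values described above:

curl -X POST "<csdap_host_url>/authclient/updateRedirectUri" \
  -u "<csdap_client_id>:<csdap_client_password>" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "redirect_uri=<cumulus_distribution_url>/login"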

For reference, see the Cognito Authentication Service API.

Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

brew install --cask session-manager-plugin --no-quarantine

As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

  1. Determine your EC2 instance ID
  2. Connect to the NASA VPN
  3. Start an AWS SSM session
  4. Open an SSH tunnel
  5. Use a browser to navigate to your file

To determine your EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use and <prefix> is the value of your prefix Terraform variable:

aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text
Connect to NASA VPN

Before proceeding with the remaining steps, make sure you are connected to the NASA VPN.

Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

If successful, you should see output similar to the following:

Starting session with SessionId: NGAPShApplicationDeveloper-***
Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
Waiting for connections...

In another terminal window, open a tunnel with port forwarding using your chosen port from above (e.g., 9000):

ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

where:

  • <port> is the open local port you chose earlier (e.g., 9000)
  • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, and then enter a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

Once you're finished testing, clean up as follows:

  1. Stop your SSH tunnel (enter Ctrl-C)
  2. Stop your AWS SSM session (enter Ctrl-C)
  3. If you like, disconnect from the NASA VPN

While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

Using a CloudFront URL as Your Distribution URL

In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

  • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
  • A request to add the AWS account's VPC to the whitelist

Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

cumulus_distribution_url = <cloudfront_url>

In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

Redeploy Cumulus with your new/updated Terraform variables.

As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

S3 Bucket Mapping

An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

The configuration file is a simple JSON mapping of the form:

{
  "daac-public-data-bucket": "/path/to/this/kind/of/data"
}
cumulus bucket mapping

Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

Switching from the Thin Egress App to Cumulus Distribution

If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

There will also be downtime while your API Gateway is updated.

- + \ No newline at end of file diff --git a/docs/deployment/databases-introduction/index.html b/docs/deployment/databases-introduction/index.html index 9b6b530556e..0e984116d7c 100644 --- a/docs/deployment/databases-introduction/index.html +++ b/docs/deployment/databases-introduction/index.html @@ -5,13 +5,13 @@ Databases | Cumulus Documentation - +
Version: v16.0.0

Databases

Cumulus Core Database

Cumulus uses a PostgreSQL database as its primary data store for operational and archive records (e.g. collections, granules, etc). We expect a PostgreSQL database to be provided by the AWS RDS service; however, there are two types of the RDS database which we will explore in the upcoming pages.

Types of Databases

- + \ No newline at end of file diff --git a/docs/deployment/index.html b/docs/deployment/index.html index e6ef9f2b2eb..fe7b1396578 100644 --- a/docs/deployment/index.html +++ b/docs/deployment/index.html @@ -5,7 +5,7 @@ How to Deploy Cumulus | Cumulus Documentation - + @@ -19,7 +19,7 @@ for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

Consider the sizing of your Cumulus instance when configuring your variables.

Choose a Distribution API

Default Configuration

If you are deploying from the Cumulus Deployment Template or a configuration based on that repo, the Thin Egress App (TEA) distribution app will be used by default.

Configuration Options

Cumulus can be configured to use either TEA or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

note

If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

Configure the Thin Egress App

TEA can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the TEA documentation to configure distribution in your cumulus-tf deployment.

Configure the Cumulus Distribution API (Optional)

If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init [3].

Deploy

Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

Outputs:

archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/
note

Be sure to copy the redirect URLs because you will need them to update your Earthdata application.

Update Earthdata Application

Add the two redirect URLs to your EarthData login application by doing the following:

  1. Login to URS
  2. Under My Applications -> Application Administration -> use the edit icon of your application
  3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
    • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token
  4. Also add the Distribution url
    • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [1]
  5. You may delete the placeholder URL you used to create the application

If you've lost track of the needed redirect URIs, they can be located in the API Gateway console. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the Thin Egress App URL.


Deploy Cumulus Dashboard

Dashboard Requirements

note

The requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the Dashboard repository.

Prepare AWS

Create S3 Bucket for Dashboard:

  • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
  • Configure the bucket to host a website:
    • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
    • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
  • The bucket's URL will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or you can find it in the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
  • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

Install Dashboard

To install the Cumulus Dashboard, clone the repository into the root deploy directory and install dependencies with npm install:

git clone https://github.com/nasa/cumulus-dashboard
cd cumulus-dashboard
nvm use
npm install
If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

Dashboard Versioning

By default, the master branch will be used for Dashboard deployments. The master branch of the repository contains the most recent stable release of the Cumulus Dashboard.

If you want to test unreleased changes to the Dashboard, use the develop branch.

Each release/version of the Dashboard will have a tag in the Dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

To checkout and install a specific version of the Dashboard:

git fetch --tags
git checkout <version-number> # e.g. v1.2.0
nvm use
npm install

If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

Building the Dashboard

caution

These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [2].

Build your dashboard from the Cumulus Dashboard repository root directory, cumulus-dashboard:

  APIROOT=<your_api_root> npm run build

Dashboard Deployment

Deploy your dashboard to S3 bucket from the cumulus-dashboard directory:

Using AWS CLI:

  aws s3 sync dist s3://<prefix>-dashboard --acl public-read

From the S3 Console:

  • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

You should be able to visit the Dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you had previously configured for access.


Cumulus Instance Sizing

The default sizing in the Cumulus deployment for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

Elasticsearch

The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

AWS provides documentation on calculating and configuring for sizing.

In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

EC2 Instances and Autoscaling Groups

EC2 instances are used for long-running operations (e.g., generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

When configuring your ECS cluster consider:

  • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
  • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
  • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
  • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
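
As a rough sketch, these are set on the cumulus module like any other variable; the values below are purely illustrative, not sizing recommendations:

module "cumulus" {
  # ... other variables

  ecs_cluster_instance_type               = "t3.medium"
  ecs_cluster_instance_docker_volume_size = 100
  ecs_cluster_min_size                    = 1
  ecs_cluster_desired_size                = 1
  ecs_cluster_max_size                    = 2

  # autoscaling thresholds/adjustments would be set here as well
}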

Footnotes


  1. Run terraform init if:

    • This is the first time deploying the module
    • You have added any additional child modules, including Cumulus components
    • You have updated the source for any of the child modules

  2. To add more redirect URIs to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", use the edit icon for your application, then Manage -> Redirect URIs.

  3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at".

- + \ No newline at end of file diff --git a/docs/deployment/postgres_database_deployment/index.html b/docs/deployment/postgres_database_deployment/index.html index 3b91ccc2a3a..e318446eb1a 100644 --- a/docs/deployment/postgres_database_deployment/index.html +++ b/docs/deployment/postgres_database_deployment/index.html @@ -5,7 +5,7 @@ PostgreSQL Database Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ cumulus-rds-tf that will deploy an AWS RDS Aurora Serverless PostgreSQL 11 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on GitHub.

Use of this example involves:

  • Creating/configuring a Terraform module directory
  • Using Terraform to deploy resources to AWS

Requirements

Configuration/installation of this module requires the following:

  • Terraform
  • git
  • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
  • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

Needed Git Repositories

Assumptions

OS/Environment

The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

Terraform

This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation.

Aurora/RDS

This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar, consider perusing the AWS docs and the Aurora Serverless V1 docs.

Prepare Deployment Repository

tip

If you are already working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or you only need to configure this module for your repository, skip to Prepare AWS Configuration.

Clone the cumulus-template-deploy repo and name appropriately for your organization:

  git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

We will return to configuring this repo and using it for deployment below.

Optional: Create a New Repository

Create a new repository on GitHub so that you can add your workflows and other modules to source control:

  git remote set-url origin https://github.com/<org>/<repository-name>
git push origin master

You can then add/commit changes as needed.

Update Your Gitignore File

If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


Prepare AWS Configuration

To deploy this module, make sure that you have completed the following steps from the Cumulus deployment instructions, in similar fashion, for this module:


Configure and Deploy the Module

When configuring this module, please keep in mind that, unlike the Cumulus deployment, this module should be deployed once to create the database cluster and re-deployed only thereafter to make changes to that configuration, upgrade, etc.

tip

This module does not need to be re-deployed for each Core update.

These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

cd rds-cluster-tf/
cp terraform.tf.example terraform.tf
cp terraform.tfvars.example terraform.tfvars

In terraform.tf, configure the remote state settings by substituting the appropriate values for:

  • bucket
  • dynamodb_table
  • PREFIX (whatever prefix you've chosen for your deployment)

Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

Configuration Options

  • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
  • db_admin_username -- cluster database administration username. Defaults to postgres.
  • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
  • region -- defaults to us-east-1.
  • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
  • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
  • min_capacity -- the minimum ACUs the cluster will scale to
  • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

Provision User and User Database

If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

  • provision_user_database -- must be set to true. This configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
  • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need for access. In most use cases, this should be the same one used for the Cumulus Core deployment.
  • rds_user_password -- the value to set the user password to.
  • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as name the provisioned user/database.
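
In terraform.tfvars this might look like the following sketch, where every value is a placeholder:

provision_user_database  = true
permissions_boundary_arn = "arn:aws:iam::<account id>:policy/<permissions boundary>"
rds_user_password        = "<user database password>"
prefix                   = "<prefix>"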

Once configured, the module will deploy the lambda and run it on each deployment, thus creating the configured database (if it does not exist), updating the user password (if that value has been changed), and updating the output user database secret.

Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

note

This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

Initialize Terraform

Run terraform init

You should see a similar output:

* provider.aws: version = "~> 2.32"

Terraform has been successfully initialized!

Deploy

Run terraform apply to deploy the resources.

caution

If re-applying this module, variables (e.g. engine_version, snapshot_identifier ) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user's database, lambdas, and security groups):

Output Example
terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

# module.rds_cluster.aws_db_subnet_group.default will be created
+ resource "aws_db_subnet_group" "default" {
+ arn = (known after apply)
+ description = "Managed by Terraform"
+ id = (known after apply)
+ name = (known after apply)
+ name_prefix = "xxxxxxxxx"
+ subnet_ids = [
+ "subnet-xxxxxxxxx",
+ "subnet-xxxxxxxxx",
]
+ tags = {
+ "Deployment" = "xxxxxxxxx"
}
}

# module.rds_cluster.aws_rds_cluster.cumulus will be created
+ resource "aws_rds_cluster" "cumulus" {
+ apply_immediately = true
+ arn = (known after apply)
+ availability_zones = (known after apply)
+ backup_retention_period = 1
+ cluster_identifier = "xxxxxxxxx"
+ cluster_identifier_prefix = (known after apply)
+ cluster_members = (known after apply)
+ cluster_resource_id = (known after apply)
+ copy_tags_to_snapshot = false
+ database_name = "xxxxxxxxx"
+ db_cluster_parameter_group_name = (known after apply)
+ db_subnet_group_name = (known after apply)
+ deletion_protection = true
+ enable_http_endpoint = true
+ endpoint = (known after apply)
+ engine = "aurora-postgresql"
+ engine_mode = "serverless"
+ engine_version = "10.12"
+ final_snapshot_identifier = "xxxxxxxxx"
+ hosted_zone_id = (known after apply)
+ id = (known after apply)
+ kms_key_id = (known after apply)
+ master_password = (sensitive value)
+ master_username = "xxxxxxxxx"
+ port = (known after apply)
+ preferred_backup_window = "07:00-09:00"
+ preferred_maintenance_window = (known after apply)
+ reader_endpoint = (known after apply)
+ skip_final_snapshot = false
+ storage_encrypted = (known after apply)
+ tags = {
+ "Deployment" = "xxxxxxxxx"
}
+ vpc_security_group_ids = (known after apply)

+ scaling_configuration {
+ auto_pause = true
+ max_capacity = 4
+ min_capacity = 2
+ seconds_until_auto_pause = 300
+ timeout_action = "RollbackCapacityChange"
}
}

# module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
+ resource "aws_secretsmanager_secret" "rds_login" {
+ arn = (known after apply)
+ id = (known after apply)
+ name = (known after apply)
+ name_prefix = "xxxxxxxxx"
+ policy = (known after apply)
+ recovery_window_in_days = 30
+ rotation_enabled = (known after apply)
+ rotation_lambda_arn = (known after apply)
+ tags = {
+ "Deployment" = "xxxxxxxxx"
}

+ rotation_rules {
+ automatically_after_days = (known after apply)
}
}

# module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
+ resource "aws_secretsmanager_secret_version" "rds_login" {
+ arn = (known after apply)
+ id = (known after apply)
+ secret_id = (known after apply)
+ secret_string = (sensitive value)
+ version_id = (known after apply)
+ version_stages = (known after apply)
}

# module.rds_cluster.aws_security_group.rds_cluster_access will be created
+ resource "aws_security_group" "rds_cluster_access" {
+ arn = (known after apply)
+ description = "Managed by Terraform"
+ egress = (known after apply)
+ id = (known after apply)
+ ingress = (known after apply)
+ name = (known after apply)
+ name_prefix = "cumulus_rds_cluster_access_ingress"
+ owner_id = (known after apply)
+ revoke_rules_on_delete = false
+ tags = {
+ "Deployment" = "xxxxxxxxx"
}
+ vpc_id = "vpc-xxxxxxxxx"
}

# module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
+ resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
+ from_port = 5432
+ id = (known after apply)
+ protocol = "tcp"
+ security_group_id = (known after apply)
+ self = true
+ source_security_group_id = (known after apply)
+ to_port = 5432
+ type = "ingress"
}

Plan: 6 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value: yes

module.rds_cluster.aws_db_subnet_group.default: Creating...
module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

Then, after the resources are created:

Apply complete! Resources: X added, 0 changed, 0 destroyed.
Releasing state lock. This may take a few moments...

Outputs:

admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
admin_db_login_secret_version = xxxxxxxxx
rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
security_group_id = xxxxxxxxx
user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secrets required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

The contents of each of these secrets are in the form:

{
  "database": "postgres",
  "dbClusterIdentifier": "clusterName",
  "engine": "postgres",
  "host": "xxx",
  "password": "defaultPassword",
  "port": 5432,
  "username": "xxx"
}
  • database -- the PostgreSQL database used by the configured user
  • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
  • engine -- the Aurora/RDS database engine
  • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
  • password -- the database password
  • username -- the account username
  • port -- The database connection port, should always be 5432

Connect to PostgreSQL DB via pgAdmin

If you would like to manage your PostgreSQL database with a GUI tool, you can do so via pgAdmin.

Requirements

SSH Setup in AWS Secrets Manager

You will need to navigate to AWS Secrets Manager and retrieve the secret values for your database. The secret name will contain the string _db_login and your prefix. Click the "Retrieve secret value" button to see the secret values.

The value for your secret name can also be retrieved from the data-persistence-tf directory with the command terraform output.

pgAdmin values to retrieve

Setup ~/.ssh/config

Replace HOST value and PORT value with the values retrieved from Secrets Manager.

The LocalForward number 9202 can be any unused LocalForward number in your SSH config:

Host ssm-proxy
  Hostname 127.0.0.1
  User ec2-user
  LocalForward 9202 [HOST value]:[PORT value]
  IdentityFile ~/.ssh/id_rsa
  Port 6868

Create a Local Port Forward

  • Create a local port forward to the SSM box port 22; this creates a tunnel from <local ssh port> to the SSH port on the SSM host.
caution

<local ssh port> should not be 8000.

  • Replace the following command values for <instance id> with your instance ID:
aws ssm start-session --target <instance id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6868
  • Then, in another terminal tab, enter:
ssh ssm-proxy

Create PgAdmin Server

  • Open pgAdmin and begin creating a new server (in newer versions this may be called registering a new server).

Creating a pgAdmin server

  • In the "Connection" tab, enter the values retrieved from Secrets Manager. Host name/address and Port should be the Hostname and LocalForward number from the ~/.ssh/config file.

pgAdmin server connection value entries

note

Maintenance database corresponds to "database".

You can select "Save Password?" to save your password. Click "Save" when you are finished. You should see your new server in pgAdmin.

Query Your Database

  • In the "Browser" area find your database, navigate to the name, and click on it.

  • Select the "Query Editor" to begin writing queries to your database.

Using the query editor in pgAdmin

You are all set to manage your queries in pgAdmin!


Next Steps

Your database cluster has been created/updated! From here you can continue to add additional user accounts, databases, and other database configurations.

- + \ No newline at end of file diff --git a/docs/deployment/share-s3-access-logs/index.html b/docs/deployment/share-s3-access-logs/index.html index d048f31ffd3..457ce338b89 100644 --- a/docs/deployment/share-s3-access-logs/index.html +++ b/docs/deployment/share-s3-access-logs/index.html @@ -5,13 +5,13 @@ Share S3 Access Logs | Cumulus Documentation - +
Version: v16.0.0

Share S3 Access Logs

It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

S3 Replicator

The S3 Replicator is a Node.js package that contains a simple Lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

First, ensure that you have enabled S3 Server Access Logging.

Next, configure your terraform.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.
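
For reference, server access logging can also be enabled from the AWS CLI. The following is only an illustrative sketch with hypothetical bucket names; your actual target bucket and prefix (which become the source_bucket and source_prefix above) will depend on your deployment:

aws s3api put-bucket-logging \
  --bucket my-cumulus-protected-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "my-access-logs-bucket",
      "TargetPrefix": "s3-server-access-logs/"
    }
  }'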

In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, as shown in the example below:

module "s3-replicator" {
source = "<path to s3-replicator.zip>"
prefix = var.prefix
vpc_id = var.vpc_id
subnet_ids = var.subnet_ids
permissions_boundary = var.permissions_boundary_arn
source_bucket = var.s3_replicator_config.source_bucket
source_prefix = var.s3_replicator_config.source_prefix
target_bucket = var.s3_replicator_config.target_bucket
target_prefix = var.s3_replicator_config.target_prefix
}

The Terraform source package can be found on the Cumulus GitHub Release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

ESDIS Metrics

In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

Configure the S3 Replicator as described above using the target_bucket and target_prefix provided by the Metrics team.

The Metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

info

For a more in-depth overview of ESDIS Metrics, see the Cumulus Distribution Metrics section.

- + \ No newline at end of file diff --git a/docs/deployment/terraform-best-practices/index.html b/docs/deployment/terraform-best-practices/index.html index eb3868d7b33..4159eb18426 100644 --- a/docs/deployment/terraform-best-practices/index.html +++ b/docs/deployment/terraform-best-practices/index.html @@ -5,7 +5,7 @@ Terraform Best Practices | Cumulus Documentation - + @@ -84,7 +84,7 @@ are any dangling resources left behind for any reason, by running the following AWS CLI command, replacing PREFIX with your deployment prefix name:

aws resourcegroupstaggingapi get-resources \
--query "ResourceTagMappingList[].ResourceARN" \
--tag-filters Key=Deployment,Values=PREFIX

Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

- + \ No newline at end of file diff --git a/docs/deployment/thin_egress_app/index.html b/docs/deployment/thin_egress_app/index.html index 60ecc325f87..9f1d92f540e 100644 --- a/docs/deployment/thin_egress_app/index.html +++ b/docs/deployment/thin_egress_app/index.html @@ -5,7 +5,7 @@ Using the Thin Egress App (TEA) for Cumulus Distribution | Cumulus Documentation - + @@ -13,7 +13,7 @@
Version: v16.0.0

Using the Thin Egress App (TEA) for Cumulus Distribution

The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

Configuring a TEA Deployment

TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

The cumulus-template-deploy repository's cumulus-tf/main.tf contains a thin_egress_app module configured for distribution.

The TEA module provides these instructions showing how to add it to your deployment. The following instructions describe how to configure the thin_egress_app module in your Cumulus deployment.

Create a Secret for Signing Thin Egress App JWTs

The Thin Egress App uses JSON Web Tokens (JWTs) internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.
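
As a rough sketch of the mechanics only (the required structure and field names of the secret are defined in the TEA documentation and are intentionally not reproduced here), generating a key pair and creating the secret might look like the following; the secret name and JSON file are placeholders:

# generate an RSA key pair used to sign JWTs
ssh-keygen -t rsa -b 4096 -m PEM -f ./jwt_signing_key -N ''
# store the keys in Secrets Manager; jwt_secret.json must follow the structure
# described in the TEA documentation
aws secretsmanager create-secret \
  --name my-prefix-thin-egress-jwt-secret \
  --description "JWT signing keys for the Thin Egress App" \
  --secret-string file://jwt_secret.json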

Bucket_map.yaml

The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

The configuration file is a simple JSON mapping of the form:

{
"daac-public-data-bucket": "/path/to/this/kind/of/data"
}
info

Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.
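
To inspect the generated default bucket map, you can copy it out of S3 to stdout (a sketch assuming your system bucket name; the path matches the one noted above):

aws s3 cp s3://<system_bucket>/distribution_bucket_map.json -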

Optionally Configure a Custom Bucket Map

A simple configuration would look something like this:

bucket_map.yaml
MAP:
  my-protected: my-protected
  my-public: my-public

PUBLIC_BUCKETS:
  - my-public
caution

Your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

Optionally Configure Shared Variables

The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

One way to keep these variable values in sync across the modules is to use Terraform local values to define a single value that is referenced by the variables in both modules. This approach is shown in the Cumulus Core example deployment code.

- + \ No newline at end of file diff --git a/docs/deployment/upgrade-readme/index.html b/docs/deployment/upgrade-readme/index.html index 72efcae8543..6a628b69cf0 100644 --- a/docs/deployment/upgrade-readme/index.html +++ b/docs/deployment/upgrade-readme/index.html @@ -5,7 +5,7 @@ Upgrading Cumulus | Cumulus Documentation - + @@ -15,7 +15,7 @@ deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

Update Cumulus Dashboard

If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

- + \ No newline at end of file diff --git a/docs/development/forked-pr/index.html b/docs/development/forked-pr/index.html index 928a8facece..814a4c94729 100644 --- a/docs/development/forked-pr/index.html +++ b/docs/development/forked-pr/index.html @@ -5,13 +5,13 @@ Issuing PR From Forked Repos | Cumulus Documentation - +
Version: v16.0.0

Issuing PR From Forked Repos

Fork the Repo

  • Fork the Cumulus repo
  • Create a new branch from the branch you'd like to contribute to
  • If an issue doesn't already exist, submit one (see above)

Create a Pull Request

Reviewing PRs from Forked Repos

Upon submission of a pull request, the Cumulus development team will review the code.

Once the code passes an initial review, the team will run the CI tests against the proposed update.

The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

PRs from forked repos cannot be merged directly to master. Cumulus reviewers must follow these steps before completing the review process:

  1. Create a new branch:

      git checkout -b from-<name-of-the-branch> master
  2. Push the new branch to GitHub (see the example command after this list)

  3. Change the destination of the forked PR to the new branch that was just pushed

    Screenshot of Github interface showing how to change the base branch of a pull request

  4. After code review and approval, merge the forked PR to the new branch.

  5. Create a PR for the new branch to master.

  6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve the failures as appropriate.
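
For step 2 above, pushing the newly created branch might look like the following (a sketch using the branch name created in step 1):

git push -u origin from-<name-of-the-branch>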

- + \ No newline at end of file diff --git a/docs/development/integration-tests/index.html b/docs/development/integration-tests/index.html index 5fdfadd4d0c..e311ebaefe8 100644 --- a/docs/development/integration-tests/index.html +++ b/docs/development/integration-tests/index.html @@ -5,7 +5,7 @@ Integration Tests | Cumulus Documentation - + @@ -19,7 +19,7 @@ in the commit message.

If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

- + \ No newline at end of file diff --git a/docs/development/quality-and-coverage/index.html b/docs/development/quality-and-coverage/index.html index 384d6f0dd60..ae6a4cfdb6e 100644 --- a/docs/development/quality-and-coverage/index.html +++ b/docs/development/quality-and-coverage/index.html @@ -5,7 +5,7 @@ Code Coverage and Quality | Cumulus Documentation - + @@ -23,7 +23,7 @@ here.

To run linting on the markdown files, run npm run lint-md.

Audit

This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

To execute an audit, run npm run audit.

- + \ No newline at end of file diff --git a/docs/development/release/index.html b/docs/development/release/index.html index 4939cd59db8..b6ef5514d3d 100644 --- a/docs/development/release/index.html +++ b/docs/development/release/index.html @@ -5,7 +5,7 @@ Versioning and Releases | Cumulus Documentation - + @@ -24,7 +24,7 @@ this is a backport and patch release on the 13.3.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases..

Troubleshooting

Delete and regenerate the tag

To delete a published tag to re-tag, follow these steps:

  git tag -d vMAJOR.MINOR.PATCH
git push -d origin vMAJOR.MINOR.PATCH

e.g.:
git tag -d v9.1.0
git push -d origin v9.1.0
- + \ No newline at end of file diff --git a/docs/docs-how-to/index.html b/docs/docs-how-to/index.html index 2bed9b6e858..2245298e55d 100644 --- a/docs/docs-how-to/index.html +++ b/docs/docs-how-to/index.html @@ -5,7 +5,7 @@ Cumulus Documentation: How To's | Cumulus Documentation - + @@ -13,7 +13,7 @@
Version: v16.0.0

Cumulus Documentation: How To's

Cumulus Docs Installation

Run a Local Server

Environment variables DOCSEARCH_APP_ID, DOCSEARCH_API_KEY, and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional in production because that is the only website we have registered to be indexed with DocSearch (see the section on search below).
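
For example, before running the docs server you might export these variables (the values shown are placeholders; the real values are provided by DocSearch):

export DOCSEARCH_APP_ID=<app id from DocSearch>
export DOCSEARCH_API_KEY=<api key from DocSearch>
export DOCSEARCH_INDEX_NAME=<index name from DocSearch>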

git clone git@github.com:nasa/cumulus
cd cumulus
npm run docs-install
npm run docs-serve
note

docs-build will build the documents into website/build. docs-clear will clear the documents.

caution

Fix any broken links reported by Docusaurus if you see the following messages during build.

[INFO] Docusaurus found broken links!

Exhaustive list of all broken links found:

Cumulus Documentation

Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

Add a New Page and Sidebars

Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

---
id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
hide_title: false
---

Note: To have the new page show up in a sidebar, the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in-depth explanation of sidebars here.

Versioning Docs

We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. Docusaurus v2 uses a snapshot approach for documentation versioning: each versioned set of docs is independent of the others. Ideally, the documentation versions would match up directly with release versions. However, because a new set of versioned docs can take up a lot of repo space and requires maintenance, we suggest updating the existing versioned docs for minor releases when there are no significant functionality changes. Cumulus versioning is explained in the Versioning Docs.

Search on our documentation site is handled by DocSearch. We have been provided with an apiId, an apiKey, and an indexName by DocSearch that we include in our website/docusaurus.config.js file. The rest, indexing and the actual searching, we leave to DocSearch. Our builds expect environment variables for these values to exist: DOCSEARCH_APP_ID, DOCSEARCH_API_KEY, and DOCSEARCH_INDEX_NAME.

Add a new task

The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

Read more about adding a new task.

Editing the tasks.md header or template

Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

Editing diagrams

For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

  • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

Deployment

The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

- + \ No newline at end of file diff --git a/docs/external-contributions/index.html b/docs/external-contributions/index.html index 8e9c73e0aa2..7c034c8bab5 100644 --- a/docs/external-contributions/index.html +++ b/docs/external-contributions/index.html @@ -5,13 +5,13 @@ External Contributions | Cumulus Documentation - +
Version: v16.0.0

External Contributions

Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

Distribution

The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

Operational Cloud Recovery Archive (ORCA)

ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

Workflow Tasks

CNM

PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

DMR++ Generation

GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

- + \ No newline at end of file diff --git a/docs/faqs/index.html b/docs/faqs/index.html index 33485628597..6cdcb2861c9 100644 --- a/docs/faqs/index.html +++ b/docs/faqs/index.html @@ -5,13 +5,13 @@ Frequently Asked Questions | Cumulus Documentation - +
Version: v16.0.0

Frequently Asked Questions

Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

General | Workflows | Integrators & Developers | Operators


General

What prerequisites are needed to setup Cumulus?
Answer: Here is a list of the tools and access that you will need in order to get started. To see the up-to-date versions that we are using, please visit our Cumulus main README (https://github.com/nasa/cumulus) for details.
  • NVM for node versioning
  • AWS CLI
  • Bash
  • Docker (only required for testing)
  • docker-compose (only required for testing; install with pip install docker-compose)
  • Python
  • pipenv

Keep in mind you will need access to the AWS console and an Earthdata account before you can deploy Cumulus.

What is the preferred web browser for the Cumulus environment?

Answer: Our preferred web browser is the latest version of Google Chrome.

How do I deploy a new instance in Cumulus?

Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

Where can I find Cumulus release notes?

Answer: To get the latest information about updates to Cumulus go to Cumulus Versions.

How do I quickly troubleshoot an issue in Cumulus?

Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

Where can I get support help?

Answer: The following options are available for assistance:

  • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a Cumulus JIRA ticket.
  • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

For more information on how to submit an issue or contribute to Cumulus, follow our guidelines at Contributing.


Workflows

What is a Cumulus workflow?

Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting the Workflows section.

How do I set up a Cumulus workflow?

Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

Where can I find a list of workflow tasks?

Answer: You can access a list of reusable tasks for Cumulus development at Cumulus Tasks.

Are there any third-party workflows or applications that I can use with Cumulus?

Answer: The Cumulus team works with various partners to help build a robust framework. You can visit our External Contributions section to see what other options are available to help you customize Cumulus for your needs.


Integrators & Developers

What is a Cumulus integrator?

Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

  • Configure and deploy Cumulus to the AWS environment
  • Configure Cumulus workflows
  • Write custom workflow tasks
What are the steps if I run into an issue during deployment?

Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

Is Cumulus customizable and flexible?

Answer: Yes. Cumulus has a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules.

What are Terraform modules?

Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus they want/need. To view Cumulus-maintained modules or steps on how to create a module, go to Terraform modules.

Where do I find Terraform module variables?

Answer: Go here for a list of Cumulus maintained variables.

What are the common use cases that a Cumulus integrator encounters?

Answer: The following are some examples of possible use cases you may see:


Operators

What is a Cumulus operator?

Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

  • Ingesting datasets
  • Maintaining historical data ingest
  • Starting and stopping data handlers
  • Managing collections
  • Managing provider definitions
  • Creating, enabling, and disabling rules
  • Investigating errors for granules and deleting or re-ingesting granules
  • Investigating errors in executions and isolating failed workflow step(s)
What are the common use cases that a Cumulus operator encounters?

Answer: The following are some examples of possible use cases you may see:

Explore more Cumulus operator best practices and how-tos in the dedicated Operator Docs.

Can you re-run a workflow execution in AWS?

Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

- + \ No newline at end of file diff --git a/docs/features/ancillary_metadata/index.html b/docs/features/ancillary_metadata/index.html index 024339d1b01..82de3b62017 100644 --- a/docs/features/ancillary_metadata/index.html +++ b/docs/features/ancillary_metadata/index.html @@ -5,7 +5,7 @@ Ancillary Metadata Export | Cumulus Documentation - + @@ -13,7 +13,7 @@
Version: v16.0.0

Ancillary Metadata Export

This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

Tasks setting type

Discover Granules

Uses the Collection type key to set the value for files on discovered granules in its output.

Parse PDR

Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

CNMToCMALambdaFunction

Natively supports types that are included in incoming messages to a CNM Workflow.

Tasks using type

Move Granules

Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

- + \ No newline at end of file diff --git a/docs/features/backup_and_restore/index.html b/docs/features/backup_and_restore/index.html index 57be6327b8c..0abcc4c0dc6 100644 --- a/docs/features/backup_and_restore/index.html +++ b/docs/features/backup_and_restore/index.html @@ -5,7 +5,7 @@ Cumulus Backup and Restore | Cumulus Documentation - + @@ -52,7 +52,7 @@ writing to the old cluster.

  • Set the snapshot_identifier variable to the snapshot you wish to create, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments - you should not need to reconfigure either, as the secret ARN and the security group should not change; however, double-check that the configured values are as expected

  • - + \ No newline at end of file diff --git a/docs/features/dead_letter_archive/index.html b/docs/features/dead_letter_archive/index.html index 515e51115b1..98f51f7cb66 100644 --- a/docs/features/dead_letter_archive/index.html +++ b/docs/features/dead_letter_archive/index.html @@ -5,13 +5,13 @@ Cumulus Dead Letter Archive | Cumulus Documentation - +
    Version: v16.0.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords. Otherwise, all Cumulus messages that fail to be reprocessed will be moved to a new archive location under the path <stackName>/dead-letter-archive/failed-sqs/<YYYY-MM-DD>.
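
    A minimal sketch of such a request, assuming a deployed Cumulus API endpoint URL and a valid access token (both placeholders here):

    curl -X POST https://<cumulus-api-url>/deadLetterArchive/recoverCumulusMessages \
      -H "Authorization: Bearer <access token>"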

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.

    - + \ No newline at end of file diff --git a/docs/features/dead_letter_queues/index.html b/docs/features/dead_letter_queues/index.html index 5f9ed3a6a57..f9e87236800 100644 --- a/docs/features/dead_letter_queues/index.html +++ b/docs/features/dead_letter_queues/index.html @@ -5,13 +5,13 @@ Dead Letter Queues | Cumulus Documentation - +
    Version: v16.0.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to a SQS queue Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
    message_retention_seconds = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally, an automated process should be configured to poll and process messages off a dead letter queue.

    To aid in manual troubleshooting, you can utilize the SQS Management Console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:
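
    For ad hoc inspection from the command line, something like the following can be used (a sketch; the queue URL is a placeholder and the exact queue name will depend on your stack):

    aws sqs receive-message \
      --queue-url https://sqs.<region>.amazonaws.com/<account-id>/<prefix>-<queue-name> \
      --attribute-names All \
      --message-attribute-names All \
      --max-number-of-messages 10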

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    - + \ No newline at end of file diff --git a/docs/features/distribution-metrics/index.html b/docs/features/distribution-metrics/index.html index 75ceb7f08ee..5c8dac0b813 100644 --- a/docs/features/distribution-metrics/index.html +++ b/docs/features/distribution-metrics/index.html @@ -5,13 +5,13 @@ Cumulus Distribution Metrics | Cumulus Documentation - +
    Version: v16.0.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    - + \ No newline at end of file diff --git a/docs/features/execution_payload_retention/index.html b/docs/features/execution_payload_retention/index.html index d71a4a8db9d..62b8715c5e7 100644 --- a/docs/features/execution_payload_retention/index.html +++ b/docs/features/execution_payload_retention/index.html @@ -5,13 +5,13 @@ Execution Payload Retention | Cumulus Documentation - +
    Version: v16.0.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) that triggers a daily run of the provided cleanExecutions lambda has been added. This lambda will remove the payload data from all 'completed' and 'non-completed' execution records in the database that are older than the configured thresholds.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold, in days, for executions with a 'completed' status. Records with updatedAt values older than this that have payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold, in days, for executions with a status other than 'completed'. Records with updatedAt values older than this that have payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    - + \ No newline at end of file diff --git a/docs/features/logging-esdis-metrics/index.html b/docs/features/logging-esdis-metrics/index.html index 6ed5d13e662..fca68af90d6 100644 --- a/docs/features/logging-esdis-metrics/index.html +++ b/docs/features/logging-esdis-metrics/index.html @@ -5,13 +5,13 @@ Writing logs for ESDIS Metrics | Cumulus Documentation - +
    Version: v16.0.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

    - + \ No newline at end of file diff --git a/docs/features/replay-archived-sqs-messages/index.html b/docs/features/replay-archived-sqs-messages/index.html index 6527231ede5..1d660d77211 100644 --- a/docs/features/replay-archived-sqs-messages/index.html +++ b/docs/features/replay-archived-sqs-messages/index.html @@ -5,14 +5,14 @@ How to replay SQS messages archived in S3 | Cumulus Documentation - +
    Version: v16.0.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}
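
    To see what is currently archived for a given queue, you can list that prefix in your system bucket (a sketch; the bucket, stack, and queue names are placeholders):

    aws s3 ls s3://<system-bucket>/<stackName>/archived-incoming-messages/<queueName>/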

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint allows you to start a replay operation that requeues all archived SQS messages for a given queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    Field | Type | Description
    queueName | string | Any valid SQS queue name (not ARN)
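
    A minimal sketch of such a request, assuming a deployed Cumulus API endpoint URL and a valid access token (both placeholders):

    curl -X POST https://<cumulus-api-url>/replays/sqs \
      -H "Authorization: Bearer <access token>" \
      -H "Content-Type: application/json" \
      -d '{"queueName": "<queue name>"}'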

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.
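
    For example (again with placeholder URL and token), the status could be checked with a request like:

    curl https://<cumulus-api-url>/asyncOperations/<asyncOperationId> \
      -H "Authorization: Bearer <access token>"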

    - + \ No newline at end of file diff --git a/docs/features/replay-kinesis-messages/index.html b/docs/features/replay-kinesis-messages/index.html index 6cdf2fcf139..04caae96452 100644 --- a/docs/features/replay-kinesis-messages/index.html +++ b/docs/features/replay-kinesis-messages/index.html @@ -5,7 +5,7 @@ How to replay Kinesis messages after an outage | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v16.0.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint allows you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp although it is used differently and less vulnerable to discrepancies since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    Field | Type | Required | Description
    type | string | required | Currently only accepts kinesis.
    kinesisStream | string | for type kinesis | Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp | * | optional | Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
    endTimestamp | * | optional | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp | * | optional | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
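
    A minimal sketch of such a request, assuming a deployed Cumulus API endpoint URL and a valid access token (both placeholders), replaying a bounded time slice:

    curl -X POST https://<cumulus-api-url>/replays \
      -H "Authorization: Bearer <access token>" \
      -H "Content-Type: application/json" \
      -d '{
        "type": "kinesis",
        "kinesisStream": "<stream name>",
        "startTimestamp": "2023-01-01T00:00:00.000Z",
        "endTimestamp": "2023-01-02T00:00:00.000Z"
      }'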

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/features/reports/index.html b/docs/features/reports/index.html index 75038dbabfd..b225a1418de 100644 --- a/docs/features/reports/index.html +++ b/docs/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -19,7 +19,7 @@ report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    - + \ No newline at end of file diff --git a/docs/getting-started/index.html b/docs/getting-started/index.html index 3380f4e4610..f9bc738259c 100644 --- a/docs/getting-started/index.html +++ b/docs/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: v16.0.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects, not limited to NASA, use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Set up Git Secrets

    To ensure your AWS access keys and passwords are protected as you submit commits we recommend setting up Git Secrets.

    2. Deploy Cumulus Core and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    3. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    4. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment.

    For an introduction about Terraform go here.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/glossary/index.html b/docs/glossary/index.html index cc971e80f8c..56a178bec95 100644 --- a/docs/glossary/index.html +++ b/docs/glossary/index.html @@ -5,13 +5,13 @@ Glossary | Cumulus Documentation - +
    Version: v16.0.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: Amazon Web Services documentation.

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS Cloud Formation.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collections on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see Collections - Data Management Types.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page.

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Node Package Manager (npm)

    Node package manager. Often referred to as npm.

    For more information, see npm.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (npm)

    npm-hosted Node.js packages. Cumulus packages can be found on npm's site here.

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data, and more.

    For more information, see AWS's S3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/index.html b/docs/index.html index a77d921b054..1baec2564a9 100644 --- a/docs/index.html +++ b/docs/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v16.0.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/integrator-guide/about-int-guide/index.html b/docs/integrator-guide/about-int-guide/index.html index 24e33f9da9b..9fadab00346 100644 --- a/docs/integrator-guide/about-int-guide/index.html +++ b/docs/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v16.0.0

    About Integrator Guide

    Purpose

    The Integrator Guide supplements the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/integrator-guide/int-common-use-cases/index.html b/docs/integrator-guide/int-common-use-cases/index.html index 52625d4590f..0e5b6468721 100644 --- a/docs/integrator-guide/int-common-use-cases/index.html +++ b/docs/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - +
    - + \ No newline at end of file diff --git a/docs/integrator-guide/workflow-add-new-lambda/index.html b/docs/integrator-guide/workflow-add-new-lambda/index.html index beb527df440..56402b9e04b 100644 --- a/docs/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v16.0.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps are to help you along as you write a new Lambda that integrates with a Cumulus workflow. This will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.
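
    As mentioned in step 1, a minimal Terraform definition for the new Lambda might look like the sketch below. The function name, zip path, runtime, and variables (var.prefix, var.lambda_processing_role_arn, var.tags) are illustrative assumptions; adjust them to match your deployment.

    resource "aws_lambda_function" "my_new_task" {
      # Hypothetical example task definition for a Cumulus workflow step
      function_name    = "${var.prefix}-MyNewTask"
      filename         = "my-new-task.zip"
      source_code_hash = filebase64sha256("my-new-task.zip")
      handler          = "index.handler"
      role             = var.lambda_processing_role_arn
      runtime          = "nodejs16.x"
      timeout          = 300
      memory_size      = 256
      tags             = var.tags
    }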

    Version: v16.0.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to the Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in CloudWatch (see the example after these steps)
    • Go to CloudWatch
    • Locate the Lambda
    • Search the CloudWatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution
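
    As referenced in step 2, a hedged example of searching a Lambda's CloudWatch logs from the CLI (the log group name, filter term, and start time are placeholders) could look like:

    aws logs filter-log-events \
      --log-group-name /aws/lambda/example-prefix-SyncGranule \
      --filter-pattern "error" \
      --start-time 1689811200000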

    Version: v16.0.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.
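
    As a rough illustration of how that validation works, the ajv package compiles a JSON schema and checks each incoming record against it. The schema below is a simplified stand-in, not the actual consumer schema:

    const Ajv = require('ajv');

    // Illustrative schema only; the real consumer schema ships with Cumulus
    const schema = {
      type: 'object',
      required: ['collection'],
      properties: {
        collection: { type: 'string' },
      },
    };

    const ajv = new Ajv();
    const validate = ajv.compile(schema);

    const record = { collection: 'MOD09GQ' };
    if (!validate(record)) {
      // validate.errors explains why the record failed validation
      console.log(validate.errors);
    }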

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    Version: Next

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    caution

    Do not edit the docs/tasks.md file directly.

    Version: Next

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

    While the diagram can feel complex, it can be broken down into several major components:

    Data Distribution

    End users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application. This provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

    The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

    Cumulus entity state data is stored in a PostgreSQL-compatible database and is replicated to an Elasticsearch instance, which provides non-authoritative query/state data for the API and other applications that require more complex queries.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

    Other modules are provided as artifacts on the release page for users configuring their own deployments; they contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

    For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    Version: Next

    Configuration

    Version: Next

    Cookbooks

    Version: Next

    Features

    📄️ How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    Version: Next

    Workflow Tasks

    Version: Next

    Cloudwatch Retention

    You can configure the retention period (in days) of the CloudWatch log groups for the Lambdas and tasks that the cumulus, cumulus_distribution, and cumulus_ecs_service modules support (using the cumulus module as an example):

    module "cumulus" {
    # ... other variables
    default_log_retention_days = var.default_log_retention_days
    cloudwatch_log_retention_periods = var.cloudwatch_log_retention_periods
    }

    After you set the variables below in terraform.tfvars and deploy, the CloudWatch log groups will be created or updated with the new retention value.

    default_log_retention_days

    The variable default_log_retention_days can be configured to set the default log retention for all CloudWatch log groups managed by Cumulus when a custom value isn't used. The log groups will use this value for their retention; if this value is not set either, the retention will default to 30 days. For example, if you would like the log groups of the Cumulus module to have a retention period of one year, deploy the respective modules with the variable set as in the example below.

    Example

    default_log_retention_days = 365

    cloudwatch_log_retention_periods

    The retention period (in days) of CloudWatch log groups for specific Lambdas and tasks can be set during deployment using the cloudwatch_log_retention_periods Terraform map variable. To configure these values for the respective CloudWatch log groups, uncomment the cloudwatch_log_retention_periods variable and add entries for the groups whose retention you want to change. The following keys are supported, each correlating to its Lambda/task name (e.g. "/aws/lambda/prefix-DiscoverPdrs" would use the retention key "DiscoverPdrs"):

    • ApiEndpoints
    • AsyncOperationEcsLogs
    • DiscoverPdrs
    • DistributionApiEndpoints
    • EcsLogs
    • granuleFilesCacheUpdater
    • HyraxMetadataUpdates
    • ParsePdr
    • PostToCmr
    • PrivateApiLambda
    • publishExecutions
    • publishGranules
    • publishPdrs
    • QueuePdrs
    • QueueWorkflow
    • replaySqsMessages
    • SyncGranule
    • UpdateCmrAccessConstraints
    note

    EcsLogs is used for all cumulus_ecs_service tasks' CloudWatch log groups

    Example

    cloudwatch_log_retention_periods = {
    ParsePdr = 365
    }

    The retention period is the number of days you'd like to retain the logs in the specified log group. A list of possible values is available in the AWS logs documentation.

    Version: Next

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

    In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
      "Datapoints": [
        {
          "Timestamp": "2019-07-31T00:00:00Z",
          "Average": 150996467959.0,
          "Unit": "Bytes"
        }
      ],
      "Label": "BucketSizeBytes"
    }

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

    You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
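
    As a sketch of that relationship (bucket names here are placeholders), the buckets map in terraform.tfvars might look like the following, where a collection file entry with "bucket": "protected" resolves to the my-prefix-protected bucket:

    buckets = {
      internal = {
        name = "my-prefix-internal"
        type = "internal"
      }
      protected = {
        name = "my-prefix-protected"
        type = "protected"
      }
      public = {
        name = "my-prefix-public"
        type = "public"
      }
    }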

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

    {
      "name": "MOD09GQ",
      "version": "006",
      "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "files": [
        {
          "bucket": "MOD09GQ-006-protected",
          "regex": "^.*\\.hdf$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
        },
        {
          "bucket": "MOD09GQ-006-private",
          "regex": "^.*\\.hdf\\.met$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
        },
        {
          "bucket": "MOD09GQ-006-protected",
          "regex": "^.*\\.cmr\\.xml$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
        },
        {
          "bucket": "MOD09GQ-006-public",
          "regex": "^.*\\.jpg$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
        }
      ]
    }

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

    {
      "name": "MOD09GQ",
      "version": "006",
      "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^.*\\.hdf$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
        },
        {
          "bucket": "private",
          "regex": "^.*\\.hdf\\.met$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
        },
        {
          "bucket": "protected",
          "regex": "^.*\\.cmr\\.xml$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
        },
        {
          "bucket": "public",
          "regex": "^.*\\.jpg$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
        }
      ]
    }

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

    The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
      "name": "MOD09GQ",
      "version": "006",
      "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^.*\\.hdf$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
        },
        {
          "bucket": "private",
          "regex": "^.*\\.hdf\\.met$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
        },
        {
          "bucket": "protected-2",
          "regex": "^.*\\.cmr\\.xml$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
        },
        {
          "bucket": "public",
          "regex": "^.*\\.jpg$",
          "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
          "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
        }
      ]
    }
    Version: Next

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

    Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    note

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    note
    • While connection configuration is defined here, things that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
    • There is some provider behavior that is controlled by task-specific configuration rather than the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type

    S3

    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type
    host | string | Yes | S3 bucket to pull data from

    http

    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be http for this provider type
    host | string | Yes | The host to pull data from (e.g. nasa.gov)
    username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    port | integer | No | Port to connect to the provider on. Defaults to 80
    allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if the redirect port is different than the provider port
    certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate

    https

    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be https for this provider type
    host | string | Yes | The host to pull data from (e.g. nasa.gov)
    username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    port | integer | No | Port to connect to the provider on. Defaults to 443
    allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if the redirect port is different than the provider port
    certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate

    ftp

    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
    host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
    username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
    password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
    port | integer | No | Port to connect to the provider on. Defaults to 21

    sftp

    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
    host | string | Yes | The sftp host to pull data from (e.g. nasa.gov)
    username | string | No | Username to use to connect to the sftp server
    password | string | No | Password to use to connect to the sftp server
    port | integer | No | Port to connect to the provider on. Defaults to 22
    privateKey | string | No | Filename assumed to be in s3://bucketInternal/stackName/crypto
    cmKeyId | string | No | AWS KMS Customer Master Key ARN or alias
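
    As a quick illustration of the S3 case above (the id and host are placeholder values), a provider record created via the dashboard or API could look like:

    {
      "id": "example_s3_provider",
      "protocol": "s3",
      "host": "example-staging-bucket",
      "globalConnectionLimit": 10
    }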

    Collections

    Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)

    Key | Value | Required | Description
    name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
    version | "006" | Yes | A version tag for the collection
    granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
    granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID
    sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
    files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
    dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
    duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
    ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
    process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
    meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter
    url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

    files-object

    Key | Value | Required | Description
    regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
    sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
    type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
    bucket | "internal" | Yes | Name of the bucket where the file will be stored
    url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
    checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file
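
    To tie these keys together, a hypothetical files array for a data file and its checksum file (the regexes, buckets, and .md5 extension are illustrative) could look like:

    "files": [
      {
        "bucket": "protected",
        "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$",
        "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
        "type": "data"
      },
      {
        "bucket": "private",
        "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.md5$",
        "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.md5",
        "checksumFor": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"
      }
    ]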

    Rules

    Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
    Key | Value | Required | Description
    name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
    workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
    provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
    collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
    payload | <JSON Object or Array> | No | The payload to be passed to the workflow
    meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter
    rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
    state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED"
    queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
    tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

    collection-object

    Key | Value | Required | Description
    name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
    version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

    meta-object

    Key | Value | Required | Description
    retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3
    visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created

    rule-object

    Key | Value | Required | Description
    type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
    value | <String> Object | Depends | Discussion of valid values is below

    rule-value

    The rule value entry depends on the type of rule:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
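
    For example, a minimal scheduled rule (the name, workflow, and schedule are illustrative) pairs a rate or cron expression with the workflow and collection it should run:

    {
      "name": "nightly_discover_rule",
      "workflow": "DiscoverGranules",
      "provider": "example_s3_provider",
      "collection": {
        "name": "MOD09GQ",
        "version": "006"
      },
      "rule": {
        "type": "scheduled",
        "value": "rate(1 day)"
      },
      "state": "ENABLED"
    }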

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
    • Upon failed execution(s), the workflow is run 3 times (or the configured number of times).
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
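
    A hypothetical sqs rule illustrating the meta-object settings above (the queue URL, workflow, and names are placeholders, and the queue is assumed to have a dead-letter queue configured) might look like:

    {
      "name": "example_sqs_rule",
      "workflow": "IngestGranule",
      "collection": {
        "name": "MOD09GQ",
        "version": "006"
      },
      "meta": {
        "retries": 1,
        "visibilityTimeout": 1800
      },
      "rule": {
        "type": "sqs",
        "value": "https://sqs.us-east-1.amazonaws.com/1234567890/example-ingest-queue"
      },
      "state": "ENABLED"
    }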

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    state field conditional

    If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    Version: Next

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
    • You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    caution

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

    From any directory you choose, open an editor and add the following to a file named exampleRule.json

    {
      "Rules": [
        {
          "Status": "Enabled",
          "Filter": {
            "Prefix": ""
          },
          "Transitions": [
            {
              "Days": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "NoncurrentVersionTransitions": [
            {
              "NoncurrentDays": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "ID": "90DayS3IAExample"
        }
      ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
      "Rules": [
        {
          "Status": "Enabled",
          "Filter": {
            "Prefix": ""
          },
          "Transitions": [
            {
              "Days": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "NoncurrentVersionTransitions": [
            {
              "NoncurrentDays": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "ID": "90DayS3IAExample"
        }
      ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

    Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the "Management" tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    Version: Next

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

    Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
      const payload = Buffer.from(event.awslogs.data, 'base64');
      const decompressedData = await gunzip(payload);
      const logData = JSON.parse(decompressedData.toString('ascii'));
      return await Promise.all(logData.logEvents.map(async (logEvent) => {
        const logMessage = JSON.parse(logEvent.message);
        if (['error', 'fatal'].includes(logMessage.level)) {
          return sns.publish({
            TopicArn: process.env.EmailReportingTopicArn,
            Message: logEvent.message
          }).promise();
        }
        return Promise.resolve();
      }));
    };

    After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function.
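
    A hedged sketch of that deployment command (the function name, role ARN, zip file, and topic ARN are placeholders) could look like:

    aws lambda create-function \
      --function-name cloudwatch-log-email-forwarder \
      --runtime nodejs16.x \
      --handler index.handler \
      --zip-file fileb://function.zip \
      --role arn:aws:iam::123456789012:role/example-lambda-execution-role \
      --environment "Variables={EmailReportingTopicArn=arn:aws:sns:us-east-1:123456789012:example-email-topic}"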

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
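
    For instance, a Terraform sketch of one such subscription filter (the resource names, log group, and Lambda reference are assumptions for illustration) might look like:

    resource "aws_cloudwatch_log_subscription_filter" "email_error_filter" {
      # Forward every event from this log group to the forwarding Lambda
      name            = "email-error-filter"
      log_group_name  = "/aws/lambda/example-prefix-DiscoverGranules"
      filter_pattern  = ""
      destination_arn = aws_lambda_function.log_email_forwarder.arn
    }

    resource "aws_lambda_permission" "allow_cloudwatch_logs" {
      # Allow CloudWatch Logs to invoke the forwarding Lambda
      statement_id  = "AllowExecutionFromCloudWatchLogs"
      action        = "lambda:InvokeFunction"
      function_name = aws_lambda_function.log_email_forwarder.function_name
      principal     = "logs.us-east-1.amazonaws.com"
    }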

    Version: Next

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
        "LoggingEnabled": {
          "TargetBucket": "<stack-internal-bucket>",
          "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
        }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
    Version: Next

    Configuration of Tasks

    The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

    <lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

    More information about the CMR limit and CMR page_size can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_timeout
    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • fake_processing_task_timeout
    • files_to_granules_task_timeout
    • hello_world_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sf_sqs_report_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }

    lambda_memory_sizes

    A configurable map of memory sizes (in MBs) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_memory_size: <memory_size>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_memory_size
    • discover_granules_task_memory_size
    • discover_pdrs_task_memory_size
    • fake_processing_task_memory_size
    • hyrax_metadata_updates_task_memory_size
    • lzards_backup_task_memory_size
    • move_granules_task_memory_size
    • parse_pdr_task_memory_size
    • pdr_status_check_task_memory_size
    • post_to_cmr_task_memory_size
    • queue_granules_task_memory_size
    • queue_pdrs_task_memory_size
    • queue_workflow_task_memory_size
    • sf_sqs_report_task_memory_size
    • sync_granule_task_memory_size
    • update_cmr_acess_constraints_task_memory_size
    • update_granules_cmr_metadata_file_links_task_memory_size

    Example

    lambda_memory_sizes = {
    queue_granules_task_memory_size = 1036
    }
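
As an illustrative sketch only, these maps are passed as inputs to the cumulus module in your deployment configuration. The block below omits the module source and all other required variables, which are assumed to already exist in your deployment:

module "cumulus" {
  # ... source and other required variables omitted ...

  cmr_search_client_config = {
    create_reconciliation_report_cmr_limit     = 2500
    create_reconciliation_report_cmr_page_size = 250
  }

  lambda_timeouts = {
    discover_granules_task_timeout = 300
  }

  lambda_memory_sizes = {
    queue_granules_task_memory_size = 1036
  }
}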
    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/about-cookbooks/index.html b/docs/next/data-cookbooks/about-cookbooks/index.html index 7c30ebfe01e..7abae75957a 100644 --- a/docs/next/data-cookbooks/about-cookbooks/index.html +++ b/docs/next/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: Next

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. They should also serve to help unify an institution/user group on a set of terms.

    Setup

The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure them.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.
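
As a rough, hypothetical sketch of that sidebar entry (the exact surrounding structure of website/sidebar.json may differ in your version, and my-new-cookbook is a placeholder id):

{
  "Data-Cookbooks": [
    "data-cookbooks/about-cookbooks",
    "data-cookbooks/my-new-cookbook"
  ]
}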

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/browse-generation/index.html b/docs/next/data-cookbooks/browse-generation/index.html index 8b607a106cf..b8c949bce22 100644 --- a/docs/next/data-cookbooks/browse-generation/index.html +++ b/docs/next/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

Once you've configured the Collection and Provider and added a onetime rule with an ENABLED state, you're ready to trigger your rule and watch the ingest workflows run.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate the above CMR entry (for example, by following the cmrLink in the output).


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.
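
As a minimal sketch of what that resource might look like (the function name, file paths, handler, runtime, timeout, and layer variable below are placeholders/assumptions, not values defined by this guide):

resource "aws_lambda_function" "browse_processing_task" {
  function_name    = "${var.prefix}-BrowseProcessing"
  filename         = "./browse-processing.zip"
  source_code_hash = filebase64sha256("./browse-processing.zip")
  handler          = "index.handler"
  runtime          = "nodejs16.x"
  role             = module.cumulus.lambda_processing_role_arn
  timeout          = 300

  # Hypothetical: if your task uses a CMA Lambda layer, it could be attached
  # here; the variable name is an assumption, not a value from this guide.
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]
}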

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The example script goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.

    Generating/updating CMR metadata

If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates one and adds it to the FilesToGranules file list via the payload, but it can also be present in the InputGranules from the DiscoverGranules task if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, "files" is mapped to the payload and "granules" is mapped to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. If a CMR metadata file exists, the UpdateGranulesCmrMetadataFileLinks task will update it with the new granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files FilesToGranules will act upon to add/merge with the input_granules object.

The pathing is generated by sync-granules, but in principle the files can be staged wherever you like, so long as the roles used by the processing and MoveGranules tasks have access and the filenames match the collection configuration.

    input_granules

The FilesToGranules task uses the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/choice-states/index.html b/docs/next/data-cookbooks/choice-states/index.html index 6b045d831bc..3b5a1a11c38 100644 --- a/docs/next/data-cookbooks/choice-states/index.html +++ b/docs/next/data-cookbooks/choice-states/index.html @@ -5,13 +5,13 @@ Choice States | Cumulus Documentation - +
    Version: Next

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: while CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/cnm-workflow/index.html b/docs/next/data-cookbooks/cnm-workflow/index.html index 5fefdf67586..bd5b60615d9 100644 --- a/docs/next/data-cookbooks/cnm-workflow/index.html +++ b/docs/next/data-cookbooks/cnm-workflow/index.html @@ -5,7 +5,7 @@ CNM Workflow | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: Next

    CNM Workflow

    This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

If you do not have two streams set up, please take a moment to review the Kinesis documentation and set up two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream
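
If you would rather create the streams with the AWS CLI than the console, a minimal sketch (the stream names below are placeholders; use names appropriate for your deployment):

aws kinesis create-stream --stream-name <prefix>-cnmNotificationStream --shard-count 1
aws kinesis create-stream --stream-name <prefix>-cnmResponseStream --shard-count 1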

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Source Data section above.

This can be done via the Cumulus Dashboard, if installed, or via the API. It is strongly recommended to use the dashboard if possible.
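
As a rough illustration only, an S3 provider record created via the dashboard or API might look something like the following (the id and host values are placeholders; see the Cumulus data management types documentation for the authoritative provider schema):

{
  "id": "TestProvider",
  "protocol": "s3",
  "host": "cumulus-test-data-bucket"
}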


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

The following steps are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/cnm_workflow.tf.

Add the workflow definition below to the new Terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
• Update the source key of the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
    "CNMExampleWorkflow": {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    }
JSON
}

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    note

    To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }
    note
• The rule's value attribute must match the Amazon Resource Name (ARN) of the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
• The collection and provider should match the collection and provider you set up in the Prerequisites section.

    Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    note

    This example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json
    note

    The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    info

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the response-endpoint Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

You should be able to validate that the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations documentation is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow by using the put-record command from the Add Record to Kinesis Data Stream section above
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note the data encoding is not human readable (the Data values are base64 encoded) and would need to be decoded/parsed to be interpretable. There are many options to build a Kinesis consumer, such as the KCL.
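
For example, you could decode a single Data value with a quick shell sketch (on macOS the flag is -D rather than --decode):

echo "<Data value from the get-records output>" | base64 --decode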

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus' control.

The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
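
Alternatively, a sketch of inspecting the same queue with the AWS CLI (substituting your deployment prefix):

aws sqs get-queue-url --queue-name <prefix>-kinesisFailure
aws sqs receive-message --queue-url <QueueUrl from the previous command> --max-number-of-messages 10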

Note that these are only failures that occurred when processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
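
If you prefer to create this event source mapping with the AWS CLI rather than the console, a sketch (the function name and stream ARN below are placeholders for your deployment's values):

aws lambda create-event-source-mapping \
  --function-name <prefix>-KinesisOutboundEventLogger \
  --event-source-arn arn:aws:kinesis:<region>:<account>:stream/<your response stream name> \
  --starting-position LATEST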

    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/error-handling/index.html b/docs/next/data-cookbooks/error-handling/index.html index 07718cded2e..9f52d4f0c62 100644 --- a/docs/next/data-cookbooks/error-handling/index.html +++ b/docs/next/data-cookbooks/error-handling/index.html @@ -5,7 +5,7 @@ Error Handling in Workflows | Cumulus Documentation - + @@ -45,7 +45,7 @@ Service Exception. See this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/hello-world/index.html b/docs/next/data-cookbooks/hello-world/index.html index 2cc7afbd2ac..13a3ef22e73 100644 --- a/docs/next/data-cookbooks/hello-world/index.html +++ b/docs/next/data-cookbooks/hello-world/index.html @@ -5,14 +5,14 @@ HelloWorld Workflow | Cumulus Documentation - +
    Version: Next

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/ingest-notifications/index.html b/docs/next/data-cookbooks/ingest-notifications/index.html index 1dd5126a6bb..1fab1a233cd 100644 --- a/docs/next/data-cookbooks/ingest-notifications/index.html +++ b/docs/next/data-cookbooks/ingest-notifications/index.html @@ -5,13 +5,13 @@ Ingest Notification in Workflows | Cumulus Documentation - +
    Version: Next

    Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to the workflow.

The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module, and the Cloudwatch rule is included in the workflow module; all of these are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and the status of those granules/PDRs becoming visible on the Cumulus dashboard. The queue does have certain expectations, however: it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the topic ARNs) by adding the following to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to send messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
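
For example, a sketch of posting such a message with the AWS CLI rather than an SDK, assuming status-message.json contains a Cloudwatch Step Function event wrapping a stringified Cumulus Message as described above:

aws sqs send-message \
  --queue-url "<stepfunction_event_reporter_queue_url value from your Terraform outputs>" \
  --message-body file://status-message.json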

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    info

    ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with the ingest notification record in the record property and the event type in the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    - + \ No newline at end of file diff --git a/docs/next/data-cookbooks/queue-post-to-cmr/index.html b/docs/next/data-cookbooks/queue-post-to-cmr/index.html index 8a401f74a09..34d3f9a43da 100644 --- a/docs/next/data-cookbooks/queue-post-to-cmr/index.html +++ b/docs/next/data-cookbooks/queue-post-to-cmr/index.html @@ -5,13 +5,13 @@ Queue PostToCmr | Cumulus Documentation - +
    Version: Next

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

{
  "RequeueWorkflow": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "buckets": "{$.meta.buckets}",
          "distribution_endpoint": "{$.meta.distribution_endpoint}",
          "workflow": "PublishGranuleQueue",
          "queueUrl": "${start_sf_queue_url}",
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}"
        }
      }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
      {
        "ErrorEquals": [
          "States.ALL"
        ],
        "Next": "WorkflowFailed",
        "ResultPath": "$.exception"
      }
    ],
    "Retry": [
      {
        "ErrorEquals": [
          "Lambda.ServiceException",
          "Lambda.AWSLambdaException",
          "Lambda.SdkClientException"
        ],
        "IntervalSeconds": 2,
        "MaxAttempts": 6,
        "BackoffRate": 2
      }
    ],
    "End": true
  }
}
diff --git a/docs/next/data-cookbooks/run-tasks-in-lambda-or-docker/index.html b/docs/next/data-cookbooks/run-tasks-in-lambda-or-docker/index.html
    Version: Next

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time
    info

    See this page for a complete and up-to-date list of AWS Lambda limits.

If your task requires more than any of these resources or requires an unsupported runtime, creating a Docker image that can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    interpolated values

    ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

Assuming you have determined that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or in an existing one):

• An aws_sfn_activity resource:
resource "aws_sfn_activity" "queue_granules" {
  name = "${var.prefix}-QueueGranules"
}
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page), configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.9.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }
    note

    If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

• An updated Discover Granules workflow to utilize the new resource, with the Resource key in the QueueGranules step updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}"

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

diff --git a/docs/next/data-cookbooks/sips-workflow/index.html b/docs/next/data-cookbooks/sips-workflow/index.html

we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    note

    A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

    note

To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

    note

To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

Additionally, this workflow requires a processing step that you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    tip

    Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

    note

To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

diff --git a/docs/next/data-cookbooks/throttling-queued-executions/index.html b/docs/next/data-cookbooks/throttling-queued-executions/index.html
    Version: Next

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which can be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    caution

    You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

{
  "cumulus_meta": {
    "queueExecutionLimits": {
      "<backgroundJobQueue_SQS_URL>": 5
    }
  }
}

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    ingest_granule_workflow_name

    ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

{
  "QueueGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "FullMessage": true
        },
        "task_config": {
          "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
          "provider": "{$.meta.provider}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "stackName": "{$.meta.stack}",
          "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
        }
      }
    }
  }
}

    Similarly, for a QueuePdrs step:

    parse_pdr_workflow_name

    ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

{
  "QueuePdrs": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "FullMessage": true
        },
        "task_config": {
          "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "stackName": "{$.meta.stack}",
          "parsePdrWorkflow": "${parse_pdr_workflow_name}"
        }
      }
    }
  }
}

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

{
  "name": "s3_provider_rule",
  "workflow": "DiscoverAndQueuePdrs",
  "provider": "s3_provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED",
  "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
}

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
diff --git a/docs/next/data-cookbooks/tracking-files/index.html b/docs/next/data-cookbooks/tracking-files/index.html

The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type    UMM-G RelatedUrl.Type                                              ECHO10 Location
ancillary   'VIEW RELATED INFORMATION'                                         OnlineResource
data        'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)    OnlineAccessURL
browse      'GET RELATED VISUALIZATION'                                        AssociatedBrowseImage
linkage     'EXTENDED METADATA'                                                OnlineResource
metadata    'EXTENDED METADATA'                                                OnlineResource
qa          'EXTENDED METADATA'                                                OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
diff --git a/docs/next/deployment/api-gateway-logging/index.html b/docs/next/deployment/api-gateway-logging/index.html
    Version: Next

    API Gateway Logging

    Enabling API Gateway Logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
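For reference, a minimal sketch of how this setting might be applied; the module label thin_egress_app is assumed to match your deployment, and the rest of your existing TEA configuration is omitted:

module "thin_egress_app" {
  # ... your existing TEA configuration ...

  log_api_gateway_to_cloudwatch = true
}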

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions: Enabling Account Level Logging from API Gateway to CloudWatch

This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    1. Create a policy document

      The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

      Save this snippet as apigateway-policy.json.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "apigateway.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
    2. Create an account role to act as ApiGateway and write to CloudWatchLogs

      in NGAP

      NASA users in NGAP: Be sure to use your account's permission boundary.

          aws iam create-role \
      --role-name ApiGatewayToCloudWatchLogs \
      [--permissions-boundary <permissionBoundaryArn>] \
      --assume-role-policy-document file://apigateway-policy.json

      Note the ARN of the returned role for the last step.

    3. Attach correct permissions to role

      Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

      aws iam attach-role-policy \
      --role-name ApiGatewayToCloudWatchLogs \
      --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
    4. Update Account API Gateway settings with correct permissions

      Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

      aws apigateway update-account \
      --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'
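Optionally, you can confirm that the setting took effect by reading back the account settings (a quick check; the exact output shape may vary):

# The response should include the cloudWatchRoleArn you just set.
aws apigateway get-account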

    Configure API Gateway CloudWatch Logs Delivery

    For details about configuring the API Gateway CloudWatch Logs delivery, see Configure Cloudwatch Logs Delivery.

diff --git a/docs/next/deployment/apis-introduction/index.html b/docs/next/deployment/apis-introduction/index.html
    Version: Next

    APIs

    Common Distribution APIs

    When deploying from the Cumulus Deployment Template or a configuration based on that repo, the Thin Egress App (TEA) distribution app will be used by default. However, you have the choice to use the Cumulus Distribution API as well.

    Cumulus API Customization Use Cases

    Our Cumulus API offers you the flexibility to customize for your DAAC/organization. Below is a list of use cases that may help you with options:

    Types of APIs

diff --git a/docs/next/deployment/choosing_configuring_rds/index.html b/docs/next/deployment/choosing_configuring_rds/index.html

using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Optional: Manage RDS Database with pgAdmin

    Setup SSM Port Forwarding

    note

In order to perform this action, you will need to deploy it within a VPC and have the credentials to access it via NGAP protocols.

    For a walkthrough guide on how to utilize AWS's Session Manager for port forwarding to access the Cumulus RDS database go to the Accessing Cumulus RDS database via SSM Port Forwarding article.

diff --git a/docs/next/deployment/cloudwatch-logs-delivery/index.html b/docs/next/deployment/cloudwatch-logs-delivery/index.html
    Version: Next

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]
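For context, this variable is typically set alongside the rest of your cumulus module configuration; a minimal sketch, where the ARN shown is a placeholder built from the pattern above:

module "cumulus" {
  # ... other variables ...

  log_destination_arn = "arn:aws:logs:us-east-1:123456789012:destination:logDestinationName"
}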

    Logs Sent

    By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value with the Cloudwatch log group name.

additional_log_groups_to_elk = {
  "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
  "MyCustomTask"   = "my-custom-task-log-group"
}
diff --git a/docs/next/deployment/components/index.html b/docs/next/deployment/components/index.html

Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

diff --git a/docs/next/deployment/create_bucket/index.html b/docs/next/deployment/create_bucket/index.html
    Version: Next

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.
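As an illustrative sketch (the bucket names here are hypothetical), server access logging can also be enabled from the AWS CLI once a target logging bucket exists and permits S3 log delivery:

# Enable access logging on a hypothetical protected bucket, writing logs to a hypothetical internal bucket.
aws s3api put-bucket-logging \
  --bucket foobar-protected \
  --bucket-logging-status '{"LoggingEnabled": {"TargetBucket": "foobar-internal", "TargetPrefix": "s3-access-logs/foobar-protected/"}}'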

    Command Line

Using the AWS Command Line Tool's s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }
    info

    The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.
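For example, a simple shell loop can create several buckets at once; the bucket names below are hypothetical, and the region should be adjusted to match your deployment:

# Create a set of hypothetical buckets in us-west-2.
for bucket in foobar-internal foobar-private foobar-protected foobar-public; do
  aws s3api create-bucket \
    --bucket "$bucket" \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
done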

    Web Interface

    If you prefer to use the AWS web interface instead of the command line, see AWS "Creating a Bucket" documentation.

diff --git a/docs/next/deployment/cumulus_distribution/index.html b/docs/next/deployment/cumulus_distribution/index.html
    Version: Next

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    tip

    If you need to access our quick reference materials while setting up or continuing to manage your API access go to the Cumulus Distribution API Docs.

    Configuring a Cumulus Distribution Deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but they can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the "Cumulus Distribution Settings".
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.
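For step 4, the rename might look like the following when run from the root of your deployment repository:

mv cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example \
   cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf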

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.
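Once that deployment completes, you can read the output value directly from Terraform (run from your cumulus-tf directory), for example:

terraform output cumulus_distribution_api_uri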

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file and setting it to one of the following values (both are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches in turn.

    Using Your Cumulus Distribution API Gateway URL as Your Distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development). Here is an outline of the required steps with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following example)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see AWS ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

127.0.0.1 <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.
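As a sketch using curl, built from the details listed above (the angle-bracket placeholders are not literal values; substitute your own Terraform variable values):

curl -X POST "<csdap_host_url>/authclient/updateRedirectUri" \
  -u "<csdap_client_id>:<csdap_client_password>" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "redirect_uri=<cumulus_distribution_url>/login"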

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your EC2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an SSH tunnel
    5. Use a browser to navigate to your file

To determine your EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text
    Connect to NASA VPN

    Before proceeding with the remaining steps, make sure you are connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    In another terminal window, open a tunnel with port forwarding using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, and then next enter a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Stop your SSH tunnel (enter Ctrl-C)
    2. Stop your AWS SSM session (enter Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as Your Distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }
    cumulus bucket mapping

    Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.
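For example, a deployment with one public and one protected bucket might use a mapping along these lines (the bucket names and paths are illustrative only):

{
  "daac-public-data-bucket": "/path/to/public/data",
  "daac-protected-data-bucket": "/path/to/protected/data"
}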

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API Gateway is updated.

diff --git a/docs/next/deployment/databases-introduction/index.html b/docs/next/deployment/databases-introduction/index.html
    Version: Next

    Databases

    Cumulus Core Database

    Cumulus uses a PostgreSQL database as its primary data store for operational and archive records (e.g. collections, granules, etc). We expect a PostgreSQL database to be provided by the AWS RDS service; however, there are two types of the RDS database which we will explore in the upcoming pages.

    Types of Databases

diff --git a/docs/next/deployment/index.html b/docs/next/deployment/index.html

for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a Distribution API

    Default Configuration

    If you are deploying from the Cumulus Deployment Template or a configuration based on that repo, the Thin Egress App (TEA) distribution app will be used by default.

    Configuration Options

    Cumulus can be configured to use either TEA or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    note

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    TEA can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the TEA documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (Optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init. [3]

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/
    note

    Be sure to copy the redirect URLs because you will need them to update your Earthdata application.

    Update Earthdata Application

    Add the two redirect URLs to your EarthData login application by doing the following:

    1. Login to URS
    2. Under My Applications -> Application Administration -> use the edit icon of your application
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token
    4. Also add the Distribution url
• e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [1]
    5. You may delete the placeholder url you used to create the application

If you've lost track of the needed redirect URIs, they can be located on the API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token to the archive URL and /login to the thin egress app URL.


    Deploy Cumulus Dashboard

    Dashboard Requirements

    what you will need

    The requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the Dashboard repository.

    Prepare AWS

    Create S3 Bucket for Dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install Dashboard

    To install the Cumulus Dashboard, clone the repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard Versioning

    By default, the master branch will be used for Dashboard deployments. The master branch of the repository contains the most recent stable release of the Cumulus Dashboard.

    If you want to test unreleased changes to the Dashboard, use the develop branch.

    Each release/version of the Dashboard will have a tag in the Dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the Dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the Dashboard

    caution

    These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root. [2]

    Build your dashboard from the Cumulus Dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard Deployment

    Deploy your dashboard to S3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the Dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or find the url <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint" and log in with a user that you had previously configured for access.


    Cumulus Instance Sizing

    The Cumulus deployment default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups are small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    aws cost calculator

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size you'll want to consider the number of nodes which determine how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    reindex after changes

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 Instances and Autoscaling Groups

EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

When configuring your ECS cluster, consider the following (a sample configuration sketch follows this list):

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
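
    A minimal sketch of these settings in the cumulus module configuration (variable names are taken from the list above; the values are placeholders only and should be sized for your own workloads):

    ecs_cluster_instance_type                = "t3.medium"
    ecs_cluster_instance_docker_volume_size  = 50
    ecs_cluster_min_size                     = 1
    ecs_cluster_desired_size                 = 2
    ecs_cluster_max_size                     = 4
    ecs_cluster_scale_in_adjustment_percent  = -5
    ecs_cluster_scale_out_adjustment_percent = 10
    ecs_cluster_scale_in_threshold_percent   = 25
    ecs_cluster_scale_out_threshold_percent  = 75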

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", and use the edit icon for your application. Then go to Manage -> Redirect URIs.

    3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at".

    PostgreSQL Database Deployment

    Cumulus provides a module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 11 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on GitHub.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation.

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar, consider perusing the AWS docs and the Aurora Serverless V1 docs.

    Prepare Deployment Repository

    tip

    If you are already working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or you only need to configure this module for your repository, skip to Prepare AWS Configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a New Repository

    Create a new repository on GitHub so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Update Your Gitignore File

    If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.
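
    A minimal .gitignore entry covering these files might look like:

    # Terraform backend configuration and variable values contain sensitive data
    terraform.tf
    terraform.tfvars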


    Prepare AWS Configuration

    To deploy this module, make sure that you have completed the following steps from the Cumulus deployment instructions, adapted for this module:


    Configure and Deploy the Module

    When configuring this module, please keep in mind that unlike the Cumulus deployment, this module should be deployed once to create the database cluster, and redeployed only to change that configuration, upgrade, etc.

    tip

    This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
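
    For reference, a hedged sketch of what the resulting remote state configuration in terraform.tf typically looks like (the bucket, key, and table names below are placeholders; follow the example file shipped with the template):

    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "PREFIX-tf-state"
        key            = "PREFIX/rds-cluster/terraform.tfstate"
        dynamodb_table = "PREFIX-tf-locks"
      }
    }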

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to.
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.
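
    A hedged example of the corresponding terraform.tfvars entries, limited to the options described above (all values are placeholders; consult the module's variable definitions for the full list):

    region              = "us-east-1"
    subnets             = ["subnet-aaaaaaaa", "subnet-bbbbbbbb"]
    db_admin_username   = "postgres"
    db_admin_password   = "change-me"
    deletion_protection = true
    min_capacity        = 2
    max_capacity        = 4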

    Provision User and User Database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true. This configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need for access. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to.
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as name the provisioned user/database.
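
    If you enable user database provisioning, the additional terraform.tfvars entries sketched below (placeholder values) would be added alongside the options from the previous section:

    provision_user_database  = true
    permissions_boundary_arn = "arn:aws:iam::123456789012:policy/YourPermissionsBoundary"
    rds_user_password        = "change-me-too"
    prefix                   = "my-prefix"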

    Once configured, the module will deploy the lambda and run it on each deployment, creating the configured database (if it does not exist), updating the user password (if that value has changed), and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    note

    This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see a similar output:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    caution

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user's database, lambdas, and security groups):

    Output Example
    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secrets required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets is in the form:

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432

    Connect to PostgreSQL DB via pgAdmin

    If you would like to manage your PostgreSQL database in a GUI tool, you can do so via pgAdmin.

    Requirements

    SSH Setup in AWS Secrets Manager

    You will need to navigate to AWS Secrets Manager and retrieve the secret values for your database. The secret name will contain the string _db_login and your prefix. Click the "Retrieve secret value" button to see the secret values.

    The value for your secret name can also be retrieved from the data-persistence-tf directory with the command terraform output.

    pgAdmin values to retrieve
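
    If you prefer the command line, the same secret can also be read with the AWS CLI (the secret ID below is a placeholder; use the name or ARN found in the console or in the terraform output):

    aws secretsmanager get-secret-value \
      --secret-id <prefix>_db_login \
      --query SecretString \
      --output text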

    Setup ~/.ssh/config

    Replace HOST value and PORT value with the values retrieved from Secrets Manager.

    The LocalForward number 9202 can be any unused LocalForward number in your SSH config:

    Host ssm-proxy
    Hostname 127.0.0.1
    User ec2-user
    LocalForward 9202 [HOST value]:[PORT value]
    IdentityFile ~/.ssh/id_rsa
    Port 6868

    Create a Local Port Forward

    • Create a local port forward to the SSM box port 22; this creates a tunnel from <local ssh port> to the SSH port on the SSM host.
    caution

    <local ssh port> should not be 8000.

    • In the following command, replace <instance id> with your instance ID:
    aws ssm start-session --target <instance id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6868
    • Then, in another terminal tab, enter:
    ssh ssm-proxy

    Create PgAdmin Server

    • Open pgAdmin and begin creating a new server (in newer versions it may be registering a new server).

    Creating a pgAdmin server

    • In the "Connection" tab, enter the values retrieved from Secrets Manager. Host name/address and Port should be the Hostname and LocalForward number from the ~/.ssh/config file.

    pgAdmin server connection value entries

    note

    Maintenance database corresponds to "database".

    You can select "Save Password?" to save your password. Click "Save" when you are finished. You should see your new server in pgAdmin.

    Query Your Database

    • In the "Browser" area find your database, navigate to the name, and click on it.

    • Select the "Query Editor" to begin writing queries to your database.

    Using the query editor in pgAdmin

    You are all set to manage your queries in pgAdmin!


    Next Steps

    Your database cluster has been created/updated! From here you can continue to add additional user accounts, databases, and other database configurations.

    Version: Next

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a Node.js package that contains a simple Lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First, ensure that you have enabled S3 Server Access Logging.

    Next, configure your terraform.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, as in the example below:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }
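
    The module block above reads its values from an s3_replicator_config variable. Assuming your deployment declares a matching variable (as the template deployment does), a hedged sketch of the corresponding terraform.tfvars entry, with placeholder bucket names and prefixes determined by your logging setup and the Metrics team:

    s3_replicator_config = {
      source_bucket = "<your-server-access-log-bucket>"
      source_prefix = "<your-server-access-log-prefix>"
      target_bucket = "<target-bucket>"
      target_prefix = "<target-prefix>"
    }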

    The Terraform source package can be found on the Cumulus GitHub Release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 Replicator as described above using the target_bucket and target_prefix provided by the Metrics team.

    The Metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    info

    For a more in-depth overview regarding ESDIS Metrics view the Cumulus Distribution Metrics section.

    Terraform Best Practices

    You can verify whether there are any dangling resources left behind for any reason by running the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Version: Next

    Using the Thin Egress App (TEA) for Cumulus Distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA Deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app for distribution.

    The TEA module provides these instructions showing how to add it to your deployment; the following are instructions to configure the thin_egress_app module in your Cumulus deployment.

    Create a Secret for Signing Thin Egress App JWTs

    The Thin Egress App uses JSON Web Tokens (JWTs) internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    Bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }
    info

    Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally Configure a Custom Bucket Map

    A simple configuration would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public
    caution

    Your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally Configure Shared Variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus Core example deployment code.
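
    A hedged sketch of that approach (module labels, sources, and the stage value are placeholders; only stage_name and tea_api_gateway_stage are taken from the text above):

    locals {
      tea_stage_name = "DEV"
    }

    module "thin_egress_app" {
      source     = "<TEA module source>"
      # ... other TEA configuration ...
      stage_name = local.tea_stage_name
    }

    module "cumulus" {
      source                = "<Cumulus module source>"
      # ... other Cumulus configuration ...
      tea_api_gateway_stage = local.tea_stage_name
    }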

    Upgrading Cumulus

    Verify that your deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: Next

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub (see the command sketch after this list)

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve the failures as appropriate.
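
    A hedged sketch of the git commands behind steps 1, 2, and 5 (branch names are placeholders; the pull requests themselves are opened through the GitHub UI):

    git checkout -b from-<name-of-the-branch> master
    git push origin from-<name-of-the-branch>
    # after the forked PR has been re-targeted, reviewed, and merged into this branch,
    # open a PR from from-<name-of-the-branch> to master in the GitHub UI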

    Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    Versioning and Releases

    … this is a backport and patch release on the 13.3.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    Version: Next

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_APP_ID, DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve
    note

    docs-build will build the documents into website/build. docs-clear will clear the documents.

    caution

    Fix any broken links reported by Docusaurus if you see the following messages during build.

    [INFO] Docusaurus found broken links!

    Exhaustive list of all broken links found:

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---
    note

    To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. Docusaurus v2 uses a snapshot approach for documentation versioning: each versioned copy of the docs does not depend on any other version. It is worth noting that we would like the documentation versions to match up directly with release versions. However, because a new set of versioned docs can take up a lot of repo space and requires maintenance, we suggest updating the existing versioned docs for minor releases when there are no significant functionality changes. Cumulus versioning is explained in the Versioning Docs.

    Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an appId, an apiKey, and an indexName by DocSearch that we include in our website/docusaurus.config.js file. The rest (indexing and actual searching) we leave to DocSearch. Our builds expect environment variables for these values to exist: DOCSEARCH_APP_ID, DOCSEARCH_API_KEY, and DOCSEARCH_INDEX_NAME.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

    Version: Next

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    Version: Next

    Frequently Asked Questions

    Below are some commonly asked questions that can assist you along the way when working with Cumulus.

    General | Workflows | Integrators & Developers | Operators


    General

    What prerequisites are needed to set up Cumulus?

    Answer: Here is a list of the tools and access that you will need in order to get started. To maintain the up-to-date versions that we are using please visit our Cumulus main README for details.

    • NVM for node versioning
    • AWS CLI
    • Bash
    • Docker (only required for testing)
    • docker-compose (only required for testing; install with pip install docker-compose)
    • Python
    • pipenv
    login credentials

    Keep in mind you will need access to the AWS console and an Earthdata account before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    Where can I find Cumulus release notes?

    Answer: To get the latest information about updates to Cumulus go to Cumulus Versions.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a Cumulus JIRA ticket.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.
    info

    For more information on how to submit an issue or contribute to Cumulus follow our guidelines at Contributing.


    Workflows

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting the Workflows section.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    Where can I find a list of workflow tasks?

    Answer: You can access a list of reusable tasks for Cumulus development at Cumulus Tasks.

    Are there any third-party workflows or applications that I can use with Cumulus?

    Answer: The Cumulus team works with various partners to help build a robust framework. You can visit our External Contributions section to see what other options are available to help you customize Cumulus for your needs.


    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Integrators are those who work within Cumulus and AWS to manage deployments and workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus-maintained modules or steps on how to create a module, go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Operators are those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Explore more Cumulus operator best practices and how-tos in the dedicated Operator Docs.

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    Version: Next

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    Cumulus Backup and Restore

  • … writing to the old cluster.

  • Set the snapshot_identifier variable to the snapshot you wish to create, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments - You should not need to reconfigure either, as the secret ARN and the security group should not change; however, double-check that the configured values are as expected

    Version: Next

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords. Otherwise, all Cumulus messages that fail to be reprocessed will be moved to a new archive location under the path <stackName>/dead-letter-archive/failed-sqs/<YYYY-MM-DD>.
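
    A hedged example of such a request using curl (the API host, stage, and token are placeholders; see the Cumulus API documentation for authentication details):

    curl --request POST https://<api-gateway-host>/<stage>/deadLetterArchive/recoverCumulusMessages \
      --header 'Authorization: Bearer <access-token>'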

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.

    Version: Next

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
    message_retention_seconds = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

    For aid in manually troubleshooting, you can utilize the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID which can be cross-referenced to the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    Version: Next

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    Version: Next

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.
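
    A hedged sketch of overriding these options in your cumulus module block (the values shown are simply the documented defaults):

    module "cumulus" {
      # source and other required configuration omitted
      daily_execution_payload_cleanup_schedule_expression = "cron(0 4 * * ? *)"
      complete_execution_payload_timeout_disable          = false
      complete_execution_payload_timeout                  = 10
      non_complete_execution_payload_timeout_disable      = false
      non_complete_execution_payload_timeout              = 30
    }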

    Version: Next

    Writing logs for ESDIS Metrics

    info

    This feature is only available for Cumulus deployments in NGAP environments.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.
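
    A minimal usage sketch, assuming the package's typical constructor options (the sender value is a placeholder):

    // initialize a logger with contextual fields, then log as usual
    const Logger = require('@cumulus/logger');

    const log = new Logger({ sender: '@my-daac/my-task' });

    log.info('hello world');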

    Version: Next

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    • queueName (string) -- Any valid SQS queue name (not ARN)
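
    A hedged example of this request using curl (the host, stage, token, and queue name are placeholders):

    curl --request POST https://<api-gateway-host>/<stage>/replays/sqs \
      --header 'Authorization: Bearer <access-token>' \
      --header 'Content-Type: application/json' \
      --data '{"queueName": "<your-queue-name>"}'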

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    Version: Next

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document outlines how to start a replay operation and how to track its status. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    Because Kinesis records have no field comparable to, for example, the SQS ReceiveCount, Cumulus cannot tell which messages within a given time slice have never been processed and cannot guarantee that only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    note

    This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint allows you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    Field                             Type     Required            Description
    type                              string   required            Currently only accepts kinesis.
    kinesisStream                     string   for type kinesis    Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp    *        optional            Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
    endTimestamp                      *        optional            Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp                    *        optional            Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
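
    As an illustration (hostname, token, stream name, and timestamps are all placeholder values), a replay of a time slice could be requested like this:

    $ curl --request POST https://example.com/replays \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "type": "kinesis",
      "kinesisStream": "ReplaceWithYourStreamName",
      "startTimestamp": "2023-07-01T00:00:00Z",
      "endTimestamp": "2023-07-02T00:00:00Z"
    }'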

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/next/features/reports/index.html b/docs/next/features/reports/index.html index 923eccc8625..259d9257d1a 100644 --- a/docs/next/features/reports/index.html +++ b/docs/next/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -19,7 +19,7 @@ report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    - + \ No newline at end of file diff --git a/docs/next/getting-started/index.html b/docs/next/getting-started/index.html index 0ae78adcc53..f6b6b8c6010 100644 --- a/docs/next/getting-started/index.html +++ b/docs/next/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: Next

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This guide helps new Cumulus users deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution, and management systems, designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects, at NASA and beyond, use Cumulus in their daily work.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Set up Git Secrets

    To ensure your AWS access keys and passwords are protected as you submit commits, we recommend setting up Git Secrets.
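
    As a minimal sketch (assuming the git-secrets tool is already installed and you are inside your local clone), the hooks and the AWS credential patterns can be registered like this:

    # Install the git-secrets hooks into the current repository
    git secrets --install
    # Register the built-in AWS credential patterns to scan for
    git secrets --register-aws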

    2. Deploy Cumulus Core and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    3. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    4. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 3.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform, go to Terraform's official site.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/next/glossary/index.html b/docs/next/glossary/index.html index 8c1ab58e2fe..4d0580ec06d 100644 --- a/docs/next/glossary/index.html +++ b/docs/next/glossary/index.html @@ -5,13 +5,13 @@ Glossary | Cumulus Documentation - +
    Version: Next

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: Amazon Web Services documentation.

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    An AWS service that allows logging and metrics collection on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see Collections - Data Management Types.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page.

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues. These SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc. so that those failures can be processed.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Node Package Manager (npm)

    Node package manager. Often referred to as npm.

    For more information, see npm.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (npm)

    npm-hosted Node.js packages. Cumulus packages can be found on npm's site here.

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data, and more.

    For more information, see AWS's S3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/next/index.html b/docs/next/index.html index 1a669a70e15..ee412fcb886 100644 --- a/docs/next/index.html +++ b/docs/next/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: Next

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/next/integrator-guide/about-int-guide/index.html b/docs/next/integrator-guide/about-int-guide/index.html index 0dd5d106af8..d5259e24a53 100644 --- a/docs/next/integrator-guide/about-int-guide/index.html +++ b/docs/next/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: Next

    About Integrator Guide

    Purpose

    The Integrator Guide supplements the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/next/integrator-guide/int-common-use-cases/index.html b/docs/next/integrator-guide/int-common-use-cases/index.html index 4486807d5fe..4dacc6df0ef 100644 --- a/docs/next/integrator-guide/int-common-use-cases/index.html +++ b/docs/next/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - +
    - + \ No newline at end of file diff --git a/docs/next/integrator-guide/workflow-add-new-lambda/index.html b/docs/next/integrator-guide/workflow-add-new-lambda/index.html index 98bcdeb46d5..8aa3ce8d16a 100644 --- a/docs/next/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/next/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: Next

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps will help you as you write a new Lambda that integrates with a Cumulus workflow, and will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    - + \ No newline at end of file diff --git a/docs/next/integrator-guide/workflow-ts-failed-step/index.html b/docs/next/integrator-guide/workflow-ts-failed-step/index.html index d02f2783d67..97322bb1c37 100644 --- a/docs/next/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/next/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: Next

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    - + \ No newline at end of file diff --git a/docs/next/interfaces/index.html b/docs/next/interfaces/index.html index f5da126398e..ee3534f0c4c 100644 --- a/docs/next/interfaces/index.html +++ b/docs/next/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: Next

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/next/operator-docs/about-operator-docs/index.html b/docs/next/operator-docs/about-operator-docs/index.html index ffae5abe53b..1bfc8f378fe 100644 --- a/docs/next/operator-docs/about-operator-docs/index.html +++ b/docs/next/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: Next

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/next/operator-docs/bulk-operations/index.html b/docs/next/operator-docs/bulk-operations/index.html index 35a66c1888b..ad09e870f4f 100644 --- a/docs/next/operator-docs/bulk-operations/index.html +++ b/docs/next/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: Next

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    caution

    You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal.

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform.
    note

    The rest of the process is the same regardless of what type of bulk action you perform.

    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the "Discover" page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    8. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/next/operator-docs/cmr-operations/index.html b/docs/next/operator-docs/cmr-operations/index.html index 28438439781..a3da3fba9d7 100644 --- a/docs/next/operator-docs/cmr-operations/index.html +++ b/docs/next/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that this request requires additional configuration: supply an access constraint integer value and an optional description to the UpdateCmrAccessConstraints workflow by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with 'updateCmrAccessConstraints' selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "action": "applyWorkflow",
      "workflow": "updateCmrAccessConstraints",
      "meta": {
        "accessConstraints": {
          "value": 5,
          "description": "sample access constraint"
        }
      }
    }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.
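
    As an illustration only, a bulk request to the /granules/bulk endpoint might look like the following. The hostname, token, index, and query values are placeholders, and the payload fields (query, index, workflowName) follow the bulk granules payload described in the bulk operations doc:

    $ curl --request POST https://example.com/granules/bulk \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "workflowName": "UpdateCmrAccessConstraints",
      "index": "ReplaceWithYourGranuleIndex",
      "query": {
        "query": {
          "match": {
            "granuleId": "MOD11A1.A2017137.h19v16.006.2017138085750"
          }
        }
      }
    }'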

    More information on bulk operations is in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/next/operator-docs/create-rule-in-cumulus/index.html b/docs/next/operator-docs/create-rule-in-cumulus/index.html index f8e7d5ced49..1498c8f9634 100644 --- a/docs/next/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/next/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: Next

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    state field conditional

    If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/next/operator-docs/discovery-filtering/index.html b/docs/next/operator-docs/discovery-filtering/index.html index 302c914e3e7..363592b02ec 100644 --- a/docs/next/operator-docs/discovery-filtering/index.html +++ b/docs/next/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/next/operator-docs/granule-workflows/index.html b/docs/next/operator-docs/granule-workflows/index.html index b2424c26f14..01718975257 100644 --- a/docs/next/operator-docs/granule-workflows/index.html +++ b/docs/next/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: Next

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule
    note

    Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion.

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and reingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query
    tip

    You can optionally force deletion from CMR.

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, kinesis, SQS, or SNS rules or scheduled rules will re-discover and reingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/next/operator-docs/kinesis-stream-for-ingest/index.html b/docs/next/operator-docs/kinesis-stream-for-ingest/index.html index 4df0a87ed33..343463eb234 100644 --- a/docs/next/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/next/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: Next

    Setup Kinesis Stream & CNM Message

    tip

    Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that conforms to your schema using Python or from your command line (see the sketch below).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
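
    The following is a minimal sketch of sending a message with the AWS CLI; the stream name is a placeholder, and cnm-message.json is a hypothetical file containing your CNM message:

    # Optional: create the stream from the CLI instead of the console (1 shard)
    aws kinesis create-stream --stream-name ReplaceWithYourStreamName --shard-count 1

    # Publish the CNM message stored in cnm-message.json to the stream
    # (--cli-binary-format is an AWS CLI v2 option; omit it for CLI v1)
    aws kinesis put-record \
      --stream-name ReplaceWithYourStreamName \
      --partition-key 1 \
      --cli-binary-format raw-in-base64-out \
      --data file://cnm-message.json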
    - + \ No newline at end of file diff --git a/docs/next/operator-docs/locating-access-logs/index.html b/docs/next/operator-docs/locating-access-logs/index.html index 32eb8439569..179806517f6 100644 --- a/docs/next/operator-docs/locating-access-logs/index.html +++ b/docs/next/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: Next

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
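
    For example, the raw logs can be listed with the AWS CLI (substitute your own internal bucket name and stack prefix):

    aws s3 ls s3://<internal bucket name>/<stack>/ems-distribution/s3-server-access-logs/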

    - + \ No newline at end of file diff --git a/docs/next/operator-docs/naming-executions/index.html b/docs/next/operator-docs/naming-executions/index.html index e7ea1bb63a4..21a17b51aef 100644 --- a/docs/next/operator-docs/naming-executions/index.html +++ b/docs/next/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    info

    This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },
    }
    - + \ No newline at end of file diff --git a/docs/next/operator-docs/ops-common-use-cases/index.html b/docs/next/operator-docs/ops-common-use-cases/index.html index c089c33c94c..bc2067cafc5 100644 --- a/docs/next/operator-docs/ops-common-use-cases/index.html +++ b/docs/next/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - +
    - + \ No newline at end of file diff --git a/docs/next/operator-docs/trigger-workflow/index.html b/docs/next/operator-docs/trigger-workflow/index.html index 004abd1c909..383e9b4323c 100644 --- a/docs/next/operator-docs/trigger-workflow/index.html +++ b/docs/next/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: Next

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page unless it was created with a DISABLED state.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket names in the files configuration refer to the bucket types mapped in the Cumulus deployment shown above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
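
    The JSON equivalent of such a rule might look like the following sketch; the rule name, provider id, and workflow name are illustrative and must match what exists in your deployment:

    {
      "name": "test_collection_discover",
      "workflow": "DiscoverGranules",
      "provider": "s3_provider",
      "collection": {
        "name": "test-collection",
        "version": "001"
      },
      "rule": {
        "type": "onetime"
      },
      "state": "ENABLED",
      "meta": {
        "provider_path": "test-data"
      }
    }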

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page unless it has a DISABLED state. In order to run a workflow with a onetime DISABLED rule, please change the rule state to ENABLED and re-run. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/next/tasks/index.html b/docs/next/tasks/index.html index b7193d45a27..547f2aacf26 100644 --- a/docs/next/tasks/index.html +++ b/docs/next/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: Next

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/orca-copy-to-archive-adapter

    Adapter to invoke orca copy-to-archive lambda


    @cumulus/orca-recovery-adapter

    Adapter to invoke orca recovery workflow


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/next/team/index.html b/docs/next/team/index.html index 19c92712a61..4e89a954085 100644 --- a/docs/next/team/index.html +++ b/docs/next/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - +
    Version: Next

    Cumulus Team

    Cumulus Core Team

    Cumulus Emeritus Team

    - + \ No newline at end of file diff --git a/docs/next/troubleshooting/index.html b/docs/next/troubleshooting/index.html index b67ca5f6b15..4471b4d8a02 100644 --- a/docs/next/troubleshooting/index.html +++ b/docs/next/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: Next

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines composed of tasks and services, and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt the Lambda's environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of lambdas we distribute is in the low teens, and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    - + \ No newline at end of file diff --git a/docs/next/troubleshooting/reindex-elasticsearch/index.html b/docs/next/troubleshooting/reindex-elasticsearch/index.html index c4846d9f5db..52e8d94534b 100644 --- a/docs/next/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/next/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then Reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is you do not have to turn off any ingest operations. This way is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
    "aggs": {
    "count_by_type": {
    "terms": {
    "field": "_type"
    }
    }
    },
    "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

If a reindex operation did not fully complete, it can be resumed by running the following command from the Kibana instance.

POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "cumulus-2020-11-3"
  },
  "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
  }
}

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

diff --git a/docs/next/troubleshooting/rerunning-workflow-executions/index.html b/docs/next/troubleshooting/rerunning-workflow-executions/index.html
    Version: Next

    Rerunning workflow executions

    To rerun a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

Screenshot of the AWS console for a Step Function execution highlighting the "New execution" button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

diff --git a/docs/next/troubleshooting/troubleshooting-deployment/index.html b/docs/next/troubleshooting/troubleshooting-deployment/index.html

... data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

{
  domain_name    = "es"
  instance_count = 2
  instance_type  = "t2.small.elasticsearch"
  version        = "5.3"
  volume_size    = 10
}

    Install Dashboard

    Dashboard Configuration

    Issues

    Not Able To Clear Cache

If you see an error like Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default', this probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    Workaround Option

It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.
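    If you prefer not to edit the file by hand, a one-line sed version of the same workaround might look like the following, with <prefix> left as a placeholder exactly as above:

    # Rewrites the cache directory name used by gulp-cache (a .bak copy of the file is kept)
    sed -i.bak "s/cacheDirName: 'gulp-cache'/cacheDirName: '<prefix>-cache'/" \
      cumulus-dashboard/node_modules/gulp-cache/index.js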

    Dashboard Deployment

    Issues

    Earthdata Login Error

    The dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting".

    Check your variables and values

Check whether you are missing, or have forgotten to update, one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and to re-deploy Cumulus, whether you have placed incorrect values in them, or whether you have forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.
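    For reference, the relevant app/.env entries look something like the following (the values shown are placeholders, not real credentials):

    # app/.env (placeholder values)
    EARTHDATA_CLIENT_ID=<your-earthdata-app-client-id>
    EARTHDATA_CLIENT_PASSWORD=<your-earthdata-app-client-password>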

    Caching Issue

    There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error.

Browser Solution

    If you experience this, attempt to access the dashboard in a new browser window, and it should work.

diff --git a/docs/next/upgrade-notes/cumulus_distribution_migration/index.html b/docs/next/upgrade-notes/cumulus_distribution_migration/index.html
    Version: Next

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

diff --git a/docs/next/upgrade-notes/migrate_tea_standalone/index.html b/docs/next/upgrade-notes/migrate_tea_standalone/index.html
    Version: Next

    Migrate TEA deployment to standalone module

    Background

    info

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that version.

In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to Cumulus core example deployment code for the thin_egress_app module.

By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to TEA deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

aws dynamodb put-item \
  --table-name DYNAMO_LOCK_TABLE \
  --item '{
    "LockID": {"S": "BUCKET/KEY-md5"},
    "Digest": {"S": "some-digest-value"}
  }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    note

    These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code.

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

diff --git a/docs/next/upgrade-notes/rds-phase-3-data-migration-guidance/index.html b/docs/next/upgrade-notes/rds-phase-3-data-migration-guidance/index.html
    Version: Next

    Data Integrity & Migration Guidance (RDS Phase 3 Upgrade)

A few issues were identified as part of the RDS Phase 2 release. These issues could impact granule data integrity and are described below, along with recommended actions and guidance going forward.

    Issue Descriptions

    Issue 1

    Relevant ticket: CUMULUS-3019

Ingesting granules will delete unrelated files from the Files Postgres table. This is due to an issue in our logic for removing excess files when writing granules, and is fixed in Cumulus versions 13.2.1, 12.0.2, and 11.1.5.

    With this bug we believe the data in Dynamo is the most reliable and Postgres is out-of-sync.

    Issue 2

    Relevant ticket: CUMULUS-3024

    Updating an existing granule either via API or Workflow could result in datastores becoming out-of-sync if a partial granule record is provided. Our update logic operates differently in Postgres and Dynamo/Elastic. If a partial object is provided in an update payload the Postgres record will delete/nullify fields not present in the payload. Dynamo/Elastic will retain existing values and not delete/nullify.

    With this bug it’s possible that either Dynamo or PG could be the source of truth. It’s likely that it’s still Dynamo.

    Issue 3

    Relevant ticket: CUMULUS-3024

    Updating an existing granule with an empty files array in the update payload results in datastores becoming out-of-sync. If an empty array is provided, existing files in Dynamo and Elastic will be removed. Existing files in Postgres will be retained.

    With this bug Postgres is the source of truth. Files are retained in PG and incorrectly removed in Dynamo/Elastic.

    Issue 4

    Relevant ticket: CUMULUS-3017

Updating/putting a granule via framework writes that duplicates a granuleId but has a different collection results in an overwrite of the DynamoDB granule but a new granule record in Postgres. This is intended behavior post-RDS transition; however, it should not be happening now.

With this bug we believe Dynamo is the source of truth, and 'excess' older granules will be left in Postgres. This should be detectable with tooling or a query to detect duplicate granuleIds in the granules table.
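    One way to check for this, assuming access to the Cumulus RDS cluster and the granules table/granule_id column names from the Cumulus PostgreSQL schema, is a query like the following run via psql:

    # Assumed table/column names; lists granuleIds that appear more than once in PostgreSQL
    psql "$DATABASE_URL" -c \
      "SELECT granule_id, COUNT(*) FROM granules GROUP BY granule_id HAVING COUNT(*) > 1;"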

    Issue 5

    Relevant ticket: CUMULUS-3024

This is a sub-issue of Issue 2 above. Due to the way we assign a PDR name to a record, if the pdr field is missing from the final payload for a granule as part of a workflow message write, the final granule record will not link the PDR to the granule properly in Postgres; however, the Dynamo record will have the linked PDR. This can happen in situations where the granule is written prior to completion with the PDR in the payload, but downstream only the granule object is included, particularly in multi-workflow ingest scenarios and/or bulk update situations.

    Immediate Actions

    1. Re-review the issues described above

      • GHRC was able to scope the affected granules to specific collections, which makes the recovery process much easier. This may not be an option for all DAACs.
    2. If you have not ingested granules or performed partial granule updates on affected Cumulus versions (questions 1 and 2 on the survey), no action is required. You may update to the latest version of Cumulus.

    3. One option to ensure your Postgres data matches Dynamo is running the data-migration lambda (see below for instructions) before updating to the latest Cumulus version if both of the following are true:

      • you have ingested granules using an affected Cumulus version
      • your DAAC has not had any operations that updated an existing granule with an empty files array (granule.files = [])
4. A second option for DAACs that have ingested data using an affected Cumulus version is to use your DAAC’s recovery tools or reingest the affected granules. This is likely the most certain method for ensuring Postgres contains the correct data, but may be infeasible depending on the size of data holdings, etc.

    Guidance Going Forward

1. Before updating to Cumulus version 16.x and beyond, take a backup of your DynamoDB tables. The v16 update removes the DynamoDB tables. This backup would be for use in unexpected data recovery scenarios only (a CLI sketch covering this and the RDS backup in the next item follows this list).

    2. Cumulus recommends that you establish and follow a database backup/disaster recovery protocol for your RDS database, which should include periodic backups. The frequency will depend on each DAAC’s database architecture, comfort level, datastore size, and time available. Relevant AWS Docs

    3. Invest future development effort in data validation/integrity tools and procedures. Each DAAC has different requirements here. Each DAAC should maintain procedures for validating their Cumulus datastore against their holdings.
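    As a sketch of items 1 and 2 above (the table, cluster, and snapshot identifiers are placeholders; adjust them to your stack's prefix and your backup policy):

    # Item 1: on-demand backup of a DynamoDB table prior to the v16 upgrade (repeat per table)
    aws dynamodb create-backup \
      --table-name <prefix>-GranulesTable \
      --backup-name <prefix>-GranulesTable-pre-v16

    # Item 2: manual snapshot of the RDS (Aurora PostgreSQL) cluster
    aws rds create-db-cluster-snapshot \
      --db-cluster-identifier <prefix>-cumulus-rds-cluster \
      --db-cluster-snapshot-identifier <prefix>-cumulus-pre-v16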

    Running a Granule Migration

    Instructions for running the data-migration operation to sync Granules from DynamoDB to PostgreSQL

    The data-migration2 Lambda (which is invoked asynchronously using ${PREFIX}-postgres-migration-async-operation) uses Cumulus' Granule upsert logic to write granules from DynamoDB to PostgreSQL. This is particularly notable because granules with a running or queued status will only migrate a subset of their fields:

    • status
    • timestamp
    • updated_at
    • created_at

It is recommended that users ensure their granules are in a final state (e.g. completed or failed) before running this data migration. If there are granules with an incomplete status, it may impact the data migration.

For example, if a granule in the running status is updated by a workflow or API call (containing an updated status) and that update fails, the granule will retain the original running status, not the intended/updated status. Failed granule writes/updates should be evaluated and resolved prior to this data migration.

Cumulus provides the Cumulus Dead Letter Archive, which is populated by the Dead Letter Queue for the sfEventSqsToDbRecords Lambda, the Lambda responsible for Cumulus message writes to PostgreSQL. Depending on where a failure happened and on workflow configuration, this may not catch all write failures, but it may be a useful tool.

    If a Granule record is correct except for the status, Cumulus provides an API to update specific granule fields.
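    For reference, kicking off the granule migration described at the top of this section is typically done by invoking the async-operation Lambda directly. The payload field name below (migrationsList) is an assumption based on the data-migration tooling, so confirm it against your deployed Cumulus version before running:

    # Assumed payload shape; invokes the async data migration for granules only
    aws lambda invoke \
      --function-name "${PREFIX}-postgres-migration-async-operation" \
      --payload '{"migrationsList": ["granules"]}' \
      --cli-binary-format raw-in-base64-out \
      /dev/stdout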

diff --git a/docs/next/upgrade-notes/update-cma-2.0.2/index.html b/docs/next/upgrade-notes/update-cma-2.0.2/index.html
    Version: Next

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a Step Function task for population in meta.workflow_tasks, but to instead use an incrementing integer field.

Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.

diff --git a/docs/next/upgrade-notes/update-task-file-schemas/index.html b/docs/next/upgrade-notes/update-task-file-schemas/index.html
    Version: Next

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3; thus, much of the information required by the updated files schema, like bucket, key, or checksum, is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information
    View Details
    note

    The granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
diff --git a/docs/next/upgrade-notes/upgrade-rds-phase-3-release/index.html b/docs/next/upgrade-notes/upgrade-rds-phase-3-release/index.html
    Version: Next

    Upgrade RDS Phase 3 Release

    Background

    Release v16 of Cumulus Core includes an update to remove the now-unneeded AWS DynamoDB tables for the primary archive, as this datastore has been fully migrated to PostgreSQL databases in prior releases, and should have been operating in a parallel write mode to allow for repair/remediation of prior issues.

    Requirements

    To update to this release (and beyond) users must:

    • Have deployed a release of at least version 11.0.0 (preferably at least the latest supported minor version in the 11.1.x release series), having successfully completed the transition to using PostgreSQL as the primary datastore in release 11
    • Completed evaluation of the primary datastore for data irregularities that might be resolved by re-migration of data from the DynamoDB datastores.
• Review the CHANGELOG for any migration instructions/changes between (and including) this release and the release you're upgrading from. Complete migration instructions from the previous release series should be included in the release notes/CHANGELOG for this release; this document notes migration instructions specifically for release 16.0.0+ and is not all-inclusive if upgrading from multiple prior release versions.
    • Configure your deployment terraform environment to utilize the new release, noting all migration instructions.
    • The PostgreSQL database cluster should be updated to the supported version (Aurora Postgres 11.13+ compatible)

    Suggested Prerequisites

    In addition to the above requirements, we suggest users:

• Retain a backup of the primary DynamoDB datastore in case recovery/integrity concerns exist between DynamoDB and PostgreSQL.

      This should only be considered if remediation/re-migration from DynamoDB has recently occurred, specifically due to the issues reported in the following tickets:

      • CUMULUS-3019
      • CUMULUS-3024
      • CUMULUS-3017

      and other efforts included in the outcome from CUMULUS-3035/CUMULUS-3071.

    • Halt all ingest prior to performing the version upgrade.

    • Run load testing/functional testing.

While the majority of the modifications for release 16 are related to DynamoDB removal, we always encourage user engineering teams to ensure compatibility at scale with their deployment's configuration prior to promotion to a production environment to ensure a smooth upgrade.

    Upgrade procedure

    1. (Optional) Halt ingest

If ingest is not halted, then once the data-persistence module is deployed but the main Core module is not yet deployed, existing database writes will fail. This results in in-flight workflow messages failing to the message Dead Letter Archive and all API write-related calls failing.

    While this is optional, it is highly encouraged, as cleanup could be significant.

    2. Deploy the data persistence module

    Ensure your source for the data-persistence module is set to the release version (substituting v16.0.0 for the latest v16 release):

      source = "https://github.com/nasa/cumulus/releases/download/v16.0.0/terraform-aws-cumulus.zip//tf-modules/data-persistence"

Run terraform init to bring in all updated source modules, then run terraform apply and evaluate the changeset before proceeding. The changeset should include blocks like the following for each table removed:

    # module.data_persistence.aws_dynamodb_table.collections_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.executions_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.files_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.granules_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.pdrs_table will be destroyed

    In addition, you should expect to see the outputs from the module remove the references to the DynamoDB tables:

Changes to Outputs:
  ~ dynamo_tables = {
        access_tokens = {
            arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-AccessTokensTable"
            name = "prefix-AccessTokensTable"
        }
        async_operations = {
            arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-AsyncOperationsTable"
            name = "prefix-AsyncOperationsTable"
        }
      - collections = {
          - arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-CollectionsTable"
          - name = "prefix-CollectionsTable"
        } -> null
      - executions = {
          - arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-ExecutionsTable"
          - name = "prefix-ExecutionsTable"
        } -> null
      - files = {
          - arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-FilesTable"
          - name = "prefix-FilesTable"
        } -> null
      - granules = {
          - arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-GranulesTable"
          - name = "prefix-GranulesTable"
        } -> null
      - pdrs = {
          - arn  = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-PdrsTable"
          - name = "prefix-PdrsTable"
        } -> null

    Once this completes successfully, proceed to the next step.
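    For reference, the Terraform sequence for this data-persistence step is the usual init/plan/apply cycle run from your data-persistence directory (the directory name below is assumed to follow the standard cumulus-template-deploy layout):

    cd data-persistence-tf
    terraform init    # pulls the updated v16 module source
    terraform plan    # review: expect the DynamoDB table destroys shown above
    terraform apply   # confirm only after evaluating the changeset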

3. Deploy the cumulus-tf module

    Ensure your source for the cumulus-tf module is set to the release version (substituting v16.0.0 for the latest v16 release):

    source = "https://github.com/nasa/cumulus/releases/download/v16.0.0/terraform-aws-cumulus.zip//tf-modules/cumulus"

    You should expect to see a significant changeset in Core provided resources, in addition to the following resources being destroyed from the RDS Phase 3 update set:

    # module.cumulus.module.archive.aws_cloudwatch_log_group.granule_files_cache_updater_logs will be destroyed
    # module.cumulus.module.archive.aws_iam_role.granule_files_cache_updater_lambda_role will be destroyed
    # module.cumulus.module.archive.aws_iam_role.migration_processing will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.granule_files_cache_updater_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.migration_processing will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.process_dead_letter_archive_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.publish_collections_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.publish_executions_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.publish_granules_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_lambda_event_source_mapping.granule_files_cache_updater will be destroyed
    # module.cumulus.module.archive.aws_lambda_event_source_mapping.publish_pdrs will be destroyed
    # module.cumulus.module.archive.aws_lambda_function.execute_migrations will be destroyed
    # module.cumulus.module.archive.aws_lambda_function.granule_files_cache_updater will be destroyed
    # module.cumulus.module.data_migration2.aws_iam_role.data_migration2 will be destroyed
    # module.cumulus.module.data_migration2.aws_iam_role_policy.data_migration2 will be destroyed
    # module.cumulus.module.data_migration2.aws_lambda_function.data_migration2 will be destroyed
    # module.cumulus.module.data_migration2.aws_security_group.data_migration2[0] will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_iam_role.postgres_migration_async_operation_role will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_iam_role_policy.postgres_migration_async_operation will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_lambda_function.postgres-migration-async-operation will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_security_group.postgres_migration_async_operation[0] will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_iam_role.postgres_migration_count_role will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_iam_role_policy.postgres_migration_count will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_lambda_function.postgres_migration_count_tool will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_security_group.postgres_migration_count[0] will be destroyed

    Possible deployment issues

    Security group deletion

    The following security group resources will be deleted as part of this update:

    module.cumulus.module.data_migration2.aws_security_group.data_migration2[0]
    module.cumulus.module.postgres_migration_count_tool.aws_security_group.postgres_migration_count[0]
    module.cumulus.module.postgres_migration_async_operation.aws_security_group.postgres_migration_async_operation[0]

Because the AWS resources associated with these security groups can take some time to be properly updated (20-35 minutes in testing), these deletions may cause the deployment to take longer. If this causes the update to time out, you should be able to continue the deployment by re-running terraform to completion.

Users may also opt to reassign the affected Network Interfaces from the security group and/or delete the security group manually if this situation occurs and the deployment time is not acceptable.

diff --git a/docs/next/upgrade-notes/upgrade-rds/index.html b/docs/next/upgrade-notes/upgrade-rds/index.html

| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

diff --git a/docs/next/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/next/upgrade-notes/upgrade_tf_version_0.13.6/index.html
    Version: Next

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
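    As a sketch, running the upgrade tool across both deployment directories (directory names taken from the steps below; the -yes flag skips the interactive confirmation) might look like:

    # Run the automatic syntax upgrade in each module directory until no changes remain
    for dir in data-persistence-tf cumulus-tf; do
      (cd "$dir" && terraform 0.13upgrade -yes)
    done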

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        caution

        Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

diff --git a/docs/next/workflow_tasks/discover_granules/index.html b/docs/next/workflow_tasks/discover_granules/index.html

... included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

diff --git a/docs/next/workflow_tasks/files_to_granules/index.html b/docs/next/workflow_tasks/files_to_granules/index.html
    Version: Next

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming config.inputGranules and the task input list of s3 URIs along with the rest of the configuration objects to take the list of incoming files and sort them into a list of granule objects.

Please note: files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

diff --git a/docs/next/workflow_tasks/lzards_backup/index.html b/docs/next/workflow_tasks/lzards_backup/index.html
    Version: Next

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    info

    For more information about LZARDS and the backup process go to the LZARDS Overview.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

    lzards_launchpad_passphrase = your-passphrase

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

diff --git a/docs/next/workflow_tasks/move_granules/index.html b/docs/next/workflow_tasks/move_granules/index.html
    Version: Next

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations
      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.
      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.
    invalid CNM type

    Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

• The task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.
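As a hedged sketch of the expected shape only (the granule ID, bucket, keys, and staging prefix below are placeholders, and the schema linked above is authoritative), event.input might resemble:

{
  "granules": [
    {
      "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104",
      "files": [
        {
          "bucket": "my-internal-bucket",
          "key": "file-staging/my-prefix/MOD09GQ___006/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
        },
        {
          "bucket": "my-internal-bucket",
          "key": "file-staging/my-prefix/MOD09GQ___006/MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
          "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
        }
      ]
    }
  ]
}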

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow.

    Version: Next

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine file storage locations based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

HDF: 'data',
HDF-EOS: 'data',
SCIENCE: 'data',
BROWSE: 'browse',
METADATA: 'metadata',
BROWSE_METADATA: 'metadata',
QA_METADATA: 'metadata',
PRODHIST: 'qa',
QA: 'metadata',
TGZ: 'data',
LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

This task expects values to be set in the workflow_config CMA parameters for the workflow. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.
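As an illustrative sketch only (the key names below simply mirror the descriptions above; consult the task's config.json schema for the authoritative names and shapes), a ParsePdr step's task_config might resemble:

{
  "ParsePdr": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "bucket": "{$.meta.buckets.internal.name}",
          "collection": "{$.meta.collection}"
        }
      }
    }
  }
}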

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow.

    Version: Next

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task uses a defined collection, in concert with a defined provider (either set on each granule or passed in via config), to queue ingest executions for each granule or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

This task expects values to be set in the task_config CMA parameters for the workflow. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
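Putting the keys above together, an illustrative (not authoritative) QueueGranules step configuration might look like the sketch below; the workflow name, queue URL template, and other values are assumptions, and the task's config.json schema remains the source of truth:

{
  "QueueGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "granuleIngestWorkflow": "IngestGranule",
          "queueUrl": "{$.meta.queues.startSF}",
          "preferredQueueBatchSize": 1,
          "concurrency": 3,
          "executionNamePrefix": "my-prefix",
          "childWorkflowMeta": {
            "staticKey": "staticValue"
          }
        }
      }
    }
  }
}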

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    Version: Next

    Cumulus Tasks: Message Flow

Cumulus Workflows are composed of Cumulus Tasks, which are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

• replace: A key generated in conjunction with the Cumulus Message Adapter. It contains the S3 location of a message payload and a target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines what current task is being executed. Note this is different from what lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
    "input": {},
    "config": {}
    }

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

{
  "task_config": {
    "bar": "baz",
    "cumulus_message": {
      "input": "{$.payload.input}",
      "outputs": [
        {
          "source": "{$.input.anykey}",
          "destination": "{$.payload.out}"
        }
      ]
    }
  },
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "input": {
      "anykey": "anyvalue"
    }
  }
}
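Under that configuration, and assuming the task simply echoes its input, the assigned output would land at payload.out in the final Cumulus message, along the lines of this abridged sketch:

{
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "out": "anyvalue"
  }
}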

    Create Next Message Step 2: Store remote event

    If the ReplaceConfiguration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    Version: Next

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

For more on the full set of currently available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

module "my_workflow" {
  source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

  prefix        = "my-prefix"
  name          = "MyWorkflowName"
  system_bucket = "my-internal-bucket"

  workflow_config = module.cumulus.workflow_config

  tags = { Deployment = var.prefix }

  state_machine_definition = <<JSON
{}
JSON
}

In the above example, you would add your state_machine_definition using the Amazon States Language, composing tasks you've developed with Cumulus core tasks that are made available as part of the cumulus terraform module.
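For illustration only, a minimal single-step state_machine_definition might look like the following sketch. The state name, Lambda ARN, and empty task_config are placeholders; real workflows reference the task ARNs exposed as outputs of the cumulus module and configure the CMA as described in the workflow documentation.

{
  "Comment": "Minimal example state machine",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-prefix-HelloWorld",
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {}
        }
      },
      "End": true
    }
  }
}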

    note

    Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: Next

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node.js 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

    Version: Next

    Dockerizing Data Processing

The software used for processing data amongst DAACs is developed in a variety of languages, and with different sets of dependencies and build environments. To standardize processing, Docker allows us to provide an environment (called an image) to meet the needs of any processing software, while running on the kernel of the host server (in this case, an EC2 instance). This lightweight virtualization does not carry the overhead of any additional VM, providing near-instant startup and the ability to run any dockerized process as a command-line call.

    Using Docker

Docker images are run using the docker command, which can be used to build a Docker image from a Dockerfile, fetch an existing image from a remote repository, or run an existing image. In Cumulus, docker-compose is used to help developers by making it easy to build images locally and test them.

    To run a command using docker-compose use:

    docker-compose run *command*

    where command is one of

    • build: Build and tag the image using the Dockerfile
    • bash: Run the Dockerfile interactively (via a bash shell)
    • test: Processes data in the directory data/input and saves the output to the data/test-output directory. These directories must exist.

    The Docker Registry

    Docker images that are built can be stored in the cloud in a Docker registry. Currently we are using the AWS Docker Registry, called ECR. To access these images, you must first log in using your AWS credentials, and use AWS CLI to get the proper login string:

    # install awscli
    pip install awscli

    # login to the AWS Docker registry
    aws ecr get-login --region us-east-1 | source /dev/stdin

    As long as you have permissions to access the NASA Cumulus AWS account, this will allow you to pull images from AWS ECR, and push rebuilt or new images there as well. Docker-compose may also be used to push images.

    docker-compose push

This will push the built image to AWS ECR. Note that the image built by docker-compose will have the :latest tag and will overwrite the :latest tagged Docker image in the registry. This file should be updated to push to a different tag if overwriting is not desired.

In normal use cases for most production images in either repository, CircleCI takes care of this building and deploying process.

    Source Control and Versions

    All the code necessary for processing a data collection, and the code used to create a Docker image for it, is contained within a single GitHub repository, following the naming convention docker-${dataname}, where dataname is the collection's short name. The git develop branch is the current development version, master is the latest release version, and a git tag exists for each tagged version (e.g., v0.1.3).

Docker images can have multiple tagged versions. The Docker images in the registry follow this same convention. A Docker image tagged as 'develop' is an image of the development branch; 'latest' is the master branch, and thus the latest tagged version, with an additional tagged image for each version tagged in the git repository.

Release-tagged images are created and deployed automatically with Circle-CI, the continuous integration system used by Cumulus. When new commits are merged into a branch, the appropriate Docker image is built, tested, and deployed to the Docker registry. More on testing below.

    Docker Images

    docker-base

Docker images are built in layers, allowing common dependencies to be shared to child Docker images. A base Docker image is provided that includes some dependencies shared among the current HS3 data processing codes. This includes NetCDF libraries, the AWS CLI, Python, Git, as well as py-cumulus, a collection of Python utilities that are used in the processing scripts. The docker-base repository is used to generate new images that are then stored in AWS ECR.

The docker-base image can be interacted with by running it in interactive mode (i.e., docker run -it docker-base), since the default "entrypoint" to the image is a bash shell.

    docker-data example: docker-hs3-avaps

    To create a new processing stream for a data collection, a Dockerfile is used to specify what additional dependencies may be required, and to build them in that environment, if necessary. An example Dockerfile is shown here, for the hs3avaps collection.

    # cumulus processing Dockerfile: docker-hs3-avaps

    FROM 000000000000.dkr.ecr.us-east-1.amazonaws.com/cumulus-base:latest

    # copy needed files
    WORKDIR /work
    COPY . /work

    RUN apt-get install -y nco libhdf5-dev

    # compile code
    RUN gcc convert/hs3cpl2nc.c -o _convert -I/usr/include/hdf5/serial -L/usr/include/x86_64-linux-gnu -lnetcdf -lhdf5_serial

    # input and output directories will be Data Pipeline staging dir env vars
    ENTRYPOINT ["/work/process.py"]
    CMD ["input", "output"]

    When this Dockerfile is built, docker will first use the latest cumulus-base image. It will then copy the entire GitHub repository (the processing required for a single data collection is a repository) to the /work directory which will now contain all the code necessary to process this data. In this case, a C file is compiled to convert the supplied hdf5 files to NetCDF files. Note that this also requires installing the system libraries nco and libhdf5-dev via apt-get. Lastly, the Dockerfile sets the entrypoint to the processing handler, so that this command is run when the image is run. It expects two arguments to be handed to it: 'input' and 'output' meaning the input and output directories.

    Process Handler

    All of the processing is managed through a handler, which is called when the docker image is run. Currently, Python is used for the process handler, which provides a simple interface to perform validation, run shell commands, test the output generated, and log the output for us. The handler function takes two arguments: input directory and output directory. Any other needed parameters are set via environment variables. The handler function will process the input directory, and put any output to be saved in the output directory.

    Py-cumulus

    The py-cumulus library provides some helper functions that can be used for logging, writing metadata, and testing. Py-cumulus is installed in the docker-base image. Currently, there are three modules:

    import cumulus.logutils
    import cumulus.metadata
    import cumulus.process

    Example process handler

    An example process handler is given here, in this case a shortened version of the hs3-cpl data collection. The main function at the bottom passes the provided input and output directory arguments to the process() function. The first thing process() does is to get the Cumulus logger. The Cumulus logger will send output to both stdout and Splunk, to be used in the Cumulus pipeline. Log strings are made using the make_log_string() function which properly formats a message to be handled by Splunk.

    #!/usr/bin/env python

    import os
    import sys
    import glob
    import re
    import datetime
    import subprocess
    from cumulus.logutils import get_logger, make_log_string
    from cumulus.metadata import write_metadata
    from cumulus.process import check_output

    # the main process handler
def process(indir, outdir):
    """ Process this directory """
    log = get_logger()
    log.info(
        make_log_string(process='processing', message="Processing %s into %s" % (indir, outdir))
    )

    dataname = 'cpl'
    dataid = os.getenv('SHORT_NAME', 'hs3cpl')

    for f in glob.glob(os.path.join(indir, '*.hdf5')):
        bname = os.path.basename(f)
        log.info(
            make_log_string(granule_id=bname, process='processing', message="Processing started for %s" % bname)
        )

        # convert file to netcdf
        cmd = ['/work/_convert', f, outdir]
        out = subprocess.check_output(cmd)
        fout = glob.glob(os.path.join(outdir, 'HS3_%s*.nc' % bname[0:7]))
        fout = '' if len(fout) == 0 else fout[0]
        check_output(fout)
        cmd = ['ncatted -h -a Conventions,global,c,c,"CF-1.6" %s' % fout]
        out = subprocess.check_output(cmd, shell=True)
        log.debug(out)

        # write metadata output
        write_metadata(fout, dataname=dataname, dataid=dataid, outdir=outdir)

    # remove the generated metadata files
    for f in glob.glob(os.path.join(outdir, '*.met')):
        os.remove(f)


if __name__ == "__main__":
    indir = sys.argv[1]
    outdir = sys.argv[2]
    process(indir, outdir)

After setting up logging, the code has a for-loop for processing any matching hdf5 files in the input directory:

    1. Convert to NetCDF with a C script
    2. Validate the output (in this case just check for existence)
    3. Use 'ncatted' to update the resulting file to be CF-compliant
    4. Write out metadata generated for this file

    Process Testing

    It is important to have tests for data processing, however in many cases datafiles can be large so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3, and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

    If the data/test-output directory validated against the contents of data/output the tests will be successful, otherwise an error will be reported.

    Version: Next

    Workflows Overview

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about the specifics of, for example, what goes into a granule.

    Scalable

The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits which are applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
• Step functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
• Both meta and task_config, which are used for configuring at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

    Tradeoffs, and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration, we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade-off. One workaround is providing a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest specific configuration. This means our representation has a tight coupling with their standard. If they make backwards incompatible changes in the future we will have to deal with existing projects written against that.

    We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without needing to update the Ingest library code changes. It is unlikely they will make a backwards incompatible change at this point. One mitigation for this is writing data transformations to a new format if that were to happen.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built that provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

    The design aims to satisfy the needs for ingest integrating different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
    • CloudWatch Metrics can be used for monitoring many of the events with the StepFunctions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: Next

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    note

    Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory in your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    python runtime

    It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

Input to the task application code is a JSON object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is placed in the payload key by default, but the config key can also be used to return just a portion of the task output.
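A minimal sketch of that shape (the field contents are placeholders):

{
  "input": {
    "granules": []
  },
  "config": {
    "bucket": "my-internal-bucket"
  }
}

Whatever the task returns from this call becomes the next payload unless cumulus_message.outputs directs it elsewhere.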

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

The "event.$": "$" parameter is required, as it passes the entire incoming message to the CMA client library for parsing and allows the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, leaving a replace key in the message that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes) it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

    Setting the Path/Target path in the ReplaceConfig parameter (and optionally a non-default MaxSize)

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.
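For example, with the DiscoverGranules configuration above, a payload larger than 1 byte would be written to S3 and the emitted message would carry an empty payload plus a replace key, roughly like this sketch (the bucket and key shown are placeholders generated at runtime):

{
  "cumulus_meta": {},
  "meta": {},
  "payload": {},
  "replace": {
    "Bucket": "my-internal-bucket",
    "Key": "events/<execution-id>",
    "TargetPath": "$.payload"
  }
}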

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values for definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
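Applied to the inbound message above, those two actions would yield roughly the following (a sketch; ConfigKey is the placeholder name from the example):

{
  "ConfigKey": {
    "config values": "some config values"
  },
  "cumulus_meta": {},
  "payload": {},
  "meta": {},
  "exception": {}
}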

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
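Assuming, for illustration, that the object stored at some_bucket/events/some-event-id contains {"anykey": "anyvalue"}, the resolved fragment would then read:

"meta": {
  "foo": {
    "anykey": "anyvalue"
  }
}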

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given a task that specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also augment values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Given the response from the task is:

        {
    "output": {
    "anykey": "boo"
    }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    6. Apply Remote Message Configuration

If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided and, if required, write a portion of the Cumulus Message to S3 and add a replace key to the message for future steps to utilize.

    note

The non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.
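
According to the Cumulus Message Adapter documentation, ReplaceConfig accepts a FullMessage flag as well as MaxSize, Path, and TargetPath options for replacing only part of the message. The sketch below is illustrative only, and the exact option names and behavior should be verified against the CMA version you deploy:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "ReplaceConfig": {
        "MaxSize": 10000,
        "Path": "$.payload",
        "TargetPath": "$.payload"
      }
    }
  }
}

With a configuration along these lines, the intent is that only the payload portion of the message is offloaded to S3 once it exceeds the size threshold, rather than the full message.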

For example, if the output message (after output configuration is applied) looks like:

        {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    the resultant output would look like:

    {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
    }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

The Cumulus Message Adapter can validate task input, output, and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task, and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
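
As a purely illustrative sketch (the property names below are hypothetical and not taken from any particular Cumulus task), a task's schemas/config.json could be an ordinary JSON Schema document such as:

{
  "title": "ExampleTaskConfig",
  "type": "object",
  "required": ["provider"],
  "properties": {
    "provider": { "type": "object" },
    "bucket": { "type": "string" }
  }
}

If this file is present, the Cumulus Message Adapter would validate the resolved task configuration against it before invoking the task.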

    - + \ No newline at end of file diff --git a/docs/next/workflows/lambda/index.html b/docs/next/workflows/lambda/index.html index e84a9915208..d6eebe42a31 100644 --- a/docs/next/workflows/lambda/index.html +++ b/docs/next/workflows/lambda/index.html @@ -5,13 +5,13 @@ Develop Lambda Functions | Cumulus Documentation - +
    Version: Next

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in its developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source) as either a new .tf file, or add it to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }
    configuration example

    This example contains the minimum set of required configuration.

Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    - + \ No newline at end of file diff --git a/docs/next/workflows/message_granule_writes/index.html b/docs/next/workflows/message_granule_writes/index.html index b9460135348..1357842b329 100644 --- a/docs/next/workflows/message_granule_writes/index.html +++ b/docs/next/workflows/message_granule_writes/index.html @@ -5,13 +5,13 @@ Workflow Message Granule Writes | Cumulus Documentation - +
    Version: Next

    Workflow Message Granule Writes

    Overview

When an AWS Step Function event occurs for a Cumulus workflow, or a write is attempted via the sf-sqs-report task, a message is dispatched to the sfEventSqsToDbRecordsInputQueue for processing.

Messages on the sfEventSqsToDbRecordsInputQueue (which correspond to lambda invocations or workflow events) are processed in batches of 10, and the sfEventSqsToDbRecords Lambda is triggered for each batch. A write is attempted for the corresponding execution/PDR, and then for the granule records associated with the message.

    For each granule in the batch of granules one of the following occurs:

    • The granule is written successfully.
    • The granule write is dropped, due to asynchronous write constraints.
• The lambda fails to write the granule in an unexpected way (e.g. lambda failure, AWS outage, etc.). In this case, the granule will become visible again after the sfEventSqsToDbRecordsInputQueue visibility timeout, currently set as a function of the rds_connection_timing_configuration terraform variable:
(var.rds_connection_timing_configuration.acquireTimeoutMillis / 1000) + 60
For example, with acquireTimeoutMillis set to 90000, this works out to (90000 / 1000) + 60 = 150 seconds.
    • The granule fails to write due to a schema violation, database connection issue or other expected/caught error. The message is immediately written to the Dead Letter Archive for manual intervention/investigation.

    Caveats

• Non-bulk Cumulus API granule operations are not constrained by this logic and do not utilize the SQS update queue. They are instead invoked synchronously and follow expected RESTful logic without any asynchronous write constraints or default message values.
    • This information is correct as of release v16 of Cumulus Core. Please review the CHANGELOG and migration instructions for updated features/changes/bugfixes.

    Granule Write Constraints

    For each granule to be written, the following constraints apply:

    • granuleId must be unique.

  A granule write will not be allowed if the granuleId already exists in the database for another collection; granules in this state will be rejected and will wind up in the Dead Letter Archive.

    • Message granule must match the API Granule schema.

  If not, the write will be rejected, the granule status will be updated to failed, and the message will wind up in the Dead Letter Archive (a minimal example granule is sketched after this list).

    • If the granule is being updated to a running/queued status:

      • Only status, timestamp, updated_at and created_at are updated. All other values are retained as they currently exist in the database.
      • The write will only be allowed if the following are true, else the write request will be ignored as out-of-order/stale:
        • The granule createdAt value is newer or the same as the existing record.
        • If the granule is being updated to running, the execution the granule is being associated with doesn’t already exist in the following states: completed, failed.
        • If the granule is being updated to queued, the execution the granule is being associated with does not exist in any state in the database.
    • If the granule is being updated to a failed/completed state:

      • All fields provided will override existing values in the database, if any.
      • The write will only be allowed if the following are true, else the write request will be ignored as out-of-order/stale:
        • The granule createdAt value is newer or the same as the existing record.
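
A heavily hedged, minimal sketch of a message granule that would satisfy these constraints is shown below; every field name and value is an illustrative placeholder, and the authoritative field set is defined by the API Granule schema:

{
  "granuleId": "GRANULE.A2017025",
  "status": "completed",
  "createdAt": 1689876543210,
  "files": [
    {
      "bucket": "archive-protected",
      "key": "MOD09GQ/GRANULE.A2017025.hdf",
      "fileName": "GRANULE.A2017025.hdf",
      "size": 1024
    }
  ]
}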

    Message Granule Write Behavior

The granule object values are set based on the incoming Cumulus Message values (unless otherwise specified, the message values overwrite the granule payload values):

Column | Value
collection | Derived from meta.collection.name and meta.collection.version
createdAt | Defaults to cumulus_meta.workflow_start_time, else payload.granule.createdAt
duration | Calculated from the delta between cumulus_meta.workflow_start_time and the time of the database write
error | Object taken directly from the message.error object
execution | Derived from cumulus_meta.state_machine and cumulus_meta.execution_name
files | Taken directly from payload.granule.files; if files is null, it is set to an empty list []
pdrName | Taken directly from payload.pdr.name
processingEndDateTime | Derived from AWS API interrogation (sfn().describeExecution) based on the execution value
processingStartDateTime | Derived from AWS API interrogation (sfn().describeExecution) based on the execution value
productVolume | Sums the values of the passed-in payload.granules.files.size; does not validate against S3
provider | Inferred from the meta.provider value in the Cumulus message
published | Taken directly from granule.published; if not specified or null, defaults to false
queryFields | Object taken directly from meta.granule.queryFields
status | Uses meta.status if provided, else payload.granule.status
timeStamp | Set to the date-time value of the sfEventSqsToDbRecords invocation
timeToArchive | Taken from payload.granule.post_to_cmr_duration / 1000, provided by a Core task or user task; set to zero if no value is set
timeToPreprocess | Taken from payload.granule.sync_granule_duration, provided by a Core task or user task; set to 0 if no value is set
updatedAt | Set to the date-time value of the sfEventSqsToDbRecords invocation
beginningDateTime | See the CMR Temporal Values section below
endingDateTime | See the CMR Temporal Values section below
productionDateTime | See the CMR Temporal Values section below
lastUpdateDateTime | See the CMR Temporal Values section below

    CMR Temporal Values

    The following fields are generated based on values in the associated granule CMR file, if available:

    • beginningDateTime

      • If there is a beginning and end DateTime:

        • UMMG: TemporalExtent.RangeDateTime.BeginningDateTime
        • ISO: gmd:MD_DataIdentification.gmd:extent.gmd:EX_Extent.gmd:temporalElement.gmd:EX_TemporalExtent.gmd:extent.gml:TimePeriod:gml:beginPosition
      • If not:

        • UMMG: TemporalExtent.SingleDateTime
        • ISO: gmd:MD_DataIdentification.gmd:extent.gmd:EX_Extent.gmd:temporalElement.gmd:EX_TemporalExtent.gmd:extent.gml:TimeInstant.gml:timePosition
    • endingDateTime

      • If there is a beginning and end DateTime:

    • UMMG: TemporalExtent.RangeDateTime.EndingDateTime
    • ISO: gmd:MD_DataIdentification.gmd:extent.gmd:EX_Extent.gmd:temporalElement.gmd:EX_TemporalExtent.gmd:extent.gml:TimePeriod:gml:endPosition
      • If not:

        • UMMG: TemporalExtent.SingleDateTime
        • ISO: gmd:MD_DataIdentification.gmd:extent.gmd:EX_Extent.gmd:temporalElement.gmd:EX_TemporalExtent.gmd:extent.gml:TimeInstant.gml:timePosition
    • productionDateTime

      • UMMG: DataGranule.ProductionDateTime
      • ISO: gmd:identificationInfo:gmd:dataQualityInfo.gmd:DQ_DataQuality.gmd:lineage.gmd:LI_Lineage.gmd:processStep.gmi:LE_ProcessStep.gmd:dateTime.gco:DateTime
    • lastUpdateDateTime

      • UMMG:

  Given DataGranule.ProductionDateTime values where Type is one of Update, Insert, or Create, select the most recent value.

      • ISO: Given a node matching gmd:MD_DataIdentification.gmd:citation.gmd:CI_Citation.gmd:title.gco:CharacterString === UpdateTime, use gmd:identificationInfo:gmd:MD_DataIdentification.gmd:citation.gmd:CI_Citation.gmd:date.gmd:CI_Date.gmd:date.gco:DateTime
    - + \ No newline at end of file diff --git a/docs/next/workflows/protocol/index.html b/docs/next/workflows/protocol/index.html index bd9e2a15458..cc97887f370 100644 --- a/docs/next/workflows/protocol/index.html +++ b/docs/next/workflows/protocol/index.html @@ -5,13 +5,13 @@ Workflow Protocol | Cumulus Documentation - +
    Version: Next

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/next/workflows/workflow-configuration-how-to/index.html b/docs/next/workflows/workflow-configuration-how-to/index.html index eab66d8dd53..7e4b4724f93 100644 --- a/docs/next/workflows/workflow-configuration-how-to/index.html +++ b/docs/next/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -24,7 +24,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

This example would resolve to "MOD09GQ/MOD".

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()
    note

The 'move-granules' step needs to be in the workflow for this template to be populated and the file moved. The cmrMetadata, or CMR granule XML, needs to have been generated and stored on S3. From there, any field can be retrieved and used in a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

    {
    "bucket": "sample-protected-bucket",
    "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
    }

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    - + \ No newline at end of file diff --git a/docs/next/workflows/workflow-triggers/index.html b/docs/next/workflows/workflow-triggers/index.html index 0c93f8df7fa..95a55a52842 100644 --- a/docs/next/workflows/workflow-triggers/index.html +++ b/docs/next/workflows/workflow-triggers/index.html @@ -5,13 +5,13 @@ Workflow Triggers | Cumulus Documentation - +
    Version: Next

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.
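
As a hedged sketch (all names are placeholders, and the exact set of required fields should be checked against the rule configuration documentation for your Cumulus version), a scheduled rule tying a workflow to a provider and collection looks roughly like:

{
  "name": "nightly_discovery",
  "workflow": "DiscoverGranules",
  "provider": "s3_provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "rate(1 day)"
  },
  "state": "ENABLED"
}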

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one-time rule is created, the schedule SF lambda is triggered. Rules that are not one-time are associated with a CloudWatch event, which manages triggering the lambdas that trigger the workflows.

For a scheduled rule, the CloudWatch event is triggered on the given schedule and calls the schedule SF lambda directly.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    - + \ No newline at end of file diff --git a/docs/operator-docs/about-operator-docs/index.html b/docs/operator-docs/about-operator-docs/index.html index d57d917d977..6f95dcb4eb8 100644 --- a/docs/operator-docs/about-operator-docs/index.html +++ b/docs/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v16.0.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/operator-docs/bulk-operations/index.html b/docs/operator-docs/bulk-operations/index.html index c6803ea44d7..cf71d9689fc 100644 --- a/docs/operator-docs/bulk-operations/index.html +++ b/docs/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v16.0.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible. A sketch of the full request payload is shown after these steps:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
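
Putting steps 7 and 8 together, the bulk granules request payload ends up looking roughly like the sketch below; the workflow name, index name, and Elasticsearch query are placeholders, and the exact payload shape should be confirmed against the Cumulus API documentation:

{
  "workflowName": "IngestGranule",
  "index": "cumulus-granules",
  "query": {
    "query": {
      "match": {
        "collectionId": "MOD09GQ___006"
      }
    }
  }
}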

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
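
For illustration only, an individual AsyncOperation record returned from the /asyncOperations endpoint contains a status field alongside identifying information; the sketch below uses placeholder values, and the exact field set may vary by Cumulus version:

{
  "id": "0eb8e809-8790-4409-9239-bcd9e8d28b8e",
  "operationType": "Bulk Granules",
  "status": "SUCCEEDED",
  "description": "Bulk granule reingest",
  "output": "{\"result\":\"...\"}"
}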

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/operator-docs/cmr-operations/index.html b/docs/operator-docs/cmr-operations/index.html index 2591a4d8a53..3102d927541 100644 --- a/docs/operator-docs/cmr-operations/index.html +++ b/docs/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "action": "applyWorkflow",
    "workflow": "updateCmrAccessConstraints",
    "meta": {
"accessConstraints": {
"value": 5,
"description": "sample access constraint"
    }
    }
    }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

More information on bulk operations is available in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/operator-docs/create-rule-in-cumulus/index.html b/docs/operator-docs/create-rule-in-cumulus/index.html index b4cd25f62a1..4bc43d12314 100644 --- a/docs/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v16.0.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/operator-docs/discovery-filtering/index.html b/docs/operator-docs/discovery-filtering/index.html index cfeeaaf212a..ff28916bc49 100644 --- a/docs/operator-docs/discovery-filtering/index.html +++ b/docs/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/operator-docs/granule-workflows/index.html b/docs/operator-docs/granule-workflows/index.html index 6b51e511d26..cefd2f6b666 100644 --- a/docs/operator-docs/granule-workflows/index.html +++ b/docs/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v16.0.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

2. Ingest Granule via Ingest Rule
• Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and reingest the deleted granule.

    Reingest

1. Select Failed Granule
• In the Cumulus dashboard, go to the Collections page.
• Use the search field to find the granule.
2. Re-ingest Granule
• Go to the Collections page.
• Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

2. Ingest Granules via Ingest Rule
• Re-triggering one-time, Kinesis, SQS, or SNS rules or scheduled rules will re-discover and reingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/operator-docs/kinesis-stream-for-ingest/index.html b/docs/operator-docs/kinesis-stream-for-ingest/index.html index 3c80a04affb..40f5429240c 100644 --- a/docs/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v16.0.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on amount of kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
• A status page with stream details displays. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

• Send a message that matches your schema, using Python or the command line (a hedged sketch of such a message is shown below).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
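
A heavily hedged sketch of a minimal CNM message is shown below. The field names follow the Cloud Notification Mechanism schema as commonly used with Cumulus Kinesis rules, but all values are placeholders and the required fields should be checked against the CNM schema version you are using:

{
  "version": "1.0",
  "provider": "TEST_PROVIDER",
  "collection": "GRANULECOLLECTION",
  "submissionTime": "2023-07-20T00:00:00.000Z",
  "identifier": "example-identifier-0001",
  "product": {
    "name": "GRANULE.A2017025",
    "dataVersion": "006",
    "files": [
      {
        "type": "data",
        "uri": "s3://example-source-bucket/test-data/GRANULE.A2017025.hdf",
        "name": "GRANULE.A2017025.hdf",
        "size": 1024
      }
    ]
  }
}
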
    - + \ No newline at end of file diff --git a/docs/operator-docs/locating-access-logs/index.html b/docs/operator-docs/locating-access-logs/index.html index 52a93ccf53b..ba1db265345 100644 --- a/docs/operator-docs/locating-access-logs/index.html +++ b/docs/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v16.0.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/

    - + \ No newline at end of file diff --git a/docs/operator-docs/naming-executions/index.html b/docs/operator-docs/naming-executions/index.html index 87d418e4ff8..9e9e523ce25 100644 --- a/docs/operator-docs/naming-executions/index.html +++ b/docs/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
}
    }
    - + \ No newline at end of file diff --git a/docs/operator-docs/ops-common-use-cases/index.html b/docs/operator-docs/ops-common-use-cases/index.html index 2524ff21fc9..9c4c39281a0 100644 --- a/docs/operator-docs/ops-common-use-cases/index.html +++ b/docs/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - +
    - + \ No newline at end of file diff --git a/docs/operator-docs/trigger-workflow/index.html b/docs/operator-docs/trigger-workflow/index.html index 1903b1f74fb..d7303a6ef56 100644 --- a/docs/operator-docs/trigger-workflow/index.html +++ b/docs/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v16.0.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.
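
A hedged sketch of such a rule (the rule and workflow names here are placeholders) could be as simple as:

{
  "name": "helloworld_onetime",
  "workflow": "HelloWorldWorkflow",
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED"
}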

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider
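
A hedged sketch of the resulting provider record is shown below; the id is a placeholder, while protocol and host follow the values described above:

{
  "id": "s3_provider",
  "protocol": "s3",
  "host": "discovery-bucket"
}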

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types correspond to the buckets defined in the Cumulus deployment configuration shown above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.
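
For example, the rule's additional metadata (meta) would contain something along these lines (other rule fields omitted):

{
  "meta": {
    "provider_path": "test-data"
  }
}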

    Screenshot of adding a Discover Granules rule

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/tasks/index.html b/docs/tasks/index.html index 5317df01539..9888b82b45a 100644 --- a/docs/tasks/index.html +++ b/docs/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v16.0.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


@cumulus/update-granules-cmr-metadata-file-links

Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/team/index.html b/docs/team/index.html index 91b737ef499..72154bbbde9 100644 --- a/docs/team/index.html +++ b/docs/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - +
    Version: v16.0.0

    Cumulus Team

    Cumulus Core Team

    Cumulus Emeritus Team

    - + \ No newline at end of file diff --git a/docs/troubleshooting/index.html b/docs/troubleshooting/index.html index 8c394abbb5d..de7a038df0d 100644 --- a/docs/troubleshooting/index.html +++ b/docs/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v16.0.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

KMS Exception: AccessDeniedException. KMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach as the number of lambdas we distribute is in the low teens, which are likely to be easier and faster to reconfigure one-by-one compared to editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    - + \ No newline at end of file diff --git a/docs/troubleshooting/reindex-elasticsearch/index.html b/docs/troubleshooting/reindex-elasticsearch/index.html index 6b5f1554e2e..c5f112adf73 100644 --- a/docs/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all of the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This approach is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
    "aggs": {
    "count_by_type": {
    "terms": {
    "field": "_type"
    }
    }
    },
    "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
    "conflicts": "proceed",
    "source": {
    "index": "cumulus-2020-11-3"
    },
    "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
    }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    - + \ No newline at end of file diff --git a/docs/troubleshooting/rerunning-workflow-executions/index.html b/docs/troubleshooting/rerunning-workflow-executions/index.html index d203708d853..7cff1b40904 100644 --- a/docs/troubleshooting/rerunning-workflow-executions/index.html +++ b/docs/troubleshooting/rerunning-workflow-executions/index.html @@ -5,13 +5,13 @@ Rerunning workflow executions | Cumulus Documentation - +
    Version: v16.0.0

    Rerunning workflow executions

    To rerun a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

    - + \ No newline at end of file diff --git a/docs/troubleshooting/troubleshooting-deployment/index.html b/docs/troubleshooting/troubleshooting-deployment/index.html index 0adb0230d09..39bbfb427b3 100644 --- a/docs/troubleshooting/troubleshooting-deployment/index.html +++ b/docs/troubleshooting/troubleshooting-deployment/index.html @@ -5,7 +5,7 @@ Troubleshooting Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    Install Dashboard

    Dashboard Configuration

    Issues

    Not Able To Clear Cache

    If clearing the cache fails with an error like "EACCES: permission denied, rmdir '/tmp/gulp-cache/default'", this probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    Workaround Option

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. gulp-cache will then be able to write to /tmp/<prefix>-cache/default, and the error should resolve.
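
    As a rough sketch, the edit could also be scripted like the following (the prefix value and the path to your cumulus-dashboard checkout are placeholders; this assumes GNU sed, so on macOS use sed -i ''):

    # Hypothetical sketch: swap the gulp-cache directory name for a custom prefix.
    sed -i "s/cacheDirName: 'gulp-cache'/cacheDirName: 'my-prefix-cache'/" \
      cumulus-dashboard/node_modules/gulp-cache/index.js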

    Dashboard Deployment

    Issues

    Earthdata Login Error

    The dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting".

    Check your variables and values

    Check whether you are missing, or have forgotten to update, one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, whether the values placed in them are incorrect, or whether you've forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.

    Caching Issue

    There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error.

    Browser Solution

    If you experience this, attempt to access the dashboard in a new browser window, and it should work.

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/cumulus_distribution_migration/index.html b/docs/upgrade-notes/cumulus_distribution_migration/index.html index 1a6f81c0e3f..bceb657bbd8 100644 --- a/docs/upgrade-notes/cumulus_distribution_migration/index.html +++ b/docs/upgrade-notes/cumulus_distribution_migration/index.html @@ -5,14 +5,14 @@ Migrate from TEA deployment to Cumulus Distribution | Cumulus Documentation - +
    Version: v16.0.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/migrate_tea_standalone/index.html b/docs/upgrade-notes/migrate_tea_standalone/index.html index 34d05219ae5..3cbb4e9eea8 100644 --- a/docs/upgrade-notes/migrate_tea_standalone/index.html +++ b/docs/upgrade-notes/migrate_tea_standalone/index.html @@ -5,13 +5,13 @@ Migrate TEA deployment to standalone module | Cumulus Documentation - +
    Version: v16.0.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait on a new release of Cumulus that incorporated that release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate
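
    If you later need to identify an older state version (bucket versioning keeps each write as a separate version), something like the following can list them; BUCKET and KEY are the same placeholders as above:

    # List the stored versions of the Terraform state file.
    aws s3api list-object-versions --bucket BUCKET --prefix KEY \
      --query 'Versions[].{VersionId:VersionId,LastModified:LastModified,IsLatest:IsLatest}'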

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

    aws dynamodb put-item \
        --table-name DYNAMO_LOCK_TABLE \
        --item '{
            "LockID": {"S": "BUCKET/KEY-md5"},
            "Digest": {"S": "some-digest-value"}
        }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/rds-phase-3-data-migration-guidance/index.html b/docs/upgrade-notes/rds-phase-3-data-migration-guidance/index.html index 3c4ebb37a06..0cde698e79a 100644 --- a/docs/upgrade-notes/rds-phase-3-data-migration-guidance/index.html +++ b/docs/upgrade-notes/rds-phase-3-data-migration-guidance/index.html @@ -5,13 +5,13 @@ Data Integrity & Migration Guidance (RDS Phase 3 Upgrade) | Cumulus Documentation - +
    Version: v16.0.0

    Data Integrity & Migration Guidance (RDS Phase 3 Upgrade)

    A few issues were identified as part of the RDS Phase 2 release. These issues could impact Granule data integrity and are described below along with recommended actions and guidance going forward.

    Issue Descriptions

    Issue 1

    Relevant ticket: CUMULUS-3019

    Ingesting granules will delete unrelated files from the Files Postgres table. This is due to an issue in our logic to remove excess files when writing granules and is fixed in Cumulus versions 13.2.1, 12.0.2, and 11.1.5.

    With this bug we believe the data in Dynamo is the most reliable and Postgres is out-of-sync.

    Issue 2

    Relevant ticket: CUMULUS-3024

    Updating an existing granule either via API or Workflow could result in datastores becoming out-of-sync if a partial granule record is provided. Our update logic operates differently in Postgres and Dynamo/Elastic. If a partial object is provided in an update payload the Postgres record will delete/nullify fields not present in the payload. Dynamo/Elastic will retain existing values and not delete/nullify.

    With this bug it’s possible that either Dynamo or PG could be the source of truth. It’s likely that it’s still Dynamo.

    Issue 3

    Relevant ticket: CUMULUS-3024

    Updating an existing granule with an empty files array in the update payload results in datastores becoming out-of-sync. If an empty array is provided, existing files in Dynamo and Elastic will be removed. Existing files in Postgres will be retained.

    With this bug Postgres is the source of truth. Files are retained in PG and incorrectly removed in Dynamo/Elastic.

    Issue 4

    Relevant ticket: CUMULUS-3017

    Updating/putting a granule via framework writes that duplicates a granuleId but has a different collection results in an overwrite of the DynamoDB granule but a new granule record in Postgres. This is intended behavior post-RDS transition; however, it should not be happening now.

    With this bug we believe Dynamo is the source of truth, and ‘excess’ older granules will be left in Postgres. This should be detectable with tooling or a query to detect duplicate granuleIds in the granules table.
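
    For example, a minimal sketch of such a query run against the Cumulus PostgreSQL database; the connection string and the column names (granule_id, collection_cumulus_id) are assumptions, so adjust them to your schema:

    # Hypothetical sketch: find granuleIds that appear under more than one collection.
    psql "$DATABASE_URL" -c "
      SELECT granule_id, COUNT(DISTINCT collection_cumulus_id) AS collections
      FROM granules
      GROUP BY granule_id
      HAVING COUNT(DISTINCT collection_cumulus_id) > 1;"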

    Issue 5

    Relevant ticket: CUMULUS-3024

    This is a sub-issue of Issue 2 above. Due to the way we assign a PDR name to a record, if the pdr field is missing from the final payload for a granule as part of a workflow message write, the final granule record will not link the PDR to the granule properly in Postgres; however, the Dynamo record will have the linked PDR. This can happen when the granule is written prior to completion with the PDR in the payload, but downstream only the granule object is included, particularly in multi-workflow ingest scenarios and/or bulk update situations.

    Immediate Actions

    1. Re-review the issues described above

      • GHRC was able to scope the affected granules to specific collections, which makes the recovery process much easier. This may not be an option for all DAACs.
    2. If you have not ingested granules or performed partial granule updates on affected Cumulus versions (questions 1 and 2 on the survey), no action is required. You may update to the latest version of Cumulus.

    3. One option to ensure your Postgres data matches Dynamo is running the data-migration lambda (see below for instructions) before updating to the latest Cumulus version if both of the following are true:

      • you have ingested granules using an affected Cumulus version
      • your DAAC has not had any operations that updated an existing granule with an empty files array (granule.files = [])
    4. A second option for DAACs that have ingested data using an affected Cumulus version is to use your DAAC’s recovery tools or reingest the affected granules. This is likely the most certain method for ensuring Postgres contains the correct data, but it may be infeasible depending on the size of data holdings, etc.

    Guidance Going Forward

    1. Before updating to Cumulus version 16.x and beyond, take a snapshot of your DynamoDB instance. The v16 update removes the DynamoDB tables. This snapshot would be for use in unexpected data recovery scenarios only.

    2. Cumulus recommends that you establish and follow a database backup/disaster recovery protocol for your RDS database, which should include periodic backups. The frequency will depend on each DAAC’s database architecture, comfort level, datastore size, and time available. Relevant AWS Docs

    3. Invest future development effort in data validation/integrity tools and procedures. Each DAAC has different requirements here. Each DAAC should maintain procedures for validating their Cumulus datastore against their holdings.

    Running a Granule Migration

    Instructions for running the data-migration operation to sync Granules from DynamoDB to PostgreSQL

    The data-migration2 Lambda (which is invoked asynchronously using ${PREFIX}-postgres-migration-async-operation) uses Cumulus' Granule upsert logic to write granules from DynamoDB to PostgreSQL. This is particularly notable because granules with a running or queued status will only migrate a subset of their fields:

    • status
    • timestamp
    • updated_at
    • created_at

    It is recommended that users ensure their granules are in a final state (i.e. not running or queued) before running this data migration. If there are granules with an incomplete status, it may impact the data migration.

    For example, if a Granule in the running status is updated by a workflow or API call (containing an updated status) and fails, that granule will have the original running status, not the intended/updated status. Failed Granule writes/updates should be evaluated and resolved prior to this data migration.

    Cumulus provides the Cumulus Dead Letter Archive, which is populated by the Dead Letter Queue for the sfEventSqsToDbRecords Lambda (the Lambda responsible for Cumulus message writes to PostgreSQL). This may not catch all write failures, depending on where the failure happened and on workflow configuration, but it may be a useful tool.

    If a Granule record is correct except for the status, Cumulus provides an API to update specific granule fields.
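
    As a rough, illustrative sketch only, such an update might look like the following; the route, HTTP method, and payload vary by Cumulus API version, and <api-url>, <granuleId>, and $TOKEN are placeholders, so consult the API documentation for your deployed version:

    # Hypothetical sketch: set a granule's status via the Cumulus archive API.
    curl -s -X PATCH "https://<api-url>/granules/<granuleId>" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"status": "completed"}'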

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/update-cma-2.0.2/index.html b/docs/upgrade-notes/update-cma-2.0.2/index.html index f53960acdcf..83c71595578 100644 --- a/docs/upgrade-notes/update-cma-2.0.2/index.html +++ b/docs/upgrade-notes/update-cma-2.0.2/index.html @@ -5,13 +5,13 @@ Upgrade to CMA 2.0.2 | Cumulus Documentation - +
    Version: v16.0.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

    The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, but to instead use an incrementing integer field.

    Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

    The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

    This update functionally means that Cumulus tasks/activities using the CMA will now write a record that looks like the following in meta.workflow_tasks and, more importantly, in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above steps have been completed, redeploy Cumulus to apply the configuration; the updates should then be live.
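
    For example, a minimal sketch of wiring in the new layer at apply time; this assumes your cumulus-tf root module exposes a variable of the same name and passes it through to the cumulus module, and the layer ARN shown is a placeholder:

    # Hypothetical sketch: supply the CMA v2.0.2 layer ARN when deploying.
    terraform apply \
      -var 'cumulus_message_adapter_lambda_layer_version_arn=arn:aws:lambda:us-east-1:123456789012:layer:Cumulus_Message_Adapter:4'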

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/update-task-file-schemas/index.html b/docs/upgrade-notes/update-task-file-schemas/index.html index f628737015e..713929b4894 100644 --- a/docs/upgrade-notes/update-task-file-schemas/index.html +++ b/docs/upgrade-notes/update-task-file-schemas/index.html @@ -5,13 +5,13 @@ Updates to task granule file schemas | Cumulus Documentation - +
    Version: v16.0.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
    - + \ No newline at end of file diff --git a/docs/upgrade-notes/upgrade-rds-phase-3-release/index.html b/docs/upgrade-notes/upgrade-rds-phase-3-release/index.html index 27ee0f2c717..542571c788a 100644 --- a/docs/upgrade-notes/upgrade-rds-phase-3-release/index.html +++ b/docs/upgrade-notes/upgrade-rds-phase-3-release/index.html @@ -5,14 +5,14 @@ Upgrade RDS Phase 3 Release | Cumulus Documentation - +
    Version: v16.0.0

    Upgrade RDS Phase 3 Release

    Background

    Release v16 of Cumulus Core includes an update to remove the now-unneeded AWS DynamoDB tables for the primary archive, as this datastore has been fully migrated to PostgreSQL databases in prior releases, and should have been operating in a parallel write mode to allow for repair/remediation of prior issues.

    Requirements

    To update to this release (and beyond) users must:

    • Have deployed a release of at least version 11.0.0 (preferably at least the latest supported minor version in the 11.1.x release series), having successfully completed the transition to using PostgreSQL as the primary datastore in release 11
    • Have completed evaluation of the primary datastore for data irregularities that might be resolved by re-migration of data from the DynamoDB datastores.
    • Review the CHANGELOG for any migration instructions/changes between (and including) this release and the release you're upgrading from. Complete migration instructions from the previous release series should be included in the release notes/CHANGELOG for this release; this document notes migration instructions specifically for release 16.0.0+ and is not all-inclusive if upgrading across multiple prior release versions.
    • Configure your deployment terraform environment to utilize the new release, noting all migration instructions.
    • Update the PostgreSQL database cluster to the supported version (Aurora Postgres 11.13+ compatible)

    Suggested Prerequisites

    In addition to the above requirements, we suggest users:

    • Retain a backup of the primary DynamoDB datastore in case recovery/integrity concerns exist between DynamoDB and PostgreSQL.

      This should only be considered if remediation/re-migration from DynamoDB has recently occurred, specifically due to the issues reported in the following tickets:

      • CUMULUS-3019
      • CUMULUS-3024
      • CUMULUS-3017

      and other efforts included in the outcome from CUMULUS-3035/CUMULUS-3071.

    • Halt all ingest prior to performing the version upgrade.

    • Run load testing/functional testing.

      While the majority of the modifications for release 16 are related to DynamoDB removal, we always encourage user engineering teams to verify compatibility at scale with their deployment's configuration prior to promotion to a production environment, to help ensure a smooth upgrade.

    Upgrade procedure

    1. (Optional) Halt ingest

    If ingest is not halted, then once the data-persistence module is deployed but the main Core module is not yet deployed, existing database writes will fail, resulting in in-flight workflow messages being sent to the message Dead Letter Archive and all API write-related calls failing.

    While this is optional, it is highly encouraged, as cleanup could be significant.

    2. Deploy the data persistence module

    Ensure your source for the data-persistence module is set to the release version (substituting v16.0.0 for the latest v16 release):

      source = "https://github.com/nasa/cumulus/releases/download/v16.0.0/terraform-aws-cumulus.zip//tf-modules/data-persistence"

    Run terraform init to bring in all updated source modules, then run terraform apply and evaluate the changeset before proceeding. The changeset should include blocks like the following for each table removed:

    # module.data_persistence.aws_dynamodb_table.collections_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.executions_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.files_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.granules_table will be destroyed
    # module.data_persistence.aws_dynamodb_table.pdrs_table will be destroyed

    In addition, you should expect to see the outputs from the module remove the references to the DynamoDB tables:

    Changes to Outputs:
    ~ dynamo_tables = {
    access_tokens = {
    arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-AccessTokensTable"
    name = "prefix-AccessTokensTable"
    }
    async_operations = {
    arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-AsyncOperationsTable"
    name = "prefix-AsyncOperationsTable"
    }
    - collections = {
    - arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-CollectionsTable"
    - name = "prefix-CollectionsTable"
    } -> null
    - executions = {
    - arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-ExecutionsTable"
    - name = "prefix-ExecutionsTable"
    } -> null
    - files = {
    - arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-FilesTable"
    - name = "prefix-FilesTable"
    } -> null
    - granules = {
    - arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-GranulesTable"
    - name = "prefix-GranulesTable"
    } -> null
    - pdrs = {
    - arn = "arn:aws:dynamodb:us-east-1:XXXXXX:table/prefix-PdrsTable"
    - name = "prefix-PdrsTable"
    } -> null

    Once this completes successfully, proceed to the next step.

    3. Deploy the cumulus-tf module

    Ensure your source for the cumulus module is set to the release version (substituting v16.0.0 for the latest v16 release):

    source = "https://github.com/nasa/cumulus/releases/download/v16.0.0/terraform-aws-cumulus.zip//tf-modules/cumulus"

    You should expect to see a significant changeset in Core provided resources, in addition to the following resources being destroyed from the RDS Phase 3 update set:

    # module.cumulus.module.archive.aws_cloudwatch_log_group.granule_files_cache_updater_logs will be destroyed
    # module.cumulus.module.archive.aws_iam_role.granule_files_cache_updater_lambda_role will be destroyed
    # module.cumulus.module.archive.aws_iam_role.migration_processing will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.granule_files_cache_updater_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.migration_processing will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.process_dead_letter_archive_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.publish_collections_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.publish_executions_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_iam_role_policy.publish_granules_lambda_role_policy will be destroyed
    # module.cumulus.module.archive.aws_lambda_event_source_mapping.granule_files_cache_updater will be destroyed
    # module.cumulus.module.archive.aws_lambda_event_source_mapping.publish_pdrs will be destroyed
    # module.cumulus.module.archive.aws_lambda_function.execute_migrations will be destroyed
    # module.cumulus.module.archive.aws_lambda_function.granule_files_cache_updater will be destroyed
    # module.cumulus.module.data_migration2.aws_iam_role.data_migration2 will be destroyed
    # module.cumulus.module.data_migration2.aws_iam_role_policy.data_migration2 will be destroyed
    # module.cumulus.module.data_migration2.aws_lambda_function.data_migration2 will be destroyed
    # module.cumulus.module.data_migration2.aws_security_group.data_migration2[0] will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_iam_role.postgres_migration_async_operation_role will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_iam_role_policy.postgres_migration_async_operation will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_lambda_function.postgres-migration-async-operation will be destroyed
    # module.cumulus.module.postgres_migration_async_operation.aws_security_group.postgres_migration_async_operation[0] will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_iam_role.postgres_migration_count_role will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_iam_role_policy.postgres_migration_count will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_lambda_function.postgres_migration_count_tool will be destroyed
    # module.cumulus.module.postgres_migration_count_tool.aws_security_group.postgres_migration_count[0] will be destroyed

    Possible deployment issues

    Security group deletion

    The following security group resources will be deleted as part of this update:

    module.cumulus.module.data_migration2.aws_security_group.data_migration2[0]
    module.cumulus.module.postgres_migration_count_tool.aws_security_group.postgres_migration_count[0]
    module.cumulus.module.postgres_migration_async_operation.aws_security_group.postgres_migration_async_operation[0]

    Because the AWS resources associated with these security groups can take some time to be properly updated (in testing this was 20-35 minutes), these deletions may cause the deployment to take some time. If, for some unexpected reason, this takes longer than expected and causes the update to time out, you should be able to continue the deployment by re-running terraform to completion.

    Users may also opt to reassign the affected network interfaces away from the security group, or to delete the security group manually, if this situation occurs and the extended deployment time is undesirable.
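
    If you want to see which network interfaces are still associated with one of these security groups before intervening manually, something like the following can help (sg-xxxxxxxx is a placeholder for the security group ID):

    # List network interfaces still attached to a given security group.
    aws ec2 describe-network-interfaces \
      --filters Name=group-id,Values=sg-xxxxxxxx \
      --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Status:Status,Description:Description}'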

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/upgrade-rds/index.html b/docs/upgrade-notes/upgrade-rds/index.html index 465200ecda6..90000b01a62 100644 --- a/docs/upgrade-notes/upgrade-rds/index.html +++ b/docs/upgrade-notes/upgrade-rds/index.html @@ -5,7 +5,7 @@ Upgrade to RDS release | Cumulus Documentation - + @@ -21,7 +21,7 @@ | cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/postgres. | 3600 | | dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 | | dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    - + \ No newline at end of file diff --git a/docs/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/upgrade-notes/upgrade_tf_version_0.13.6/index.html index bd5e23b9c51..3e30b2b587c 100644 --- a/docs/upgrade-notes/upgrade_tf_version_0.13.6/index.html +++ b/docs/upgrade-notes/upgrade_tf_version_0.13.6/index.html @@ -5,13 +5,13 @@ Upgrade to TF version 0.13.6 | Cumulus Documentation - +
    Version: v16.0.0

    Upgrade to TF version 0.13.6

    Background

    Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
    • Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
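
    As a rough sketch of running the upgrade against each deployment directory mentioned above (run from the root of your deployment repository and review the changes it makes; the -yes flag skips the confirmation prompt):

    # Hypothetical sketch: upgrade each Terraform root module for 0.13 compatibility.
    terraform 0.13upgrade -yes data-persistence-tf
    terraform 0.13upgrade -yes cumulus-tf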

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required; otherwise, you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/adding-a-task/index.html b/docs/v10.0.0/adding-a-task/index.html index 3deb32c829a..27e115b8b73 100644 --- a/docs/v10.0.0/adding-a-task/index.html +++ b/docs/v10.0.0/adding-a-task/index.html @@ -5,13 +5,13 @@ Contributing a Task | Cumulus Documentation - +
    Version: v10.0.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/api/index.html b/docs/v10.0.0/api/index.html index 2359171e4a3..ade2880be3b 100644 --- a/docs/v10.0.0/api/index.html +++ b/docs/v10.0.0/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v10.0.0/architecture/index.html b/docs/v10.0.0/architecture/index.html index bb739fc3e43..98af1ffead0 100644 --- a/docs/v10.0.0/architecture/index.html +++ b/docs/v10.0.0/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
    Version: v10.0.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

    While the diagram can feel complex, it can easily be digested in several major components:

    Data Distribution

    End users can access data via Cumulus's distribution submodule, which includes ASF's Thin Egress App; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

    The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

    Cumulus entity state data is stored in a set of DynamoDB database tables, and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

    Other modules are provided as artifacts on the release page for use in users configuring their own deployment and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

    For more on the specific structure, examples of use and how to deploy and more, please see the deployment docs as well as the cumulus-template-deploy repo .

    - + \ No newline at end of file diff --git a/docs/v10.0.0/configuration/cloudwatch-retention/index.html b/docs/v10.0.0/configuration/cloudwatch-retention/index.html index 3748d57888d..2e7c8918f28 100644 --- a/docs/v10.0.0/configuration/cloudwatch-retention/index.html +++ b/docs/v10.0.0/configuration/cloudwatch-retention/index.html @@ -5,13 +5,13 @@ Cloudwatch Retention | Cumulus Documentation - +
    Version: v10.0.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

    1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
    3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v10.0.0/configuration/collection-storage-best-practices/index.html b/docs/v10.0.0/configuration/collection-storage-best-practices/index.html index 269e53ab81f..e73b609bb8c 100644 --- a/docs/v10.0.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v10.0.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v10.0.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

    In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
        "Datapoints": [
            {
                "Timestamp": "2019-07-31T00:00:00Z",
                "Average": 150996467959.0,
                "Unit": "Bytes"
            }
        ],
        "Label": "BucketSizeBytes"
    }

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

    You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.
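
    As an illustrative sketch only (the byte counts and monthly cost below are made-up placeholders; obtain the real values from the commands and billing reports above), the estimate is the prefix's share of bucket storage multiplied by the bucket's monthly cost:

    # Hypothetical numbers: ~150 GB prefix, ~950 GB bucket, $210.50/month bucket cost.
    PREFIX_BYTES=161061273600
    BUCKET_BYTES=1020054732800
    MONTHLY_BUCKET_COST=210.50
    echo "scale=2; $MONTHLY_BUCKET_COST * $PREFIX_BYTES / $BUCKET_BYTES" | bc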

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^.*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^.*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "protected-2",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "public",
      "regex": "^.*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
  ]
}
    Version: v10.0.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
• Reconciliation Reports: Reports are a comparison of data sets to check whether they are in agreement and to help Cumulus users detect conflicts.

    Interaction

• Providers tell Cumulus where to get new data - e.g. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

An execution record is created for each triggered workflow execution; it can be viewed in the dashboard, and its data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for these different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, settings that are specific to a particular ingest setup (e.g. which target directory to pull from, or how duplicate handling is configured) are generally defined in a Rule or Collection, not the Provider.
• Some provider behavior is controlled by task-specific configuration rather than the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

Configuration by provider type

S3

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): The connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be s3 for this provider type
• host (string, required): S3 bucket to pull data from

http

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): The connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be http for this provider type
• host (string, required): The host to pull data from (e.g. nasa.gov)
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, required only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional): Port to connect to the provider on. Defaults to 80
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com, or host.com:7000 if the redirect port differs from the provider port
• certificateUri (string, optional): SSL Certificate S3 URI for a custom or self-signed SSL (TLS) certificate

https

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): The connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be https for this provider type
• host (string, required): The host to pull data from (e.g. nasa.gov)
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, required only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional): Port to connect to the provider on. Defaults to 443
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com, or host.com:7000 if the redirect port differs from the provider port
• certificateUri (string, optional): SSL Certificate S3 URI for a custom or self-signed SSL (TLS) certificate

ftp

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): The connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be ftp for this provider type
• host (string, required): The ftp host to pull data from (e.g. nasa.gov)
• username (string, optional): Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
• password (string, optional): Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
• port (integer, optional): Port to connect to the provider on. Defaults to 21

sftp

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): The connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be sftp for this provider type
• host (string, required): The sftp host to pull data from (e.g. nasa.gov)
• username (string, optional): Username to use to connect to the sftp server
• password (string, optional): Password to use to connect to the sftp server
• port (integer, optional): Port to connect to the provider on. Defaults to 22
• privateKey (string, optional): Filename assumed to be in s3://bucketInternal/stackName/crypto
• cmKeyId (string, optional): AWS KMS Customer Master Key ARN or alias
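Putting the S3 table above together, a minimal S3 provider record might look like the sketch below (the id and host are placeholders):

{
  "id": "MY_S3_PROVIDER",
  "protocol": "s3",
  "host": "my-provider-staging-bucket",
  "globalConnectionLimit": 10
}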

    Collections

Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json):

• name (required), e.g. "MOD09GQ": The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
• version (required), e.g. "006": A version tag for the collection
• granuleId (required), e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$": The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
• granuleIdExtraction (required), e.g. "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)": The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID
• sampleFileName (required), e.g. "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf": An example filename belonging to this collection
• files (required), <JSON Object> of files (see files-object below): Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
• dataType (optional), e.g. "MOD09GQ": Can be specified, but this value will default to the collection_name if not
• duplicateHandling (optional), e.g. "replace": ("replace"|"version"|"skip") determines granule duplicate handling scheme
• ignoreFilesConfigForDiscovery (optional), defaults to false: By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
• process (optional), e.g. "modis": Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
• meta (optional), <JSON Object> of MetaData for the collection: MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter
• url_path (optional), e.g. "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}": Filename without extension

files-object

• regex (required), e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$": Regular expression used to identify the file
• sampleFileName (required), e.g. "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf": Filename used to validate the provided regex
• type (optional), e.g. "data": Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in the DiscoverGranules task
• bucket (required), e.g. "internal": Name of the bucket where the file will be stored
• url_path (optional), e.g. "${collectionShortName}/{substring(file.fileName, 0, 3)}": Folder used to save the granule in the bucket. Defaults to the collection url_path
• checksumFor (optional), e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$": If this is a checksum file, set checksumFor to the regex of the target file
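To illustrate checksumFor, a hypothetical entry for an .md5 sidecar file that carries the checksum of the .hdf data file might look like:

{
  "bucket": "private",
  "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.md5$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.md5",
  "type": "metadata",
  "checksumFor": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"
}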

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by events in Kinesis, SNS messages, or SQS messages.

Rule configuration

• name (required), e.g. "L2_HR_PIXC_kinesisRule": Name of the rule. This is the name under which the rule will be listed on the dashboard
• workflow (required), e.g. "CNMExampleWorkflow": Name of the workflow to be run. A list of available workflows can be found on the Workflows page
• provider (optional), e.g. "PODAAC_SWOT": Configured provider's ID. This can be found on the Providers dashboard page
• collection (required), <JSON Object> collection object shown below: Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
• payload (optional), <JSON Object or Array>: The payload to be passed to the workflow
• meta (optional), <JSON Object> of MetaData for the rule: MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter
• rule (required), <JSON Object> rule type and associated values, discussed below: Object defining the type and subsequent attributes of the rule
• state (optional), e.g. "ENABLED": ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED"
• queueUrl (optional), e.g. https://sqs.us-east-1.amazonaws.com/1234567890/queue-name: URL for the SQS queue that will be used to schedule workflows for this rule
• tags (optional), e.g. ["kinesis", "podaac"]: An array of strings that can be used to simplify search

collection-object

• name (required), e.g. "L2_HR_PIXC": Name of a collection defined/configured in the Collections dashboard page
• version (required), e.g. "000": Version number of a collection defined/configured in the Collections dashboard page

meta-object

• retries (optional), e.g. 3: Number of retries on errors, for sqs-type rules only. Defaults to 3
• visibilityTimeout (optional), e.g. 900: VisibilityTimeout in seconds for the in-flight messages, for sqs-type rules only. Defaults to the visibility timeout of the SQS queue when the rule is created

rule-object

• type (required), e.g. "kinesis": ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
• value (<String> Object, required depending on type): Discussion of valid values is below

    rule-value

The rule's value entry depends on the type of rule:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times by default, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
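Putting the tables above together, an sqs-type rule might look like the following sketch (the name, workflow, collection, provider, and queue URL are placeholders; adjust them to match your own configuration):

{
  "name": "example_sqs_rule",
  "workflow": "CNMExampleWorkflow",
  "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
  },
  "provider": "PODAAC_SWOT",
  "rule": {
    "type": "sqs",
    "value": "https://sqs.us-east-1.amazonaws.com/1234567890/queue-name"
  },
  "meta": {
    "retries": 1,
    "visibilityTimeout": 1800
  },
  "state": "ENABLED"
}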

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    Version: v10.0.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
• You will need an S3 bucket on AWS. You are strongly encouraged to experiment with a bucket that does not contain large amounts of data.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

$ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition, select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

The examples shown in this document are fairly basic cases. By using object tags, filters, and other configuration options, you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page, see:

    Version: v10.0.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

const zlib = require('zlib');
const aws = require('aws-sdk');
const { promisify } = require('util');

const gunzip = promisify(zlib.gunzip);
const sns = new aws.SNS();

exports.handler = async (event) => {
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const decompressedData = await gunzip(payload);
  const logData = JSON.parse(decompressedData.toString('ascii'));
  return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
      return sns.publish({
        TopicArn: process.env.EmailReportingTopicArn,
        Message: logEvent.message
      }).promise();
    }
    return Promise.resolve();
  }));
};

After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function.
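As a sketch, the deployment might look something like the following (the function name, package, role ARN, runtime, and topic ARN are placeholders):

aws lambda create-function \
  --function-name log-email-forwarder \
  --runtime nodejs16.x \
  --handler index.handler \
  --zip-file fileb://log-email-forwarder.zip \
  --role arn:aws:iam::123456789012:role/example-lambda-role \
  --environment "Variables={EmailReportingTopicArn=arn:aws:sns:us-east-1:123456789012:example-email-topic}"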

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
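A minimal Terraform sketch for one such subscription filter is shown below (the resource name, log group, and Lambda reference are placeholders; CloudWatch Logs will also need permission to invoke the function, typically granted via an aws_lambda_permission resource):

resource "aws_cloudwatch_log_subscription_filter" "email_forwarder" {
  name            = "email-forwarder"
  log_group_name  = "/aws/lambda/example-cumulus-task"
  filter_pattern  = ""
  destination_arn = aws_lambda_function.log_email_forwarder.arn
}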

    Version: v10.0.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

  {
    "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
    }
  }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
    Version: v10.0.0

    Configuration of Tasks

The cumulus module exposes configuration values for some of the provided archive and ingest tasks. Currently, the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

<lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

More information about the CMR limit and CMR page_size can be found in the @cumulus/cmr-client documentation and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

cmr_search_client_config = {
  create_reconciliation_report_cmr_limit     = 2500
  create_reconciliation_report_cmr_page_size = 250
}

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

elasticsearch_client_config = {
  create_reconciliation_report_es_scroll_duration = "15m"
  create_reconciliation_report_es_scroll_size     = 2000
}

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

lambda_timeouts = {
  discover_granules_task_timeout = 300
}
    Version: v10.0.0

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should help unify an institution or user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

Ingest Browse Generation

Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") where you've staged your mock/test data:

{
  "name": "TestBrowseGeneration",
  "workflow": "DiscoverGranulesBrowseExample",
  "provider": "{{provider_from_previous_step}}",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "meta": {
    "provider_path": "{{path_to_data}}"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED",
  "updatedAt": 1553053438767
}

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate the CMR entry above.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.
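A minimal sketch of such a resource is shown below (the function name, deployment package, handler, role, and runtime are placeholders for your own processing code):

resource "aws_lambda_function" "browse_processing" {
  function_name = "${var.prefix}-BrowseProcessing"
  filename      = "browse_processing.zip"
  handler       = "index.handler"
  role          = var.lambda_processing_role_arn
  runtime       = "nodejs16.x"
  timeout       = 300
  memory_size   = 512
}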

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The provided example script used in the example goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, "files" is mapped to the payload and "granules" is mapped to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

    The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the output payload in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    Version: v10.0.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

{
  "meta": {
    "isPdrFinished": false
  }
}

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

    In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: Given CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice is followed by PdrStatusReport is followed by WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    Version: v10.0.0

    CNM Workflow

    This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream
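If you prefer the CLI, creating the two streams should look roughly like the commands below (the stream names are placeholders):

aws kinesis create-stream --stream-name example-cnm-notification-stream --shard-count 1
aws kinesis create-stream --stream-name example-cnm-response-stream --shard-count 1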

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.
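For example, attaching the managed policy to a role from the CLI would look something like this (the role name is a placeholder for whatever role your Lambdas use):

aws iam attach-role-policy \
  --role-name example-prefix-lambda-processing \
  --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFullAccess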

In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

  state_machine_definition = <<JSON
{
    "CNMExampleWorkflow": {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    }
JSON
}

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for the parameters in double curly braces, e.g. {{streamName}}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

• The rule's value attribute must match the Amazon Resource Name (ARN) of the Kinesis data stream you preconfigured. You should be able to obtain this ARN from the Kinesis dashboard entry for the selected stream, or look it up programmatically as sketched below.
• The collection and provider should match the collection and provider you set up in the Prerequisites section.
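If you'd rather look the stream ARN up programmatically than through the console, here is a minimal boto3 sketch; the stream name and region are assumptions you should replace with your own values:

import boto3

# Assumed values -- replace with your notification stream name and region.
STREAM_NAME = "your-notification-stream-name"

kinesis = boto3.client("kinesis", region_name="us-east-1")

# describe_stream_summary returns the stream ARN without paginating shard details.
summary = kinesis.describe_stream_summary(StreamName=STREAM_NAME)
print(summary["StreamDescriptionSummary"]["StreamARN"])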

Once you've clicked 'Submit', a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

Construct a JSON file containing an object that matches the values that have been previously set up. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }
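If you prefer to fill in these placeholders programmatically, here is a minimal sketch that writes the record above to file.json; the TEST_DATA_FILE_NAME, TEST_DATA_URI, and COLLECTION values shown are placeholders you must replace with the values from your prerequisites:

import json

# Assumed placeholder values -- replace with the values configured in the prerequisites.
TEST_DATA_FILE_NAME = "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf"
TEST_DATA_URI = "s3://bucket-name/path/granule"
COLLECTION = "MOD09GQ"

record = {
    "product": {
        "files": [
            {
                "checksumType": "md5",
                "name": TEST_DATA_FILE_NAME,
                "checksum": "bogus_checksum_value",
                "uri": TEST_DATA_URI,
                "type": "data",
                "size": 12345678,
            }
        ],
        "name": TEST_DATA_FILE_NAME,
        "dataVersion": "006",
    },
    "identifier ": "testIdentifier123456",  # key copied verbatim from the sample above
    "collection": COLLECTION,
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198",
}

# Write the record so it can be passed to the put-record command in the next step.
with open("file.json", "w") as f:
    json.dump(record, f, indent=2)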

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.
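If you'd rather publish the record from code instead of the CLI, here is a boto3 sketch equivalent to the put-record command above; the stream name and file path are assumptions to replace:

import boto3

kinesis = boto3.client("kinesis")

# Assumed stream name -- use the notification stream name, not the ARN.
STREAM_NAME = "YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE"

with open("/path/to/file.json", "rb") as f:
    response = kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=f.read(),
        PartitionKey="1",
    )

# Should print a ShardId and SequenceNumber, as in the CLI output above.
print(response["ShardId"], response["SequenceNumber"])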

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

SyncGranule

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output and add a "SUCCESS" record to the response Kinesis stream (the stream configured as the response-endpoint).

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.
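One way to check this without browsing the S3 console is to list the staging prefix with boto3; the bucket, deployment name, and collection below are assumptions to replace with your deployment's values (the path format follows the pattern described in the SyncGranule section above):

import boto3

s3 = boto3.client("s3")

# Assumed values -- replace with your deployment's private bucket, deployment name, and collection.
PRIVATE_BUCKET = "your-deployment-private-bucket"
PREFIX = "file-staging/your-deployment-name/MOD09GQ"

# List any objects delivered to the staging prefix.
response = s3.list_objects_v2(Bucket=PRIVATE_BUCKET, Prefix=PREFIX)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])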

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

You should be able to validate that the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations documentation is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow by using the put-record command from above
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note that the Data field is base64-encoded and must be decoded to be human readable. There are many options for building a Kinesis consumer, such as the KCL.
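For a quick look at a single record, you can decode one of the Data values from the get-records output above with a short script; a minimal sketch that takes the base64 string as a command-line argument:

import base64
import json
import sys

# Usage: python decode_record.py <base64-Data-value-from-get-records>
data_b64 = sys.argv[1]

# Decode the base64 payload and pretty-print the CNM response JSON.
decoded = json.loads(base64.b64decode(data_b64))
print(json.dumps(decoded, indent=2))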

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure. Check the Simple Queue Service (SQS) console, select your queue, and under the Queue Actions tab choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
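As an alternative to the console, you could poll the dead letter queue with boto3; a sketch, assuming your deployment prefix is substituted for the placeholder below:

import boto3

sqs = boto3.client("sqs")

# Assumed deployment prefix -- replace with your Cumulus prefix.
PREFIX = "my-cumulus-prefix"

queue_url = sqs.get_queue_url(QueueName=f"{PREFIX}-kinesisFailure")["QueueUrl"]

# Peek at up to 10 failed records without deleting them.
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=5,
)
for message in response.get("Messages", []):
    print(message["Body"])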

Note that these are only failures that occurred while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
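If you'd prefer to script this instead of clicking through the Lambda console, the same trigger can be created as an event source mapping with boto3; the stream ARN and deployed function name below are assumptions (deployed function names are typically prefixed with your stack prefix):

import boto3

lambda_client = boto3.client("lambda")

# Assumed values -- replace with your response stream ARN and deployed logger function name.
RESPONSE_STREAM_ARN = "arn:aws:kinesis:us-east-1:111111111111:stream/my-cnmResponseStream"
LOGGER_FUNCTION_NAME = "my-cumulus-prefix-KinesisOutboundEventLogger"

# Create the Kinesis trigger targeting the outbound event logger.
mapping = lambda_client.create_event_source_mapping(
    EventSourceArn=RESPONSE_STREAM_ARN,
    FunctionName=LOGGER_FUNCTION_NAME,
    StartingPosition="LATEST",
    Enabled=True,
)
print(mapping["UUID"])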

    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/error-handling/index.html b/docs/v10.0.0/data-cookbooks/error-handling/index.html index 6e71b00b147..cda9d12e289 100644 --- a/docs/v10.0.0/data-cookbooks/error-handling/index.html +++ b/docs/v10.0.0/data-cookbooks/error-handling/index.html @@ -5,7 +5,7 @@ Error Handling in Workflows | Cumulus Documentation - + @@ -45,7 +45,7 @@ Service Exception. See this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/hello-world/index.html b/docs/v10.0.0/data-cookbooks/hello-world/index.html index 451726f9eae..9abaf20b16c 100644 --- a/docs/v10.0.0/data-cookbooks/hello-world/index.html +++ b/docs/v10.0.0/data-cookbooks/hello-world/index.html @@ -5,14 +5,14 @@ HelloWorld Workflow | Cumulus Documentation - +
    Version: v10.0.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

Screenshot of AWS Step Function execution graph for the HelloWorld workflow (executed workflow as seen in the AWS Console)

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/ingest-notifications/index.html b/docs/v10.0.0/data-cookbooks/ingest-notifications/index.html index 9a003c78c33..5bb3746adf4 100644 --- a/docs/v10.0.0/data-cookbooks/ingest-notifications/index.html +++ b/docs/v10.0.0/data-cookbooks/ingest-notifications/index.html @@ -5,13 +5,13 @@ Ingest Notification in Workflows | Cumulus Documentation - +
    Version: v10.0.0

    Ingest Notification in Workflows

    On deployment, an SQS queue and three SNS topics are created and used for handling notification messages related to the workflow.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates DynamoDB. The DynamoDB events for the ExecutionsTable, GranulesTable and PdrsTable are streamed on DynamoDBStreams, which are read by the publishExecutions, publishGranules and publishPdrs Lambda functions, respectively.

    These Lambda functions publish to the three SNS topics both when the workflow starts and when it reaches a terminal state (completion or failure). The following describes how many message(s) each topic receives both on workflow start and workflow completion/failure:

    • reportExecutions - Receives 1 message per workflow execution
    • reportGranules - Receives 1 message per granule in a workflow execution
    • reportPdrs - Receives 1 message per PDR

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database, with the status of those granules/PDRs visible on the Cumulus dashboard. The queue does have certain expectations of message format: it expects a Cumulus Message nested within a CloudWatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (along with the report topic ARNs) by adding them to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to send messages to the queue. The expected format of these messages is that of a CloudWatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus message is expected in detail.input. The Cumulus message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
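As a hedged illustration, here is a boto3 sketch that sends a SUCCEEDED-style message to the reporting queue. The queue URL, execution ARNs, and the stub Cumulus message are assumptions; the exact required fields and the Cumulus message contents should be checked against the referenced schemas before relying on this:

import json
import boto3

sqs = boto3.client("sqs")

# Assumed queue URL -- use the stepfunction_event_reporter_queue_url Terraform output.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111111111111/my-prefix-sfEventSqsToDbRecordsInputQueue"

# Stub only: this must be a complete Cumulus message conforming to the Cumulus message schema.
cumulus_message = {
    "cumulus_meta": {"execution_name": "my-execution-name"},
    "meta": {},
    "payload": {},
}

# A CloudWatch Step Function event wrapper; for SUCCEEDED events the Cumulus message
# goes in detail.output and MUST be a JSON string, not an object.
event = {
    "source": "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    "detail": {
        "executionArn": "arn:aws:states:us-east-1:111111111111:execution:my-machine:my-execution-name",
        "stateMachineArn": "arn:aws:states:us-east-1:111111111111:stateMachine:my-machine",
        "status": "SUCCEEDED",
        "output": json.dumps(cumulus_message),
    },
}

sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))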

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
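As a minimal sketch of what a subscriber might do with these messages (shown in Python for illustration; the earlier Terraform example deploys a Node.js Lambda), a handler can pull the record out of the SNS envelope like this:

import json

def handler(event, context):
    # The published message is a JSON string at Records[0].Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])

    # Granule topic messages wrap the record and event type;
    # execution and PDR topic messages deliver the record itself.
    record = message.get("record", message)
    event_type = message.get("event")

    print(event_type, json.dumps(record))
    return record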

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/queue-post-to-cmr/index.html b/docs/v10.0.0/data-cookbooks/queue-post-to-cmr/index.html index eccd6cda22d..bb7c4c1ce28 100644 --- a/docs/v10.0.0/data-cookbooks/queue-post-to-cmr/index.html +++ b/docs/v10.0.0/data-cookbooks/queue-post-to-cmr/index.html @@ -5,13 +5,13 @@ Queue PostToCmr | Cumulus Documentation - +
    Version: v10.0.0

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/run-tasks-in-lambda-or-docker/index.html b/docs/v10.0.0/data-cookbooks/run-tasks-in-lambda-or-docker/index.html index b1f3abe2f22..367345b9725 100644 --- a/docs/v10.0.0/data-cookbooks/run-tasks-in-lambda-or-docker/index.html +++ b/docs/v10.0.0/data-cookbooks/run-tasks-in-lambda-or-docker/index.html @@ -5,13 +5,13 @@ Run Step Function Tasks in AWS Lambda or Docker | Cumulus Documentation - +
    Version: v10.0.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

If it has been discovered that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or by updating an existing one):

• An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code (a scripted restart example is sketched after this list).

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

    "Resource": "${aws_sfn_activity.queue_granules.id}")`

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.
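To restart the ECS tasks noted above without a full redeploy, one option is to force a new deployment of the ECS service; a boto3 sketch, where the cluster and service names are assumptions to replace with your deployment's actual values:

import boto3

ecs = boto3.client("ecs")

# Assumed names -- replace with your Cumulus ECS cluster and the cumulus_ecs_service service name.
CLUSTER = "my-cumulus-prefix-CumulusECSCluster"
SERVICE = "my-cumulus-prefix-QueueGranules"

# Forces ECS to stop the existing cumulus-ecs-task containers and start fresh ones,
# which will pick up the updated Lambda code when they next poll for activity tasks.
ecs.update_service(cluster=CLUSTER, service=SERVICE, forceNewDeployment=True)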

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/sips-workflow/index.html b/docs/v10.0.0/data-cookbooks/sips-workflow/index.html index 2f3c6621aa2..c29bb616200 100644 --- a/docs/v10.0.0/data-cookbooks/sips-workflow/index.html +++ b/docs/v10.0.0/data-cookbooks/sips-workflow/index.html @@ -5,7 +5,7 @@ Science Investigator-led Processing Systems (SIPS) | Cumulus Documentation - + @@ -16,7 +16,7 @@ we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    Note: A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/throttling-queued-executions/index.html b/docs/v10.0.0/data-cookbooks/throttling-queued-executions/index.html index 40a8d55f67f..173f7edd7a1 100644 --- a/docs/v10.0.0/data-cookbooks/throttling-queued-executions/index.html +++ b/docs/v10.0.0/data-cookbooks/throttling-queued-executions/index.html @@ -5,13 +5,13 @@ Throttling queued executions | Cumulus Documentation - +
    Version: v10.0.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started
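To make the increment-or-reject behavior above concrete, here is a conceptual sketch of a semaphore implemented as a DynamoDB conditional update. This is an illustration only, not Cumulus' actual implementation; the table and attribute names are assumptions:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

# Assumed table/attribute names -- illustration only.
SEMAPHORE_TABLE = "my-prefix-semaphores"

def try_start_execution(queue_url: str, maximum: int) -> bool:
    """Return True if the semaphore was incremented (an execution may start), False if at the limit."""
    try:
        dynamodb.update_item(
            TableName=SEMAPHORE_TABLE,
            Key={"key": {"S": queue_url}},
            UpdateExpression="ADD semvalue :one",
            # Only increment if the current count is below the queue's maximum.
            ConditionExpression="attribute_not_exists(semvalue) OR semvalue < :max",
            ExpressionAttributeValues={
                ":one": {"N": "1"},
                ":max": {"N": str(maximum)},
            },
        )
        return True  # count was below the maximum; start the execution
    except ClientError as error:
        if error.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # already at the maximum; do not start an execution
        raise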

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
    - + \ No newline at end of file diff --git a/docs/v10.0.0/data-cookbooks/tracking-files/index.html b/docs/v10.0.0/data-cookbooks/tracking-files/index.html index 95fbd3dc471..80d9a4d9166 100644 --- a/docs/v10.0.0/data-cookbooks/tracking-files/index.html +++ b/docs/v10.0.0/data-cookbooks/tracking-files/index.html @@ -5,7 +5,7 @@ Tracking Ancillary Files | Cumulus Documentation - + @@ -19,7 +19,7 @@ The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type    UMM-G RelatedUrl.Type                                             ECHO10 Location
ancillary   'VIEW RELATED INFORMATION'                                        OnlineResource
data        'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)   OnlineAccessURL
browse      'GET RELATED VISUALIZATION'                                       AssociatedBrowseImage
linkage     'EXTENDED METADATA'                                               OnlineResource
metadata    'EXTENDED METADATA'                                               OnlineResource
qa          'EXTENDED METADATA'                                               OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
      "bucket": "protected",
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
      "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
      "bucket": "protected",
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
      "type": "qa"
    }
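
    Before adding file definitions like these to a collection, it can help to confirm that each sampleFileName actually matches its regex. Below is a small illustrative Python check (not part of Cumulus itself) using the example entries above:

    import re

    file_definitions = [
        {
            "regex": r"^MOD09GQ\.A[\d]{7}\.[\S]{6}\.006\.[\d]{13}\_[\d]{1}.jpg$",
            "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
        },
        {
            "regex": r"^MOD09GQ\.A[\d]{7}\.[\S]{6}\.006\.[\d]{13}\_README.pdf$",
            "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
        },
        {
            "regex": r"^MOD09GQ\.A[\d]{7}\.[\S]{6}\.006\.[\d]{13}\_QA.txt$",
            "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
        },
    ]

    # Each sample file name should match its configured regex.
    for definition in file_definitions:
        assert re.match(definition["regex"], definition["sampleFileName"]), definition
    print("All sample file names match their regexes.")
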
    Version: v10.0.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

    This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "apigateway.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'
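
    If you want to confirm the setting took effect, the account configuration can be read back. For example, a small boto3 sketch (an optional check, not part of the documented steps):

    import boto3

    apigateway = boto3.client("apigateway")

    # get_account returns account-level API Gateway settings, including cloudwatchRoleArn.
    account = apigateway.get_account()
    print("cloudwatchRoleArn:", account.get("cloudwatchRoleArn"))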

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

    Version: v10.0.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

    By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the CloudWatch log groups you want to send to the destination. additional_log_groups_to_elk is a map whose keys are descriptors and whose values are the CloudWatch log group names.

    additional_log_groups_to_elk = {
      "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
      "MyCustomTask"   = "my-custom-task-log-group"
    }
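
    Before sending these names to the destination, it may be worth confirming that each configured log group actually exists in your account. A minimal boto3 sketch, reusing the example map above (the group names are illustrative):

    import boto3

    logs = boto3.client("logs")

    additional_log_groups = {
        "HelloWorldTask": "/aws/lambda/cumulus-example-HelloWorld",
        "MyCustomTask": "my-custom-task-log-group",
    }

    for descriptor, log_group_name in additional_log_groups.items():
        response = logs.describe_log_groups(logGroupNamePrefix=log_group_name)
        found = any(
            group["logGroupName"] == log_group_name
            for group in response.get("logGroups", [])
        )
        print(f"{descriptor}: {'found' if found else 'NOT FOUND'} ({log_group_name})")
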
    Component-based Cumulus Deployment

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v10.0.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

    Using the AWS command line tool's s3api create-bucket subcommand:

    $ aws s3api create-bucket \
        --bucket foobar-internal \
        --region us-west-2 \
        --create-bucket-configuration LocationConstraint=us-west-2
    {
        "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.
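
    The same bucket can also be created with an AWS SDK. For example, a boto3 sketch equivalent to the CLI call above (bucket name and region are the example values; CreateBucketConfiguration is only needed outside us-east-1, as noted):

    import boto3

    s3 = boto3.client("s3", region_name="us-west-2")

    response = s3.create_bucket(
        Bucket="foobar-internal",
        CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
    )
    print(response["Location"])  # location of the new bucket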

    Repeat the above step for each bucket to be created.

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v10.0.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    localhost <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.
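
    As one concrete way to make this request (using the Python requests library instead of curl; every value below is a placeholder for your own Terraform variable values):

    import requests

    csdap_host_url = "<csdap_host_url>"                    # csdap_host_url Terraform variable
    csdap_client_id = "<csdap_client_id>"                  # csdap_client_id Terraform variable
    csdap_client_password = "<csdap_client_password>"      # csdap_client_password Terraform variable
    cumulus_distribution_url = "<cumulus_distribution_url>"

    response = requests.post(
        csdap_host_url.rstrip("/") + "/authclient/updateRedirectUri",
        auth=(csdap_client_id, csdap_client_password),  # BasicAuth
        # Passing a dict as `data` sends it as application/x-www-form-urlencoded.
        data={"redirect_uri": cumulus_distribution_url.rstrip("/") + "/login"},
    )
    response.raise_for_status()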

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.
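
    If you prefer to script this step, a minimal boto3 sketch (the bucket name and key are placeholders; any small file will do):

    import boto3

    s3 = boto3.client("s3")

    # Use one of the protected buckets from your buckets Terraform variable.
    protected_bucket = "<prefix>-protected"
    sample_key = "samples/cognito-test.txt"

    s3.put_object(Bucket=protected_bucket, Key=sample_key, Body=b"Cumulus Distribution test file")
    print(f"Uploaded s3://{protected_bucket}/{sample_key}")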

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

    To determine the EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text
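
    Equivalently, if you prefer to look the instance up from Python, a boto3 sketch of the same query (profile and prefix are placeholders):

    import boto3

    session = boto3.Session(profile_name="<profile>")
    ec2 = session.client("ec2")

    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Deployment", "Values": ["<prefix>"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    print(instance_ids)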

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

    How to Deploy Cumulus

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

    Follow the above instructions to initialize Terraform using terraform init [1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution url
      • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [2].
    5. You may delete the placeholder url you used to create the application.

    If you've lost track of the needed redirect URIs, they can be located in the API Gateway console. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

    To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [3].

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com (or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint") and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

    The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster consider:

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications". Scroll down to "Application Administration" and use the edit icon for your application. Then Manage -> Redirect URIs.

    3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, by reading the URL at the top after "Invoke this API at".

    PostgreSQL Database Deployment

    Cumulus provides the cumulus-rds-tf module, which will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

    To deploy this module, make sure that you have completed the following steps from the Cumulus deployment instructions in similar fashion for this module:

    --

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike Cumulus deployment, this module should be deployed once to create the database cluster and only thereafter to make changes to that configuration/upgrade/etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need. In most use cases this should be the same one used for Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets is in the form:

    {
      "database": "postgres",
      "dbClusterIdentifier": "clusterName",
      "engine": "postgres",
      "host": "xxx",
      "password": "defaultPassword",
      "port": 5432,
      "username": "xxx"
    }

    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432
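
    For example, the secret can be retrieved and parsed with boto3 (use the admin_db_login_secret_arn or user_credentials_secret_arn value from the Terraform outputs; the ARN below is a placeholder):

    import json
    import boto3

    secretsmanager = boto3.client("secretsmanager")

    secret_arn = "<secret ARN from the Terraform outputs>"

    secret = secretsmanager.get_secret_value(SecretId=secret_arn)
    credentials = json.loads(secret["SecretString"])

    print(credentials["host"], credentials["port"], credentials["database"], credentials["username"])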

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    Version: v10.0.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The terraform source package can be found on the Cumulus github release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Terraform Best Practices

    You can check for any remaining resources tagged with your deployment by running the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.
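
    The same check can be scripted. A boto3 sketch that pages through all matching resources (PREFIX is your deployment prefix, as above):

    import boto3

    tagging = boto3.client("resourcegroupstaggingapi")

    paginator = tagging.get_paginator("get_resources")
    remaining = [
        mapping["ResourceARN"]
        for page in paginator.paginate(
            TagFilters=[{"Key": "Deployment", "Values": ["PREFIX"]}]
        )
        for mapping in page["ResourceTagMappingList"]
    ]

    # Ideally this list is empty; anything listed may need to be deleted manually.
    print(remaining)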

    • Configuring the Cumulus deployment: link
    • Restoring a previous version: link

    Version: v10.0.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app module for distribution.

    The TEA module provides these instructions showing how to add it to your deployment; the following are instructions to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml

    MAP:
      my-protected: my-protected
      my-public: my-public

    PUBLIC_BUCKETS:
      - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.

    Upgrading Cumulus

    Verify that your deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: v10.0.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be merged directly to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve failures as appropriate.

    Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    Version: v10.0.0

    Versioning and Releases

    Versioning

We use a global versioning approach, meaning version numbers in Cumulus are consistent across all packages and tasks, and semantic versioning to track major, minor, and patch versions (e.g. 1.0.0). We use Lerna to manage our versioning. Any change will force Lerna to increment the version of all packages.

Read more about semantic versioning here.

    Pre-release testing

    Note: This is only necessary when preparing a release for a new major version of Cumulus (e.g. preparing to go from 6.x.x to 7.0.0)

    Before releasing a new major version of Cumulus, we should test the deployment upgrade path from the latest release of Cumulus to the upcoming release.

    It is preferable to use the cumulus-template-deploy repo for testing the deployment, since that repo is the officially recommended deployment configuration for end users.

    You should create an entirely new deployment for this testing to replicate the end user upgrade path. Using an existing test or CI deployment would not be useful because that deployment may already have been deployed with the latest changes and not match the upgrade path for end users.

    Pre-release testing steps:

    1. Checkout the cumulus-template-deploy repo

    2. Update the deployment code to use the latest release artifacts if it wasn't done already. For example, assuming that the latest release was 5.0.1, update the deployment files as follows:

      # in data-persistence-tf/main.tf
      source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/data-persistence"

      # in cumulus-tf/main.tf
      source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/cumulus"
    3. For both the data-persistence-tf and cumulus-tf modules:

      1. Add the necessary backend configuration (terraform.tf) and variables (terraform.tfvars)
        • You should use an entirely new deployment for this testing, so make sure to use values for key in terraform.tf and prefix in terraform.tfvars that don't collide with existing deployments
      2. Run terraform init
      3. Run terraform apply
    4. Checkout the master branch of the cumulus repo

    5. Run a full bootstrap of the code: npm run bootstrap

    6. Build the pre-release artifacts: ./bamboo/create-release-artifacts.sh

    7. For both the data-persistence-tf and cumulus-tf modules:

      1. Update the deployment to use the built release artifacts:

        # in data-persistence-tf/main.tf
        source = "[path]/cumulus/terraform-aws-cumulus.zip//tf-modules/data-persistence"

        # in cumulus-tf/main.tf
        source = "/Users/mboyd/development/cumulus/terraform-aws-cumulus.zip//tf-modules/cumulus"
      2. Review the CHANGELOG.md for any pre-deployment migration steps. If there are, go through the steps and confirm that they are successful

      3. Run terraform init

      4. Run terraform apply

    8. Review the CHANGELOG.md for any post-deployment migration steps and confirm that they are successful

    9. Delete your test deployment by running terraform destroy in cumulus-tf and data-persistence-tf
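Condensed into a shell sketch, the testing loop above looks roughly like the following (paths and version numbers are placeholders; substitute your own):

    # steps 1-3: deploy the latest release from cumulus-template-deploy
    git clone https://github.com/nasa/cumulus-template-deploy
    cd cumulus-template-deploy/data-persistence-tf && terraform init && terraform apply
    cd ../cumulus-tf && terraform init && terraform apply

    # steps 4-6: build pre-release artifacts from the cumulus repo
    cd /path/to/cumulus && git checkout master
    npm run bootstrap
    ./bamboo/create-release-artifacts.sh

    # step 7: point both modules' source at the built zip, then re-run terraform init and terraform apply
    # step 9: tear down when finished
    cd /path/to/cumulus-template-deploy/cumulus-tf && terraform destroy
    cd ../data-persistence-tf && terraform destroy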

    Updating Cumulus version and publishing to NPM

    1. Create a branch for the new release

    From Master

    Create a branch titled release-MAJOR.MINOR.x for the release (use a literal x for the patch version).

        git checkout -b release-MAJOR.MINOR.x

    e.g.:
    git checkout -b release-9.1.x

    If creating a new major version release from master, say 5.0.0, then the branch would be named release-5.0.x. If creating a new minor version release from master, say 1.14.0 then the branch would be named release-1.14.x.

    Having a release branch for each major/minor version allows us to easily backport patches to that version.

    Push the release-MAJOR.MINOR.x branch to GitHub if it was created locally. (Commits should be even with master at this point.)

    If creating a patch release, you can check out the existing base branch.

    Then create the release branch (e.g. release-1.14.0) from the minor version base branch. For example, from the release-1.14.x branch:

    git checkout -b release-1.14.0

    Backporting

    When creating a backport, a minor version base branch should already exist on GitHub. Check out the existing minor version base branch then create a release branch from it. For example:

    # check out existing minor version base branch
    git checkout release-1.14.x
    # pull to ensure you have the latest changes
    git pull origin release-1.14.x
    # create new release branch for backport
    git checkout -b release-1.14.1
    # cherry pick the commits (or single squashed commit of changes) relevant to the backport
    git cherry-pick [replace-with-commit-SHA]
    # push up the changes to the release branch
    git push

    2. Update the Cumulus version number

    When changes are ready to be released, the Cumulus version number must be updated.

    Lerna handles the process of deciding which version number should be used as long as the developer specifies whether the change is a major, minor, or patch change.

    To update Cumulus's version number run:

    npm run update

    Screenshot of terminal showing interactive prompt from Lerna for selecting the new release version

    Lerna will handle updating the packages and all of the dependent package version numbers. If a dependency has not been changed with the update, however, lerna will not update the version of the dependency.

Note: Lerna will struggle to correctly update the versions on any non-standard/alpha versions (e.g. 1.17.0-alpha0). Please be sure to check any packages that are new or have been manually published since the previous release, and any packages that list them as dependencies, to ensure the listed versions are correct. It's useful to use the search feature of your code editor or grep to see if there are any references to outdated package versions.
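For example, a quick repo-wide check for lingering references to a previous version (a sketch; substitute the actual outdated version number):

    grep -rn --include=package.json '"9\.0\.0"' packages tasks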

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

If there exists a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

• There will be a placeholder change-me value that should be replaced with the to-be-released Cumulus Core version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.
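For example, if the new release were v9.1.0 and the previous release v9.0.1 (hypothetical numbers), the added link reference would follow the existing pattern, something like:

    [v9.1.0]: https://github.com/nasa/cumulus/compare/v9.0.1...v9.1.0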

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

  Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

    git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

• Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
• Important: Deselect Enable Branch. If you do not do this, it will immediately fire off a build.

• Do immediately: On the Branch Details page, enable Change trigger. Set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. (You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.)

• Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: you should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint, audit and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

12. Create a new Cumulus release on GitHub

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
• An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.

    13. Merge base branch back to master

Finally, you need to propagate the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

    git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
diff --git a/docs/v10.0.0/docs-how-to/index.html b/docs/v10.0.0/docs-how-to/index.html
    Version: v10.0.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve
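If you want search to work locally, export the two environment variables mentioned above before starting the server (a sketch; the actual values are the credentials provided by DocSearch):

    export DOCSEARCH_API_KEY=<your-docsearch-api-key>
    export DOCSEARCH_INDEX_NAME=<your-docsearch-index-name>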

Note: npm run docs-build will build the documentation into website/build.

    Cumulus Documentation

Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

We use the open-source static website generator Docusaurus to build HTML files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

Search

Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both of these values to exist: DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

The master branch is automatically built and deployed to the gh-pages branch, which is served by GitHub Pages. Do not make edits to the gh-pages branch.

diff --git a/docs/v10.0.0/external-contributions/index.html b/docs/v10.0.0/external-contributions/index.html
    Version: v10.0.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

diff --git a/docs/v10.0.0/faqs/index.html b/docs/v10.0.0/faqs/index.html
    Version: v10.0.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

What prerequisites are needed to set up Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

• Cumulus: Users outside NASA should file a GitHub issue and users inside NASA should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

Answer: Yes. Cumulus is a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus maintained modules or steps on how to create a module go to Terraform modules.

Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

diff --git a/docs/v10.0.0/features/ancillary_metadata/index.html b/docs/v10.0.0/features/ancillary_metadata/index.html
    Version: v10.0.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

diff --git a/docs/v10.0.0/features/backup_and_restore/index.html b/docs/v10.0.0/features/backup_and_restore/index.html

    DynamoDB

    Backup and Restore with AWS

    You can enable point-in-time recovery (PITR) as well as create an on-demand backup for your Amazon DynamoDB tables.

    PITR provides continuous backups of your DynamoDB table data. PITR can be enabled through your Terraform deployment, the AWS console, or the AWS API. When enabled, DynamoDB maintains continuous backups of your table up to the last 35 days. You can recover a copy of that table to a previous state at any point in time from the moment you enable PITR, up to a maximum of the 35 preceding days. PITR provides continuous backups until you explicitly disable it.

    On-demand backups allow you to create backups of DynamoDB table data and its settings. You can initiate an on-demand backup at any time with a single click from the AWS Management Console or a single API call. You can restore the backups to a new DynamoDB table in the same AWS Region at any time.

    PITR gives your DynamoDB tables continuous protection from accidental writes and deletes. With PITR, you do not have to worry about creating, maintaining, or scheduling backups. You enable PITR on your table and your backup is available for restore at any point in time from the moment you enable it, up to a maximum of the 35 preceding days. For example, imagine a test script writing accidentally to a production DynamoDB table. You could recover your table to any point in time within the last 35 days.

    On-demand backups help with long-term archival requirements for regulatory compliance. On-demand backups give you full-control of managing the lifecycle of your backups, from creating as many backups as you need to retaining these for as long as you need.

    Enabling PITR during deployment

    By default, the Cumulus data-persistence module enables PITR on the default tables listed in the module's variable defaults for enable_point_in_time_tables. At the time of writing, that list includes:

    • AsyncOperationsTable
    • CollectionsTable
    • ExecutionsTable
    • FilesTable
    • GranulesTable
    • PdrsTable
    • ProvidersTable
    • RulesTable

    If you wish to change this list, simply update your deployment's data_persistence module (here in the template-deploy repository) to pass the correct list of tables.

    Restoring with PITR

    Restoring a full deployment

If your deployment has been deleted, all of your tables with PITR enabled will have had backups created automatically. You can locate these backups in the AWS console on the DynamoDB Backups page or through the CLI by running:

    aws dynamodb list-backups --backup-type SYSTEM

    You can restore your tables to your AWS account using the following command:

    aws dynamodb restore-table-from-backup --target-table-name <prefix>-CollectionsTable --backup-arn <backup-arn>

    Where prefix matches the prefix from your data-persistence deployment. backup-arn can be found in the AWS console or by listing the backups using the command above.
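For example, with a hypothetical prefix of my-cumulus and a backup ARN copied from the list-backups output:

    aws dynamodb restore-table-from-backup \
      --target-table-name my-cumulus-CollectionsTable \
      --backup-arn arn:aws:dynamodb:us-east-1:123456789012:table/my-cumulus-CollectionsTable/backup/01234567890123-abcdefgh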

    This will restore your tables to AWS. They will need to be linked to your Terraform deployment. After terraform init and before terraform apply, run the following command for each table:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <prefix>-CollectionsTable

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.
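If you restored several tables, the same import is repeated per table. A sketch, assuming the Terraform resource identifiers in the data-persistence module follow the same <name>_table pattern as collections_table (verify each identifier in the module's DynamoDB table definitions before running):

    # hypothetical resource identifiers; confirm each against the data-persistence module
    terraform import module.data_persistence.aws_dynamodb_table.granules_table <prefix>-GranulesTable
    terraform import module.data_persistence.aws_dynamodb_table.rules_table <prefix>-RulesTable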

Terraform will now manage these tables as part of the Terraform state. Run terraform apply to generate the rest of the data-persistence deployment and then follow the instructions to deploy the cumulus deployment as normal.

    At this point the data will be in DynamoDB, but not in Elasticsearch, so nothing will be returned on the Operator dashboard or through Operator API calls. To get the data into Elasticsearch, run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the recovery index the data will be visible on the dashboard and available via the Operator API.

    Restoring an individual table

    A table can be restored to a previous state using PITR. This is easily achievable via the AWS Console by visiting the Backups tab for the table.

    A table can only be recovered to a new table name. Following the restoration of the table, the new table must be imported into Terraform.

    First, remove the old table from the Terraform state:

    terraform state rm module.data_persistence.aws_dynamodb_table.collections_table

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Then import the new table into the Terraform state:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <new-table-name>

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Your data-persistence and cumulus deployments should be redeployed so that your instance of Cumulus uses this new table. After the deployment, your Elasticsearch instance will be out of sync with your new table if there is any change in data. To resync your Elasticsearch with your database run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the new index the DynamoDB tables and Elasticsearch instance will be in sync and the correct data will be reflected on the dashboard.

    Backup and Restore with cumulus-api CLI

The cumulus-api CLI also includes backup and restore commands. The CLI backup command downloads the content of any of your DynamoDB tables to .json files. You can also use these .json files to restore the records to another DynamoDB table.

    Backup with the CLI

To back up a table with the CLI, install the @cumulus/api package using npm, making sure to install the same version as your Cumulus deployment:

    npm install -g @cumulus/api@version

    Then run:

    cumulus-api backup --table <table-name>

The backup will be stored at backups/<table-name>.json.

    Restore with the CLI

To restore data from a JSON file, run the following command:

    cumulus-api restore backups/<table-name>.json --table <table-name>

    The restore can go to the in-use table and will update Elasticsearch. If an existing record exists in the table it will not be duplicated but will be updated with the record from the restore file.
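Putting it together, a hypothetical round trip for a deployment with prefix my-cumulus running Cumulus 10.0.0 might look like:

    npm install -g @cumulus/api@10.0.0
    cumulus-api backup --table my-cumulus-GranulesTable
    # ...later, restore those records (this also updates Elasticsearch)
    cumulus-api restore backups/my-cumulus-GranulesTable.json --table my-cumulus-GranulesTable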

    Data Backup and Restore

    Cumulus provides no core functionality to backup data stored in S3. Data disaster recovery is being developed in a separate effort here.

diff --git a/docs/v10.0.0/features/data_in_dynamodb/index.html b/docs/v10.0.0/features/data_in_dynamodb/index.html
    Version: v10.0.0

    Cumulus Metadata in DynamoDB

    @cumulus/api uses a number of methods to preserve the metadata generated in a Cumulus instance.

All configuration data and system-generated metadata are stored in DynamoDB tables, except for logs. System logs are stored in the AWS CloudWatch service.

    Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability. Amazon DynamoDB runs exclusively on solid-state drives (SSDs). SSDs help AWS achieve the design goals of predictable low-latency response times for storing and accessing data at any scale.

    DynamoDB Auto Scaling

    Cumulus deployed tables from the data-persistence module are set to on-demand mode.

diff --git a/docs/v10.0.0/features/dead_letter_archive/index.html b/docs/v10.0.0/features/dead_letter_archive/index.html
    Version: v10.0.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
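For example, to list what has accumulated in the archive for your stack (a sketch; the bucket and stack names are placeholders):

    aws s3 ls s3://<system-bucket>/<stackName>/dead-letter-archive/sqs/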

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.
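A minimal sketch of that request using curl, assuming you already have a Cumulus API access token and the API root URL for your deployment:

    curl -X POST \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      https://<api-root>/deadLetterArchive/recoverCumulusMessages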

    This endpoint may prove particularly useful when recovering from extended or unexpected database outage, where messages failed to process due to external outage and there is no essential malformation of each Cumulus message.

diff --git a/docs/v10.0.0/features/dead_letter_queues/index.html b/docs/v10.0.0/features/dead_letter_queues/index.html
    Version: v10.0.0

    Dead Letter Queues

    startSF SQS queue

The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

These DLQs are set up with the following configuration:

    receive_wait_time_seconds  = 20
    message_retention_seconds  = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch based on DynamoDB events)
• JobsLambda (writes log outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

For aid in manually troubleshooting, you can utilize the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

diff --git a/docs/v10.0.0/features/distribution-metrics/index.html b/docs/v10.0.0/features/distribution-metrics/index.html
    Version: v10.0.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

diff --git a/docs/v10.0.0/features/execution_payload_retention/index.html b/docs/v10.0.0/features/execution_payload_retention/index.html
    Version: v10.0.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in DynamoDB and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

diff --git a/docs/v10.0.0/features/logging-esdis-metrics/index.html b/docs/v10.0.0/features/logging-esdis-metrics/index.html
    Version: v10.0.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

diff --git a/docs/v10.0.0/features/replay-archived-sqs-messages/index.html b/docs/v10.0.0/features/replay-archived-sqs-messages/index.html
    Version: v10.0.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

Field | Type | Description
queueName | string | Any valid SQS queue name (not ARN)
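A sketch of starting a replay with curl, assuming a hypothetical queue name of my-cumulus-startSF and an existing API access token:

    curl -X POST \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"queueName": "my-cumulus-startSF"}' \
      https://<api-root>/replays/sqs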

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

diff --git a/docs/v10.0.0/features/replay-kinesis-messages/index.html b/docs/v10.0.0/features/replay-kinesis-messages/index.html
    Version: v10.0.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

Field | Type | Required | Description
type | string | required | Currently only accepts kinesis.
kinesisStream | string | for type kinesis | Any valid kinesis stream name (not ARN)
kinesisStreamCreationTimestamp | * | optional | Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
endTimestamp | * | optional | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
startTimestamp | * | optional | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
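A sketch of such a request with curl, assuming a hypothetical stream name and a roughly one-day replay window (note the buffered timestamps, per the note above):

    curl -X POST \
      -H "Authorization: Bearer $ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"type": "kinesis", "kinesisStream": "my-cnm-stream", "startTimestamp": "2018-10-18T23:45:00.000Z", "endTimestamp": "2018-10-20T00:15:00.000Z"}' \
      https://<api-root>/replays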

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

diff --git a/docs/v10.0.0/features/reports/index.html b/docs/v10.0.0/features/reports/index.html
diff --git a/docs/v10.0.0/getting-started/index.html b/docs/v10.0.0/getting-started/index.html
    Version: v10.0.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects not limited to NASA use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

diff --git a/docs/v10.0.0/glossary/index.html b/docs/v10.0.0/glossary/index.html
    Version: v10.0.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collections on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see the AWS IAM documentation.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Product Delivery Record" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Version: v10.0.0

    Introduction

    The Cumulus project addresses the need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    Version: v10.0.0

    About Integrator Guide

    Purpose

    The Integrator Guide supplements the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    Version: v10.0.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps will help you as you write a new Lambda that integrates with a Cumulus workflow, and will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    Version: v10.0.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to the Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in CloudWatch
    • Go to CloudWatch
    • Locate the Lambda
    • Search the CloudWatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    Version: v10.0.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Note: This diagram is current as of v1.18.0.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    Version: v10.0.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    Version: v10.0.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.
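
    As a sketch, a bulk granule request might look like the following (the token, workflow name, index, and query values are placeholders; the payload fields mirror the query, index, and workflowName properties described in the dashboard steps below):

    curl --request POST https://example.com/granules/bulk \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "workflowName": "HelloWorldWorkflow",
        "index": "<your-granules-index>",
        "query": {
          "query": { "match": { "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606" } },
          "size": 10000
        }
      }'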

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
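
    For example, the status of a single operation could be retrieved with a request along these lines (the operation ID and token are placeholders):

    curl --request GET https://example.com/asyncOperations/<async-operation-id> \
      --header 'Authorization: Bearer ReplaceWithTheToken'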

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    CMR Operations | Cumulus Documentation

    UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "updateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is in the bulk operations operator doc.

    Version: v10.0.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    Discovery Filtering | Cumulus Documentation

    … directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    Version: v10.0.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or running a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or running scheduled rules, will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    Version: v10.0.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details will display. Once the status is Active, the stream is ready to use. Record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that matches your schema using Python or the command line (see the sketch after this list).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
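
      A minimal sketch of sending a message with the AWS CLI (the stream name, partition key, and cnm-message.json are placeholders, and the file contents must conform to the CNM schema referenced by your rule):

      # --cli-binary-format is needed for AWS CLI v2; omit it for v1
      aws kinesis put-record \
        --stream-name <streamName> \
        --partition-key <any-partition-key> \
        --cli-binary-format raw-in-base64-out \
        --data "$(cat cnm-message.json)"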
    Version: v10.0.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
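
    For example, with a hypothetical stack named my-stack whose internal bucket is my-stack-internal, the raw logs could be listed with:

    aws s3 ls s3://my-stack-internal/my-stack/ems-distribution/s3-server-access-logs/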

    Naming Executions | Cumulus Documentation

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
      "QueueGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "FullMessage": true
            },
            "task_config": {
              "queueUrl": "${start_sf_queue_url}",
              "provider": "{$.meta.provider}",
              "internalBucket": "{$.meta.buckets.internal.name}",
              "stackName": "{$.meta.stack}",
              "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
              "executionNamePrefix": "{$.meta.executionNamePrefix}"
            }
          }
        },
        "Type": "Task",
        "Resource": "${queue_granules_task_arn}",
        "Retry": [
          {
            "ErrorEquals": [
              "Lambda.ServiceException",
              "Lambda.AWSLambdaException",
              "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 6,
            "BackoffRate": 2
          }
        ],
        "Catch": [
          {
            "ErrorEquals": [
              "States.ALL"
            ],
            "ResultPath": "$.exception",
            "Next": "WorkflowFailed"
          }
        ],
        "End": true
      }
    }
    Version: v10.0.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.
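
    Rules can be created from the dashboard Rules page or via the Cumulus API. A minimal sketch of a onetime rule, assuming your deployed workflow is named HelloWorldWorkflow and ReplaceWithTheToken is a valid token, might look like:

    curl --request POST https://example.com/rules \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "helloworld_onetime_rule",
        "workflow": "HelloWorldWorkflow",
        "rule": { "type": "onetime" },
        "state": "ENABLED"
      }'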

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
      private = {
        name = "discovery-bucket"
        type = "private"
      },
      protected = {
        name = "archive-protected"
        type = "protected"
      }
      public = {
        name = "archive-public"
        type = "public"
      }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types correspond to the keys in the buckets configuration shown above.

    {
      "name": "test-collection",
      "version": "001",
      "granuleId": "^GRANULE\\.A[\\d]{7}$",
      "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
      "reportToEms": false,
      "sampleFileName": "GRANULE.A2017025.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
          "sampleFileName": "GRANULE.A2017025.hdf"
        },
        {
          "bucket": "public",
          "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
          "sampleFileName": "GRANULE.A2017025.jpg"
        }
      ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
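
    If you prefer the API over the dashboard form, a rule along these lines would capture the same configuration (the provider name, workflow name, and token are placeholders for whatever you created above; the collection matches the example collection defined earlier):

    curl --request POST https://example.com/rules \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "discover_test_data_onetime",
        "workflow": "<discover-granules-workflow-name>",
        "provider": "<your-s3-provider-name>",
        "collection": { "name": "test-collection", "version": "001" },
        "rule": { "type": "onetime" },
        "state": "ENABLED",
        "meta": { "provider_path": "test-data" }
      }'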

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    Version: v10.0.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    Version: v10.0.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.
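
    For example, assuming AWS CLI v2 and substituting the actual function name from your deployment, recent logs for one of those Lambdas could be tailed with:

    aws logs tail "/aws/lambda/<prefix>-<schedule-or-starter-lambda-name>" --since 1h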

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of lambdas we distribute is in the low teens and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
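    # role, runtime, and other required aws_lambda_function arguments are omitted from this excerpt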
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    Reindexing Elasticsearch Guide | Cumulus Documentation

    … current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then Reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is you do not have to turn off any ingest operations. This way is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.
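
    As a sketch, assuming the change-index endpoint accepts currentIndex and newIndex as described in the Cumulus API documentation (the index names and token are placeholders):

    curl --request POST https://example.com/elasticsearch/change-index \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{ "currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4" }'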

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
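
    For example (the host and token are placeholders, and the path follows the Cumulus API documentation):

    curl --request GET https://example.com/elasticsearch/reindex-status \
      --header 'Authorization: Bearer ReplaceWithTheToken'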

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
      "cumulus-2020-11-3": {
        "primaries": {
          "docs": {
            "count": 21096512,
            "deleted": 176895
          }
        },
        "total": {
          "docs": {
            "count": 21096512,
            "deleted": 176895
          }
        }
      },
      "cumulus-2021-3-4": {
        "primaries": {
          "docs": {
            "count": 715949,
            "deleted": 140191
          }
        },
        "total": {
          "docs": {
            "count": 715949,
            "deleted": 140191
          }
        }
      }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
      "aggs": {
        "count_by_type": {
          "terms": {
            "field": "_type"
          }
        }
      },
      "size": 0
    }

    which will produce a result like

    "aggregations": {
      "count_by_type": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "logs",
            "doc_count": 483955
          },
          {
            "key": "execution",
            "doc_count": 4966
          },
          {
            "key": "deletedgranule",
            "doc_count": 4715
          },
          {
            "key": "pdr",
            "doc_count": 1822
          },
          {
            "key": "granule",
            "doc_count": 740
          },
          {
            "key": "asyncOperation",
            "doc_count": 616
          },
          {
            "key": "provider",
            "doc_count": 108
          },
          {
            "key": "collection",
            "doc_count": 87
          },
          {
            "key": "reconciliationReport",
            "doc_count": 48
          },
          {
            "key": "rule",
            "doc_count": 7
          }
        ]
      }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
      "conflicts": "proceed",
      "source": {
        "index": "cumulus-2020-11-3"
      },
      "dest": {
        "index": "cumulus-2021-3-4",
        "op_type": "create"
      }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    Version: v10.0.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

    Troubleshooting Deployment | Cumulus Documentation

    … data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
    domain_name = "es"
    instance_count = 2
    instance_type = "t2.small.elasticsearch"
    version = "5.3"
    volume_size = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: "EACCES: permission denied, rmdir '/tmp/gulp-cache/default'". This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.
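
    A sketch of that edit from the command line, assuming GNU sed (on macOS/BSD sed, use -i '' instead of -i) and a prefix of myprefix:

    sed -i "s/cacheDirName: 'gulp-cache'/cacheDirName: 'myprefix-cache'/" \
      cumulus-dashboard/node_modules/gulp-cache/index.js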

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID, EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, or you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URL to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    Version: v10.0.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Version: v10.0.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any Cloudfront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
       --table-name DYNAMO_LOCK_TABLE \
       --item '{
         "LockID": {"S": "BUCKET/KEY-md5"},
         "Digest": {"S": "some-digest-value"}
       }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/upgrade-notes/update-cma-2.0.2/index.html b/docs/v10.0.0/upgrade-notes/update-cma-2.0.2/index.html index df2b10cb796..5aac4eab324 100644 --- a/docs/v10.0.0/upgrade-notes/update-cma-2.0.2/index.html +++ b/docs/v10.0.0/upgrade-notes/update-cma-2.0.2/index.html @@ -5,13 +5,13 @@ Upgrade to CMA 2.0.2 | Cumulus Documentation - +
    Version: v10.0.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

    The Cumulus Message Adapter has been updated in release 2.0.2 to no longer use the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, and to instead use an incrementing integer field.

    Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

    The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

    This update means that Cumulus tasks/activities using the CMA will now write an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, have their Lambdas rebuilt, and have their Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed and Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core Node.js tasks that utilize the CMA, as well as many contributed Node.js/Java components.

    Once the above have been done, redeploy Cumulus to apply the configuration, and the updates should be live.
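
    As an illustrative (not prescriptive) sketch, wiring the layer into the cumulus module might look like the following; the layer name and version number in the ARN are hypothetical placeholders, so substitute the ARN of the v2.0.2 layer actually published in your deployment account:

    module "cumulus" {
      # ...source and other required cumulus module variables omitted...

      # Hypothetical ARN; replace with the CMA v2.0.2 layer version ARN in your account
      cumulus_message_adapter_lambda_layer_version_arn = "arn:aws:lambda:us-east-1:123456789012:layer:Cumulus_Message_Adapter:4"
    }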

    - + \ No newline at end of file diff --git a/docs/v10.0.0/upgrade-notes/update-task-file-schemas/index.html b/docs/v10.0.0/upgrade-notes/update-task-file-schemas/index.html index 9d8b4d19f0b..0a1463e06bf 100644 --- a/docs/v10.0.0/upgrade-notes/update-task-file-schemas/index.html +++ b/docs/v10.0.0/upgrade-notes/update-task-file-schemas/index.html @@ -5,13 +5,13 @@ Updates to task granule file schemas | Cumulus Documentation - +
    Version: v10.0.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.
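
    For orientation only, a file object under the updated schema might look something like the following sketch; the values are purely illustrative, and the checksumType/checksum pairing is an assumption based on the checksum discussion below rather than a statement of the exact schema:

    {
      "bucket": "my-protected-bucket",
      "key": "MY_COLLECTION/MY_GRANULE/my-granule-file.hdf",
      "fileName": "my-granule-file.hdf",
      "size": 1908635,
      "checksumType": "md5",
      "checksum": "a0b1c2d3e4f5..."
    }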

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3; thus, much of the information that is required in the updated files schema, such as bucket, key, or checksum, is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, you cannot run bulk granule operations against workflows that start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using the post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
    - + \ No newline at end of file diff --git a/docs/v10.0.0/upgrade-notes/upgrade-rds/index.html b/docs/v10.0.0/upgrade-notes/upgrade-rds/index.html index e76fc83c4dc..2a64ab36642 100644 --- a/docs/v10.0.0/upgrade-notes/upgrade-rds/index.html +++ b/docs/v10.0.0/upgrade-notes/upgrade-rds/index.html @@ -5,7 +5,7 @@ Upgrade to RDS release | Cumulus Documentation - + @@ -21,7 +21,7 @@ | cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 | | dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 | | dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    - + \ No newline at end of file diff --git a/docs/v10.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/v10.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html index dae65b0346c..b93bce27376 100644 --- a/docs/v10.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html +++ b/docs/v10.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html @@ -5,13 +5,13 @@ Upgrade to TF version 0.13.6 | Cumulus Documentation - +
    Version: v10.0.0

    Upgrade to TF version 0.13.6

    Background

    Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
    • Install Terraform version 0.13.6. We recommend using the Terraform Version Manager (tfenv) to manage your installed versions of Terraform, but this is not required; a brief example follows this list.
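
    For example, assuming tfenv is already installed, switching your shell to the supported version might look like the following (a sketch, not a required workflow):

     tfenv install 0.13.6
     tfenv use 0.13.6
     terraform --version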

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
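
    As one possible (not the only) approach, assuming the data-persistence-tf and cumulus-tf directory layout described below, you could upgrade each deployment directory in place; the -yes flag skips the interactive confirmation:

     terraform 0.13upgrade -yes data-persistence-tf
     terraform 0.13upgrade -yes cumulus-tf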

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required; otherwise, you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflow_tasks/discover_granules/index.html b/docs/v10.0.0/workflow_tasks/discover_granules/index.html index 15798d1945e..075db72f0d6 100644 --- a/docs/v10.0.0/workflow_tasks/discover_granules/index.html +++ b/docs/v10.0.0/workflow_tasks/discover_granules/index.html @@ -5,7 +5,7 @@ Discover Granules | Cumulus Documentation - + @@ -21,7 +21,7 @@ included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
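
    As a minimal, hypothetical sketch only (the authoritative key names and required fields are defined by the schema linked from the Cumulus Tasks page, and the provider and collection entries here are assumptions added for context), these properties might appear in a workflow step's task_config roughly like:

      "task_config": {
        "provider": "{$.meta.provider}",
        "collection": "{$.meta.collection}",
        "duplicateGranuleHandling": "skip",
        "concurrency": 3
      }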

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflow_tasks/files_to_granules/index.html b/docs/v10.0.0/workflow_tasks/files_to_granules/index.html index 8dd0bb91ed3..91a02af4424 100644 --- a/docs/v10.0.0/workflow_tasks/files_to_granules/index.html +++ b/docs/v10.0.0/workflow_tasks/files_to_granules/index.html @@ -5,13 +5,13 @@ Files To Granules | Cumulus Documentation - +
    Version: v10.0.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task uses the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to sort the incoming files into a list of granule objects.

    Please note: files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.
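
    As a rough, hypothetical sketch of the shapes involved (consult the schemas on the Cumulus Tasks page for the authoritative definitions; the granuleId property and the bucket/path values here are illustrative assumptions):

    {
      "input": [
        "s3://my-internal-bucket/file-staging/my-stack/MY_COLLECTION___001/my-granule-file.hdf"
      ],
      "config": {
        "inputGranules": [
          {
            "granuleId": "my-granule",
            "files": []
          }
        ]
      }
    }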

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflow_tasks/lzards_backup/index.html b/docs/v10.0.0/workflow_tasks/lzards_backup/index.html index 51032e3721f..4ddfb6a3513 100644 --- a/docs/v10.0.0/workflow_tasks/lzards_backup/index.html +++ b/docs/v10.0.0/workflow_tasks/lzards_backup/index.html @@ -5,13 +5,13 @@ LZARDS Backup | Cumulus Documentation - +
    Version: v10.0.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

    lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

    Additional Terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflow_tasks/move_granules/index.html b/docs/v10.0.0/workflow_tasks/move_granules/index.html index 81416feaf8a..442dfce536b 100644 --- a/docs/v10.0.0/workflow_tasks/move_granules/index.html +++ b/docs/v10.0.0/workflow_tasks/move_granules/index.html @@ -5,13 +5,13 @@ Move Granules | Cumulus Documentation - +
    Version: v10.0.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

    • If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype, as defined by the granule's CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

    • The task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflow_tasks/parse_pdr/index.html b/docs/v10.0.0/workflow_tasks/parse_pdr/index.html index 704c5513f3f..f0e1a63a6b2 100644 --- a/docs/v10.0.0/workflow_tasks/parse_pdr/index.html +++ b/docs/v10.0.0/workflow_tasks/parse_pdr/index.html @@ -5,13 +5,13 @@ Parse PDR | Cumulus Documentation - +
    Version: v10.0.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

    • Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

    The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

    Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

    This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.
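
    Loosely sketched, and with purely illustrative values beyond the filesCount, totalSize, pdr, and granules keys mentioned above, the output payload has roughly this shape:

    {
      "filesCount": 4,
      "totalSize": 1234567,
      "pdr": {
        "name": "MY_PDR.PDR",
        "path": "/pdrs"
      },
      "granules": []
    }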

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflow_tasks/queue_granules/index.html b/docs/v10.0.0/workflow_tasks/queue_granules/index.html index 601c124de20..e258125a187 100644 --- a/docs/v10.0.0/workflow_tasks/queue_granules/index.html +++ b/docs/v10.0.0/workflow_tasks/queue_granules/index.html @@ -5,14 +5,14 @@ Queue Granules | Cumulus Documentation - +
    Version: v10.0.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

    The task uses a defined collection, in concert with a defined provider (either set on each granule or passed in via config), to queue up ingest executions for each granule or for batches of granules.

    The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

    Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
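
    Pulling the keys above together, a hypothetical task_config for this task might look roughly like the following; the workflow name, queue URL, and templated values are illustrative assumptions, not required values:

      "task_config": {
        "provider": "{$.meta.provider}",
        "internalBucket": "{$.meta.buckets.internal.name}",
        "granuleIngestWorkflow": "IngestGranule",
        "queueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/my-prefix-background-queue",
        "preferredQueueBatchSize": 1,
        "concurrency": 3,
        "executionNamePrefix": "my-prefix",
        "childWorkflowMeta": {
          "staticValue": "example"
        }
      }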

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/cumulus-task-message-flow/index.html b/docs/v10.0.0/workflows/cumulus-task-message-flow/index.html index 7e1804831bc..0d5c575a58c 100644 --- a/docs/v10.0.0/workflows/cumulus-task-message-flow/index.html +++ b/docs/v10.0.0/workflows/cumulus-task-message-flow/index.html @@ -5,14 +5,14 @@ Cumulus Tasks: Message Flow | Cumulus Documentation - +
    Version: v10.0.0

    Cumulus Tasks: Message Flow

    Cumulus Tasks comprise Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message Adapter. It contains the location on S3 of a message payload and a target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions; it should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
      "task_config": {
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      },
      "cumulus_meta": {
        "message_source": "sfn",
        "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
        "execution_name": "MyExecution__id-1234",
        "id": "id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {
        "anykey": "anyvalue"
      }
    }

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
      "replace": {
        "Bucket": "cumulus-bucket",
        "Key": "my-large-event.json",
        "TargetPath": "$"
      },
      "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines which task is currently being executed. Note that this is different from which Lambda or activity is being executed, because the same Lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
      "input": {},
      "config": {}
    }

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
      "task_config": {
        "bar": "baz",
        "cumulus_message": {
          "input": "{$.payload.input}",
          "outputs": [
            {
              "source": "{$.input.anykey}",
              "destination": "{$.payload.out}"
            }
          ]
        }
      },
      "cumulus_meta": {
        "task": "Example",
        "message_source": "local",
        "id": "id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {
        "input": {
          "anykey": "anyvalue"
        }
      }
    }
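
    Assuming the task's application code returns its input unchanged, the relevant fragment of the outgoing Cumulus message would then contain (a sketch, not the complete message):

    "payload": {
      "out": "anyvalue"
    }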

    Create Next Message Step 2: Store remote event

    If the ReplaceConfig parameter is set, the configured key's value will be stored in S3, and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/developing-a-cumulus-workflow/index.html b/docs/v10.0.0/workflows/developing-a-cumulus-workflow/index.html index 89f87e2a096..9e56a4fafc2 100644 --- a/docs/v10.0.0/workflows/developing-a-cumulus-workflow/index.html +++ b/docs/v10.0.0/workflows/developing-a-cumulus-workflow/index.html @@ -5,13 +5,13 @@ Creating a Cumulus Workflow | Cumulus Documentation - +
    Version: v10.0.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

    To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
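
    For instance, a minimal placeholder Amazon States Language definition with a single Pass state, which you would replace with your actual task states, could look like the following sketch (the state name here is purely illustrative):

    {
      "Comment": "Minimal placeholder definition",
      "StartAt": "StartStatus",
      "States": {
        "StartStatus": {
          "Type": "Pass",
          "End": true
        }
      }
    }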

    Please note: Cumulus follows the convention of tagging resources with { Deployment = var.prefix }, using the prefix variable that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention, as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/developing-workflow-tasks/index.html b/docs/v10.0.0/workflows/developing-workflow-tasks/index.html index a517ecd0ea0..d7cfc34aade 100644 --- a/docs/v10.0.0/workflows/developing-workflow-tasks/index.html +++ b/docs/v10.0.0/workflows/developing-workflow-tasks/index.html @@ -5,13 +5,13 @@ Developing Workflow Tasks | Cumulus Documentation - +
    Version: v10.0.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

    ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/docker/index.html b/docs/v10.0.0/workflows/docker/index.html index 185a0997c63..0c20dbd6620 100644 --- a/docs/v10.0.0/workflows/docker/index.html +++ b/docs/v10.0.0/workflows/docker/index.html @@ -5,7 +5,7 @@ Dockerizing Data Processing | Cumulus Documentation - + @@ -14,7 +14,7 @@ 2) validate the output (in this case just check for existence) 3) use 'ncatted' to update the resulting file to be CF-compliant 4) write out metadata generated for this file

    Process Testing

    It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

    This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

    If the data/test-output directory validates against the contents of data/output, the tests will pass; otherwise, an error will be reported.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/index.html b/docs/v10.0.0/workflows/index.html index d139d887acf..1438b718b51 100644 --- a/docs/v10.0.0/workflows/index.html +++ b/docs/v10.0.0/workflows/index.html @@ -5,13 +5,13 @@ Workflows | Cumulus Documentation - +
    Version: v10.0.0

    Workflows

    Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system into the cloud, where it can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high-level view of the architecture and many of the individual steps are the same, but the details of ingesting each type of collection differ. Different collection types and different providers have different needs. Not only are the individual boxes of a workflow different; the branching, error handling, and multiplicity of the arrows connecting the boxes also differ. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems, but there needs to be a higher-level solution to allow the composition of AWS components into a full-featured solution. The Ingest Workflow Architecture is designed to meet the needs of Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

    The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
    • Not make assumptions about the specifics of what goes into a granule, for example.

    Scalable

    The ingest architecture needs to be scalable, both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
    • We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and the details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

    AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits that are directly applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
• Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
• Both meta and task_config, which are used for configuration at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used to extend the validation of individual steps.
    • Data-centric Configuration
• The use of a single JSON configuration file allows this configuration to be added to a workflow. We can build additional support on top of the configuration file for simpler domain-specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this tradeoff. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that are implemented in Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation has a tight coupling with their standard. If they make backwards-incompatible changes in the future, we will have to deal with existing projects written against the older version of the standard.

We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without requiring changes to the Ingest library code. It is unlikely they will make a backwards-incompatible change at this point. If that were to happen, one mitigation is to write data transformations from the existing format to the new one.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the need to integrate different ingest workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used for monitoring many of the events from Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/input_output/index.html b/docs/v10.0.0/workflows/input_output/index.html index 7ca064b8b73..31b74289f2d 100644 --- a/docs/v10.0.0/workflows/input_output/index.html +++ b/docs/v10.0.0/workflows/input_output/index.html @@ -5,14 +5,14 @@ Workflow Inputs & Outputs | Cumulus Documentation - +
    Version: v10.0.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.
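As a rough illustration of this callback pattern, a Python task using the cumulus-message-adapter-python client library (which exposes run_cumulus_task) might look like the sketch below; the granules field and the return value are placeholders, not part of any real task:

from run_cumulus_task import run_cumulus_task

def task(event, context):
    # Business logic only: `event` contains the resolved `input` and `config`
    # keys described later on this page, not the full Cumulus message.
    granules = event["input"].get("granules", [])  # hypothetical input field
    return {"granules_processed": len(granules)}

def handler(event, context):
    # The adapter unwraps the incoming Cumulus message, invokes `task`,
    # and re-wraps the return value into the Cumulus message protocol.
    return run_cumulus_task(task, event, context)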

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
• If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly (see the sketch below).
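A minimal Terraform sketch of that second option follows, assuming the deployed CMA layer ARN is available as a variable; the function name, zip path, and runtime are illustrative placeholders:

resource "aws_lambda_function" "cma_enabled_task" {
  function_name    = "${var.prefix}-cma-enabled-task"
  filename         = "/path/to/zip/lambda.zip"
  source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
  handler          = "index.handler"
  role             = module.cumulus.lambda_processing_role_arn
  runtime          = "nodejs12.x"

  # Attach the deployed CMA layer and point the adapter at its mount path
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]

  environment {
    variables = {
      CUMULUS_MESSAGE_ADAPTER_DIR = "/opt"
    }
  }
}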

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory of your Lambda .zip for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

Input to the task application code is a JSON object with the following keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is placed in the payload key by default, but the config key can also be used to return just a portion of the task output.
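For illustration, a task might therefore receive an event shaped like the following (the values are placeholders borrowed from the examples later on this page):

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {
    "provider": {
      "id": "FOO_DAAC"
    }
  }
}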

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

Using the CMA package with a Lambda by any of the above-mentioned methods (Lambda Layers, manual) requires configuring its various features via a specific Step Function Parameters format (see the sample workflows in the example cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

    Because of the potential size of a Cumulus message, mainly the payload field, a task can be set via configuration to store a portion of its output on S3 with a message key Remote Message that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes) it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

    Setting the Path/Target path in the ReplaceConfig parameter (and optionally a non-default MaxSize)

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values used to define task inputs/outputs via URL paths. Important: these values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or to reference values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given that a task specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

{
  "input" : {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also write values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value in the Cumulus message at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Given the response from the task is:

{
  "output": {
    "anykey": "boo"
  }
}

    The Cumulus Message Adapter would output the following Cumulus Message:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please Note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output Cumulus message (after the output configuration has been applied) looks like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "ReplaceConfig": {
    "FullMessage": true
  },
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    the resultant output would look like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
  }
}

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
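For example, a task's schemas/input.json might contain a JSON Schema along these lines (the granules property is purely illustrative; real tasks define whatever their input actually requires):

{
  "title": "ExampleTask Input",
  "type": "object",
  "properties": {
    "granules": {
      "type": "array",
      "items": { "type": "object" }
    }
  },
  "required": ["granules"]
}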

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/lambda/index.html b/docs/v10.0.0/workflows/lambda/index.html index 700cce62661..77fae51b28c 100644 --- a/docs/v10.0.0/workflows/lambda/index.html +++ b/docs/v10.0.0/workflows/lambda/index.html @@ -5,13 +5,13 @@ Develop Lambda Functions | Cumulus Documentation - +
    Version: v10.0.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source) either as a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.
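For illustration, a minimal Python Lambda resource might look like the sketch below, assuming a python3.8 runtime and a handler function named handler in handler.py (both placeholders):

resource "aws_lambda_function" "my_python_function" {
  function_name    = "${var.prefix}-python-function"
  filename         = "/path/to/zip/lambda.zip"
  source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
  handler          = "handler.handler"
  role             = module.cumulus.lambda_processing_role_arn
  runtime          = "python3.8"

  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }
}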

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable Cloudwatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/protocol/index.html b/docs/v10.0.0/workflows/protocol/index.html index 9014cbe5257..f947a8b4cb5 100644 --- a/docs/v10.0.0/workflows/protocol/index.html +++ b/docs/v10.0.0/workflows/protocol/index.html @@ -5,13 +5,13 @@ Workflow Protocol | Cumulus Documentation - +
    Version: v10.0.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/workflow-configuration-how-to/index.html b/docs/v10.0.0/workflows/workflow-configuration-how-to/index.html index ae9aa31ae5b..ef53e93bd68 100644 --- a/docs/v10.0.0/workflows/workflow-configuration-how-to/index.html +++ b/docs/v10.0.0/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -24,7 +24,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

<Granule>
  <Temporal>
    <RangeDateTime>
      <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
      <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
  </Temporal>
</Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    - + \ No newline at end of file diff --git a/docs/v10.0.0/workflows/workflow-triggers/index.html b/docs/v10.0.0/workflows/workflow-triggers/index.html index 2c493c678b8..c1424512b42 100644 --- a/docs/v10.0.0/workflows/workflow-triggers/index.html +++ b/docs/v10.0.0/workflows/workflow-triggers/index.html @@ -5,13 +5,13 @@ Workflow Triggers | Cumulus Documentation - +
    Version: v10.0.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda and an execution is started with the body of the message as the input.

When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event, which manages triggering the lambdas that start the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.
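As a sketch of how such a rule might be defined (the field values are illustrative; see the rule configuration documentation for the authoritative schema), a scheduled rule could look like:

{
  "name": "mod09gq_scheduled_ingest",
  "workflow": "DiscoverGranules",
  "provider": "MY_PROVIDER",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "rate(1 day)"
  },
  "state": "ENABLED"
}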

    Diagram showing how workflows are scheduled via rules

    - + \ No newline at end of file diff --git a/docs/v10.1.0/adding-a-task/index.html b/docs/v10.1.0/adding-a-task/index.html index 53f3082e500..820b69c267b 100644 --- a/docs/v10.1.0/adding-a-task/index.html +++ b/docs/v10.1.0/adding-a-task/index.html @@ -5,13 +5,13 @@ Contributing a Task | Cumulus Documentation - +
    Version: v10.1.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    - + \ No newline at end of file diff --git a/docs/v10.1.0/api/index.html b/docs/v10.1.0/api/index.html index 16087fdbcd2..1cb9ab22bcd 100644 --- a/docs/v10.1.0/api/index.html +++ b/docs/v10.1.0/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v10.1.0/architecture/index.html b/docs/v10.1.0/architecture/index.html index 1ea836352aa..f7253eec13f 100644 --- a/docs/v10.1.0/architecture/index.html +++ b/docs/v10.1.0/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
    Version: v10.1.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

    While the diagram can feel complex, it can easily be digested in several major components:

    Data Distribution

End users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

    Cumulus entity state data is stored in a set of DynamoDB database tables, and is exported to an ElasticSearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployment, and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    - + \ No newline at end of file diff --git a/docs/v10.1.0/configuration/cloudwatch-retention/index.html b/docs/v10.1.0/configuration/cloudwatch-retention/index.html index c5a0b6033f1..2de0bc4c23a 100644 --- a/docs/v10.1.0/configuration/cloudwatch-retention/index.html +++ b/docs/v10.1.0/configuration/cloudwatch-retention/index.html @@ -5,13 +5,13 @@ Cloudwatch Retention | Cumulus Documentation - +
    Version: v10.1.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v10.1.0/configuration/collection-storage-best-practices/index.html b/docs/v10.1.0/configuration/collection-storage-best-practices/index.html index 87a4596d525..864f6d2fbe0 100644 --- a/docs/v10.1.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v10.1.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v10.1.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (e.g. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

{
  "Datapoints": [
    {
      "Timestamp": "2019-07-31T00:00:00Z",
      "Average": 150996467959.0,
      "Unit": "Bytes"
    }
  ],
  "Label": "BucketSizeBytes"
}

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
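For illustration, a deployment's .tfvars might define buckets along the lines of the sketch below (the prefix and bucket names are placeholders), so that a file configured with "bucket": "protected" would be stored in my-prefix-protected:

buckets = {
  internal = {
    name = "my-prefix-internal"
    type = "internal"
  }
  protected = {
    name = "my-prefix-protected"
    type = "protected"
  }
  public = {
    name = "my-prefix-public"
    type = "public"
  }
}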

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "protected-2",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
  ]
}
    - + \ No newline at end of file diff --git a/docs/v10.1.0/configuration/data-management-types/index.html b/docs/v10.1.0/configuration/data-management-types/index.html index 1a7e0767adf..100df7dfa08 100644 --- a/docs/v10.1.0/configuration/data-management-types/index.html +++ b/docs/v10.1.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v10.1.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, things that are more specific to a particular ingest setup (e.g. 'What target directory should we be pulling from' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
    KeyTypeRequiredDescription
    idstringYesUnique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
    host | string | Yes | S3 bucket to pull data from
    http
    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be http for this provider type
    host | string | Yes | The host to pull data from (e.g. nasa.gov)
    username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    port | integer | No | Port to connect to the provider on. Defaults to 80
    allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if the redirect port is different than the provider port.
    certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
    https
    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be https for this provider type
    host | string | Yes | The host to pull data from (e.g. nasa.gov)
    username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    port | integer | No | Port to connect to the provider on. Defaults to 443
    allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if the redirect port is different than the provider port.
    certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
    ftp
    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
    host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
    username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
    password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
    port | integer | No | Port to connect to the provider on. Defaults to 21
    sftp
    Key | Type | Required | Description
    id | string | Yes | Unique identifier for the provider
    globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus-compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
    host | string | Yes | The sftp host to pull data from (e.g. nasa.gov)
    username | string | No | Username to use to connect to the sftp server.
    password | string | No | Password to use to connect to the sftp server.
    port | integer | No | Port to connect to the provider on. Defaults to 22
    privateKey | string | No | filename assumed to be in s3://bucketInternal/stackName/crypto
    cmKeyId | string | No | AWS KMS Customer Master Key arn or alias
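
    As a point of reference, a complete provider definition combining the fields above might look like the following sketch. This is illustrative only: the id, host, credential, and certificate values are placeholders, not values taken from a real deployment.

    {
      "id": "EXAMPLE_HTTPS_PROVIDER",
      "protocol": "https",
      "host": "data.example.gov",
      "port": 443,
      "globalConnectionLimit": 10,
      "username": "exampleUser",
      "password": "examplePassword",
      "allowedRedirects": ["redirect.example.gov"],
      "certificateUri": "s3://my-internal-bucket/certs/example-ca.pem"
    }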

    Collections

    Break down of [s3_MOD09GQ_006.json](https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
    Key | Value | Required | Description
    name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
    version | "006" | Yes | A version tag for the collection
    granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
    granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
    sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
    files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
    dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
    duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
    ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
    process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
    meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
    url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

    files-object

    Key | Value | Required | Description
    regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
    sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
    type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
    bucket | "internal" | Yes | Name of the bucket where the file will be stored
    url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
    checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file.
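
    For illustration, a single entry in the files list combining the fields above might look like the following sketch (the values simply restate the MOD09GQ examples from the table):

    {
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "type": "data",
      "bucket": "internal",
      "url_path": "${collectionShortName}/{substring(file.fileName, 0, 3)}"
    }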

    Rules

    Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by either events in Kinesis, SNS messages or SQS messages.

    Rule configuration
    Key | Value | Required | Description
    name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
    workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
    provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
    collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
    payload | <JSON Object or Array> | No | The payload to be passed to the workflow
    meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
    rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
    state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
    queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
    tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

    collection-object

    Key | Value | Required | Description
    name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
    version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

    meta-object

    Key | Value | Required | Description
    retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3.
    visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

    Key | Value | Required | Description
    type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
    value | <String> Object | Depends | Discussion of valid values is below

    rule-value

    The rule value entry depends on the type of rule:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
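
    As a sketch, a scheduled rule that kicks off a workflow once per day might look like the following. The name, workflow, and collection values here are placeholders, and rate(1 day) could equally be a cron-type expression.

    {
      "name": "daily_discovery_rule",
      "workflow": "DiscoverGranulesWorkflow",
      "collection": {
        "name": "MOD09GQ",
        "version": "006"
      },
      "rule": {
        "type": "scheduled",
        "value": "rate(1 day)"
      },
      "state": "ENABLED"
    }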

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
    • Upon failed execution(s), the workflow is run 3 times by default, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After the configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
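
    Tying the features above together, a sketch of an sqs-type rule that overrides the retry count and visibility timeout might look like the following. The queue URL and names are placeholders, and the referenced queue must already exist with a dead-letter queue configured.

    {
      "name": "example_sqs_rule",
      "workflow": "CNMExampleWorkflow",
      "collection": {
        "name": "L2_HR_PIXC",
        "version": "000"
      },
      "rule": {
        "type": "sqs",
        "value": "https://sqs.us-east-1.amazonaws.com/1234567890/queue-name"
      },
      "meta": {
        "retries": 1,
        "visibilityTimeout": 1800
      },
      "state": "ENABLED"
    }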

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    diff --git a/docs/v10.1.0/configuration/lifecycle-policies/index.html b/docs/v10.1.0/configuration/lifecycle-policies/index.html
    Setting S3 Lifecycle Policies | Cumulus Documentation
    Version: v10.1.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
    • You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

    From any directory you choose, open an editor and add the following to a file named exampleRule.json:

    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

    Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are fairly basic cases. By using object tags, filters, and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    diff --git a/docs/v10.1.0/configuration/monitoring-readme/index.html b/docs/v10.1.0/configuration/monitoring-readme/index.html
    Monitoring Best Practices | Cumulus Documentation
    Version: v10.1.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

    Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on GitHub and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation.

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
      // CloudWatch Logs subscription events arrive base64-encoded and gzipped
      const payload = Buffer.from(event.awslogs.data, 'base64');
      const decompressedData = await gunzip(payload);
      const logData = JSON.parse(decompressedData.toString('ascii'));
      return await Promise.all(logData.logEvents.map(async (logEvent) => {
        // @cumulus/logger messages are structured JSON with a "level" field
        const logMessage = JSON.parse(logEvent.message);
        if (['error', 'fatal'].includes(logMessage.level)) {
          // Forward error/fatal messages to the SNS topic backing the email subscription
          return sns.publish({
            TopicArn: process.env.EmailReportingTopicArn,
            Message: logEvent.message
          }).promise();
        }
        return Promise.resolve();
      }));
    };

    After creating the SNS topic, we can deploy this code as a lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.

    diff --git a/docs/v10.1.0/configuration/server_access_logging/index.html b/docs/v10.1.0/configuration/server_access_logging/index.html
    S3 Server Access Logging | Cumulus Documentation
    Version: v10.1.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
    diff --git a/docs/v10.1.0/configuration/task-configuration/index.html b/docs/v10.1.0/configuration/task-configuration/index.html
    Configuration of Tasks | Cumulus Documentation
    Version: v10.1.0

    Configuration of Tasks

    The cumulus module exposes configuration values for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

    <lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

    More information about cmr limit and cmr page_size can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }
    diff --git a/docs/v10.1.0/data-cookbooks/about-cookbooks/index.html b/docs/v10.1.0/data-cookbooks/about-cookbooks/index.html
    About Cookbooks | Cumulus Documentation
    Version: v10.1.0

    About Cookbooks

    Introduction

    The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should serve to help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.
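
    As a rough sketch (the exact sidebar structure is assumed here, not prescribed), adding a hypothetical page with the id new-cookbook would mean adding an entry like the last one below to the Data-Cookbooks object:

    {
      "Data-Cookbooks": [
        "data-cookbooks/about-cookbooks",
        "data-cookbooks/new-cookbook"
      ]
    }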

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    diff --git a/docs/v10.1.0/data-cookbooks/browse-generation/index.html b/docs/v10.1.0/data-cookbooks/browse-generation/index.html
    Ingest Browse Generation | Cumulus Documentation

    provider keys with the previously entered values). Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") where you've staged your mock/test data:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

    You can verify the granules exist within your cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the above CMR entry is accessible.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

    You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

    The example script provided goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
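
    For example, the browse file entry added to the granule's files list might look like the following sketch; the bucket and key simply mirror the staging layout used elsewhere on this page and are not prescriptive.

    {
      "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
      "bucket": "cumulus-test-sandbox-internal",
      "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
      "type": "browse"
    }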

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates one and adds it to the FilesToGranules file list via the payload, but it can also be present in the InputGranules from the DiscoverGranules task if you'd prefer to pre-generate it.

    The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

    In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, files is mapped to the payload and granules to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

    The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file, if it exists, with the updated granule locations, and will update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

    This is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

    The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    diff --git a/docs/v10.1.0/data-cookbooks/choice-states/index.html b/docs/v10.1.0/data-cookbooks/choice-states/index.html
    Choice States | Cumulus Documentation
    Version: v10.1.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

    In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: as long as CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, which is followed by PdrStatusReport, which is followed by WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    diff --git a/docs/v10.1.0/data-cookbooks/cnm-workflow/index.html b/docs/v10.1.0/data-cookbooks/cnm-workflow/index.html
    CNM Workflow | Cumulus Documentation
    Version: v10.1.0

    CNM Workflow

    This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

    This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

    If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

    In operational environments it's likely science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

    state_machine_definition = <<JSON
    {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    JSON
    }

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

    • The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you setup in the Prerequisites section.

    Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

    As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the notification Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow by re-running the put-record command used to trigger the workflow above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note that the record data is base64-encoded and must be decoded to be human readable. There are many options for building a Kinesis consumer, such as the KCL.
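As a rough sketch, the data can be decoded directly from the get-records output; this assumes jq and a base64 utility that supports --decode are available locally.

# Decode the base64-encoded Data field of each record into readable JSON.
aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
  | jq -r '.Records[].Data' \
  | while read -r record; do echo "$record" | base64 --decode; echo; done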

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in the case of external service failures outside of Cumulus' control.

The Kinesis error handling system (the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue) comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
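The same inspection can be done from the command line; this is only a sketch, with <prefix> standing in for your deployment prefix.

# Look up the dead letter queue URL and poll for failed records.
QUEUE_URL=$(aws sqs get-queue-url --queue-name "<prefix>-kinesisFailure" --query QueueUrl --output text)
aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10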

Note: these are only failures that occurred while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

When a Kinesis rule is created, an event mapping is created (in addition to the messageConsumer event mapping) to trigger KinesisInboundEventLogger to record a log of the inbound record, allowing for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.

Version: v10.1.0

Error Handling in Workflows

See this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    Version: v10.1.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

    The HelloWorld task is provided for you as part of the cumulus terraform module, no configuration is needed.

If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition from the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code itself is located at cumulus/tasks/hello-world.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    Version: v10.1.0

    Ingest Notification in Workflows

    On deployment, an SQS queue and three SNS topics are created and used for handling notification messages related to the workflow.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates DynamoDB. The DynamoDB events for the ExecutionsTable, GranulesTable and PdrsTable are streamed on DynamoDBStreams, which are read by the publishExecutions, publishGranules and publishPdrs Lambda functions, respectively.

    These Lambda functions publish to the three SNS topics both when the workflow starts and when it reaches a terminal state (completion or failure). The following describes how many message(s) each topic receives both on workflow start and workflow completion/failure:

    • reportExecutions - Receives 1 message per workflow execution
    • reportGranules - Receives 1 message per granule in a workflow execution
    • reportPdrs - Receives 1 message per PDR

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Messages published to this queue will be stored as granule/PDR records in the Cumulus database, and the status of those granules/PDRs will be visible on the Cumulus dashboard. Note that the queue expects a Cumulus Message nested within a CloudWatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can expose the queue URL (and the report topic ARNs) by adding them to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
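For example, a message can be posted with the AWS CLI as sketched below; sf-event.json is a placeholder file that must contain a CloudWatch Step Function event with the stringified Cumulus Message in detail.output (SUCCEEDED) or detail.input (all other statuses), per the expectations above.

# Publish a status message directly to the reporting queue (queue URL from the Terraform output above).
aws sqs send-message \
  --queue-url "<stepfunction_event_reporter_queue_url>" \
  --message-body file://sf-event.json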

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords
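For example, with AWS CLI v2 you can follow that log group directly; the prefix is a placeholder for your deployment.

# Stream the SQS consumer Lambda's recent log events.
aws logs tail "/aws/lambda/<prefix>-sfEventSqsToDbRecords" --follow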

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with the ingest notification record in the record property and the event type in the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
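As a quick way to inspect a captured SNS event during testing, the sketch below assumes you have saved a sample event to a file named sns-event.json and have jq available.

# Extract and parse the stringified notification record from an SNS event payload.
jq '.Records[0].Sns.Message | fromjson' sns-event.json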

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v10.1.0

    Queue PostToCmr

    In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow setup.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v10.1.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.
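If it helps to compare an existing function against these limits, the sketch below reads its configured memory, timeout, and deployed package size; the function name is a placeholder.

# CodeSize is the deployed package size in bytes; MemorySize is in MB, Timeout in seconds.
aws lambda get-function-configuration \
  --function-name "<prefix>-MyTask" \
  --query '{MemorySize: MemorySize, Timeout: Timeout, CodeSize: CodeSize}'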

If your task requires more than any of these resources, or requires an unsupported runtime, create a Docker image that can be run on ECS instead. Cumulus supports running any Lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

Suppose you discover that this task can no longer run in AWS Lambda. You can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or in an existing one):

    • A aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.
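One way to do this restart is sketched below; the cluster and service names are placeholders and will differ in your deployment.

# Force the ECS service to launch fresh tasks that pick up the updated Lambda code.
aws ecs update-service \
  --cluster "<prefix>-CumulusECSCluster" \
  --service "<ecs-service-name-for-QueueGranules>" \
  --force-new-deployment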

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to the following):

"Resource": "${aws_sfn_activity.queue_granules.id}"

If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a Lambda function.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Version: v10.1.0

Science Investigator-led Processing Systems (SIPS)

For this cookbook, we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    Note: A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v10.1.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Version: v10.1.0

Tracking Ancillary Files

The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type  | UMM-G RelatedUrl.Type                                           | ECHO10 Location
ancillary | 'VIEW RELATED INFORMATION'                                      | OnlineResource
data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI) | OnlineAccessURL
browse    | 'GET RELATED VISUALIZATION'                                     | AssociatedBrowseImage
linkage   | 'EXTENDED METADATA'                                             | OnlineResource
metadata  | 'EXTENDED METADATA'                                             | OnlineResource
qa        | 'EXTENDED METADATA'                                             | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v10.1.0

    API Gateway Logging

    Enabling API Gateway logging

To enable distribution API access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

    This is a one time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.
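If you need to look the ARN up again later, it can be retrieved as shown below.

# Print the ARN of the role created above.
aws iam get-role --role-name ApiGatewayToCloudWatchLogs --query Role.Arn --output text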

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

    Version: v10.1.0

    Configure Cloudwatch Logs Delivery

As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. You can do this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

log_destination_arn = "arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]"

    Logs Sent

By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value with the Cloudwatch log group name.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
Version: v10.1.0

Component-based Cumulus Deployment

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v10.1.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

Using the AWS CLI s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

Please note: security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.
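As an example of the follow-up configuration mentioned above, the sketch below enables S3 server access logging on a protected bucket; the bucket names and log prefix are placeholders, and the target bucket must already permit S3 log delivery.

# Enable server access logging on a protected bucket (placeholder bucket names).
aws s3api put-bucket-logging \
  --bucket foobar-protected \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "foobar-internal",
      "TargetPrefix": "s3-access-logs/foobar-protected/"
    }
  }'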

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v10.1.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL of the private Cumulus Distribution API Gateway). To obtain this default URL, first deploy your cumulus-tf module with the new Cumulus Distribution module. Once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.
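
    As a quick check, once the deployment completes you can print that output with the Terraform CLI from your cumulus-tf directory:

    terraform output cumulus_distribution_api_uri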

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create a key pair or import an existing one, you can use the AWS CLI (see aws ec2 import-key-pair) or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    127.0.0.1 <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.
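
    For example, using curl, the request might look like the following; the shell variables stand in for the Terraform variable values described above and are placeholders:

    csdap_host_url="https://example-cognito-host.example.gov"
    csdap_client_id="example-client-id"
    csdap_client_password="example-client-password"
    cumulus_distribution_url="https://abc123.execute-api.us-east-1.amazonaws.com:9000/dev"

    curl -X POST "${csdap_host_url}/authclient/updateRedirectUri" \
      -u "${csdap_client_id}:${csdap_client_password}" \
      -H "Content-Type: application/x-www-form-urlencoded" \
      --data-urlencode "redirect_uri=${cumulus_distribution_url}/login"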

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.
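
    For example, assuming my-prefix-protected is one of your protected buckets (the bucket name and key here are hypothetical), you could upload a small test file with the AWS CLI:

    echo "test file for Cumulus Distribution" > sample.txt
    aws s3 cp sample.txt s3://my-prefix-protected/test/sample.txt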

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

    To determine your ec2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances \
      --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running \
      --query "Reservations[0].Instances[].InstanceId" \
      --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)
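
    For example, with local port 9000 and a hypothetical API Gateway host, the tunnel command would look something like:

    ssh -4 -p 6000 -N -L 9000:abc123.execute-api.us-east-1.amazonaws.com:443 ec2-user@127.0.0.1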

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = "<cloudfront_url>"

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
      "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

    diff --git a/docs/v10.1.0/deployment/index.html b/docs/v10.1.0/deployment/index.html

    How to Deploy Cumulus | Cumulus Documentation

    ... for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

    Follow the above instructions to initialize Terraform using terraform init (see footnote 1).

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> redirect URIs, add the Archive API URL returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution URL
      • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login.
    5. You may delete the placeholder URL you used to create the application.

    If you've lost track of the needed redirect URIs, they can be located on the API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's URL will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

    To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root (see footnote 3).

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

    The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.
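
    As a rough sketch, the related settings might look like the following in your terraform.tfvars files. The exact fields accepted by elasticsearch_config are defined by your version of the data persistence module, so treat the field names and values below as illustrative assumptions and confirm them against the module's variable definitions:

    # data-persistence-tf/terraform.tfvars (field names are assumptions -- verify against the module)
    elasticsearch_config = {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    # cumulus-tf/terraform.tfvars
    es_index_shards = 2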

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster consider:

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
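
    Pulling these variables together, a hypothetical cumulus-tf/terraform.tfvars sizing block might look like the following; the values are examples only and should be tuned to your own workloads:

    ecs_cluster_instance_type               = "t3.medium"
    ecs_cluster_instance_docker_volume_size = 50
    ecs_cluster_min_size                    = 1
    ecs_cluster_desired_size                = 1
    ecs_cluster_max_size                    = 2
    ecs_cluster_scale_in_adjustment_percent  = -5
    ecs_cluster_scale_out_adjustment_percent = 10
    ecs_cluster_scale_in_threshold_percent   = 25
    ecs_cluster_scale_out_threshold_percent  = 75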

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration" and use the edit icon for your application, then go to Manage -> Redirect URIs.

    3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard and read the URL at the top after "Invoke this API at".

    diff --git a/docs/v10.1.0/deployment/postgres_database_deployment/index.html b/docs/v10.1.0/deployment/postgres_database_deployment/index.html

    PostgreSQL Database Deployment | Cumulus Documentation

    ... cumulus-rds-tf that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

    To deploy this module, make sure that you have completed the following steps from the Cumulus deployment instructions, in similar fashion, for this module:

    --

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike the Cumulus deployment, this module should be deployed once to create the database cluster, and re-deployed thereafter only to make changes to that configuration, upgrade, etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
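
    For reference, the filled-in remote state configuration in terraform.tf is typically an S3 backend block along these lines (the bucket, table, and region values below are placeholders; keep whatever key path your terraform.tf.example already provides):

    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "my-tf-state-bucket"
        key            = "PREFIX/rds-cluster-tf/terraform.tfstate"
        dynamodb_table = "my-tf-locks-table"
      }
    }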

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true. This configures the module to deploy a lambda that will create the user database and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to.
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.
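
    A hypothetical terraform.tfvars fragment using these keys might look like the following (all values are placeholders):

    provision_user_database  = true
    permissions_boundary_arn = "arn:aws:iam::123456789012:policy/ExamplePermissionsBoundary"
    rds_user_password        = "change-me-to-a-real-password"
    prefix                   = "my-prefix"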

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets is in the form:

    {
      "database": "postgres",
      "dbClusterIdentifier": "clusterName",
      "engine": "postgres",
      "host": "xxx",
      "password": "defaultPassword",
      "port": 5432,
      "username": "xxx"
    }

    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432
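
    If you want to inspect one of these secrets (for example, to confirm the host value), you can retrieve it with the AWS CLI, substituting the secret ARN noted from your Terraform output:

    aws secretsmanager get-secret-value \
      --secret-id <admin_db_login_secret_arn> \
      --query SecretString \
      --output text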

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    diff --git a/docs/v10.1.0/deployment/share-s3-access-logs/index.html b/docs/v10.1.0/deployment/share-s3-access-logs/index.html

    Share S3 Access Logs | Cumulus Documentation
    Version: v10.1.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The terraform source package can be found on the Cumulus github release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    diff --git a/docs/v10.1.0/deployment/terraform-best-practices/index.html b/docs/v10.1.0/deployment/terraform-best-practices/index.html

    Terraform Best Practices | Cumulus Documentation

    ... AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link
    Restoring a previous version: link

    diff --git a/docs/v10.1.0/deployment/thin_egress_app/index.html b/docs/v10.1.0/deployment/thin_egress_app/index.html

    Using the Thin Egress App for Cumulus distribution | Cumulus Documentation
    Version: v10.1.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository's cumulus-tf/main.tf contains a thin_egress_app module for distribution.

    The TEA module provides these instructions for adding it to your deployment; the following are instructions to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
      "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
      my-protected: my-protected
      my-public: my-public

    PUBLIC_BUCKETS:
      - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.
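
    A minimal sketch of that approach, assuming module blocks named thin_egress_app and cumulus as in the template deployment, could look like the following (other required module arguments omitted):

    locals {
      tea_stage_name = "DEV"
    }

    module "thin_egress_app" {
      # ... other TEA configuration ...
      stage_name = local.tea_stage_name
    }

    module "cumulus" {
      # ... other Cumulus configuration ...
      tea_api_gateway_stage = local.tea_stage_name
    }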

    diff --git a/docs/v10.1.0/deployment/upgrade-readme/index.html b/docs/v10.1.0/deployment/upgrade-readme/index.html

    Upgrading Cumulus | Cumulus Documentation

    ... deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    diff --git a/docs/v10.1.0/development/forked-pr/index.html b/docs/v10.1.0/development/forked-pr/index.html

    Issuing PR From Forked Repos | Cumulus Documentation
    Version: v10.1.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve failures as appropriate.

    diff --git a/docs/v10.1.0/development/integration-tests/index.html b/docs/v10.1.0/development/integration-tests/index.html

    Integration Tests | Cumulus Documentation

    ... in the commit message.

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    diff --git a/docs/v10.1.0/development/quality-and-coverage/index.html b/docs/v10.1.0/development/quality-and-coverage/index.html

    Code Coverage and Quality | Cumulus Documentation

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    diff --git a/docs/v10.1.0/development/release/index.html b/docs/v10.1.0/development/release/index.html

    Versioning and Releases | Cumulus Documentation

    ... It's useful to use the search feature of your code editor or grep to see if there are any references to the old package versions. In a bash shell you can run:

    find . -name package.json -exec grep -nH "@cumulus/.*MAJOR\.MINOR\.PATCH.*" {} \;

    Verify that each of those is updated to the new MAJOR.MINOR.PATCH version you are trying to release.

    A similar search for alpha and beta versions should be run on the release version and any problems should be fixed.

    find . -name package.json -exec grep -nHE "MAJOR\.MINOR\.PATCH.*(alpha|beta)" {} \;

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

    If there exists a PR in the cumulus-dashboard repo with a name containing: "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the Cumulus Core to-be-released-version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.
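
    For example, if releasing v1.2.3 on top of v1.2.2, the new link reference at the bottom of CHANGELOG.md would look something like the following (version numbers here are illustrative):

    [v1.2.3]: https://github.com/nasa/cumulus/compare/v1.2.2...v1.2.3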

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

      Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch (release-1.2.x) now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

        git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

    • Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important Deselect Enable Branch - if you do not do this, it will immediately fire off a build.

    • Do this immediately: on the Branch Details page, enable Change trigger and set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

    • Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: you should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on github

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
    • An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    diff --git a/docs/v10.1.0/docs-how-to/index.html b/docs/v10.1.0/docs-how-to/index.html

    Cumulus Documentation: How To's | Cumulus Documentation
    Version: v10.1.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).
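
    For example, before starting the local server you might export placeholder values in your shell (the real values come from DocSearch):

    export DOCSEARCH_API_KEY="<your-docsearch-api-key>"
    export DOCSEARCH_INDEX_NAME="<your-docsearch-index-name>"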

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

    Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both of these values to exist - DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

    diff --git a/docs/v10.1.0/external-contributions/index.html b/docs/v10.1.0/external-contributions/index.html
    Version: v10.1.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    diff --git a/docs/v10.1.0/faqs/index.html b/docs/v10.1.0/faqs/index.html
    Version: v10.1.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to set up Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus has a modular architecture that allows you to decide which components you want or need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want or need. To view Cumulus-maintained modules or steps on how to create a module go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    diff --git a/docs/v10.1.0/features/ancillary_metadata/index.html b/docs/v10.1.0/features/ancillary_metadata/index.html
    Version: v10.1.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    diff --git a/docs/v10.1.0/features/backup_and_restore/index.html b/docs/v10.1.0/features/backup_and_restore/index.html

    DynamoDB

    Backup and Restore with AWS

    You can enable point-in-time recovery (PITR) as well as create an on-demand backup for your Amazon DynamoDB tables.

    PITR provides continuous backups of your DynamoDB table data. PITR can be enabled through your Terraform deployment, the AWS console, or the AWS API. When enabled, DynamoDB maintains continuous backups of your table up to the last 35 days. You can recover a copy of that table to a previous state at any point in time from the moment you enable PITR, up to a maximum of the 35 preceding days. PITR provides continuous backups until you explicitly disable it.

    On-demand backups allow you to create backups of DynamoDB table data and its settings. You can initiate an on-demand backup at any time with a single click from the AWS Management Console or a single API call. You can restore the backups to a new DynamoDB table in the same AWS Region at any time.

    PITR gives your DynamoDB tables continuous protection from accidental writes and deletes. With PITR, you do not have to worry about creating, maintaining, or scheduling backups. You enable PITR on your table and your backup is available for restore at any point in time from the moment you enable it, up to a maximum of the 35 preceding days. For example, imagine a test script writing accidentally to a production DynamoDB table. You could recover your table to any point in time within the last 35 days.

    On-demand backups help with long-term archival requirements for regulatory compliance. On-demand backups give you full-control of managing the lifecycle of your backups, from creating as many backups as you need to retaining these for as long as you need.

    Enabling PITR during deployment

    By default, the Cumulus data-persistence module enables PITR on the default tables listed in the module's variable defaults for enable_point_in_time_tables. At the time of writing, that list includes:

    • AsyncOperationsTable
    • CollectionsTable
    • ExecutionsTable
    • FilesTable
    • GranulesTable
    • PdrsTable
    • ProvidersTable
    • RulesTable

    If you wish to change this list, simply update your deployment's data_persistence module (here in the template-deploy repository) to pass the correct list of tables.

    Restoring with PITR

    Restoring a full deployment

    If your deployment has been deleted, all of your tables with PITR enabled will have had backups created automatically. You can locate these backups in the AWS console on the DynamoDB Backups page or through the CLI by running:

    aws dynamodb list-backups --backup-type SYSTEM

    You can restore your tables to your AWS account using the following command:

    aws dynamodb restore-table-from-backup --target-table-name <prefix>-CollectionsTable --backup-arn <backup-arn>

    Where prefix matches the prefix from your data-persistence deployment. backup-arn can be found in the AWS console or by listing the backups using the command above.

    This will restore your tables to AWS. They will need to be linked to your Terraform deployment. After terraform init and before terraform apply, run the following command for each table:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <prefix>-CollectionsTable

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Terraform will now manage these tables as part of the Terraform state. Run terraform apply to generate the rest of the data-persistence deployment and then follow the instructions to deploy the cumulus deployment as normal.

    At this point the data will be in DynamoDB, but not in Elasticsearch, so nothing will be returned on the Operator dashboard or through Operator API calls. To get the data into Elasticsearch, run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the recovery index the data will be visible on the dashboard and available via the Operator API.
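
    A minimal sketch of starting that reindex from the command line, assuming your deployment exposes the POST /elasticsearch/index-from-database endpoint described in the Cumulus API documentation (hostname and token are placeholders, following the curl conventions used elsewhere in these docs):

    $ curl --request POST https://example.com/elasticsearch/index-from-database \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json'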

    Restoring an individual table

    A table can be restored to a previous state using PITR. This is easily achievable via the AWS Console by visiting the Backups tab for the table.

    A table can only be recovered to a new table name. Following the restoration of the table, the new table must be imported into Terraform.

    First, remove the old table from the Terraform state:

    terraform state rm module.data_persistence.aws_dynamodb_table.collections_table

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Then import the new table into the Terraform state:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <new-table-name>

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Your data-persistence and cumulus deployments should be redeployed so that your instance of Cumulus uses this new table. After the deployment, your Elasticsearch instance will be out of sync with your new table if there is any change in data. To resync your Elasticsearch with your database run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the new index the DynamoDB tables and Elasticsearch instance will be in sync and the correct data will be reflected on the dashboard.

    Backup and Restore with cumulus-api CLI

    The cumulus-api CLI also includes backup and restore commands. The CLI backup command downloads the content of any of your DynamoDB tables to .json files. You can also use these .json files to restore the records to another DynamoDB table.

    Backup with the CLI

    To back up a table with the CLI, install the @cumulus/api package using npm, making sure to install the same version as your Cumulus deployment:

    npm install -g @cumulus/api@version

    Then run:

    cumulus-api backup --table <table-name>

    The backup will be stored at backups/<table-name>.json.

    Restore with the CLI

    To restore data from a JSON file, run the following command:

    cumulus-api restore backups/<table-name>.json --table <table-name>

    The restore can go to the in-use table and will update Elasticsearch. If a record already exists in the table, it will not be duplicated; it will be updated with the record from the restore file.

    Data Backup and Restore

    Cumulus provides no core functionality to backup data stored in S3. Data disaster recovery is being developed in a separate effort here.

    diff --git a/docs/v10.1.0/features/data_in_dynamodb/index.html b/docs/v10.1.0/features/data_in_dynamodb/index.html
    Version: v10.1.0

    Cumulus Metadata in DynamoDB

    @cumulus/api uses a number of methods to preserve the metadata generated in a Cumulus instance.

    All configuration and system-generated metadata are stored in DynamoDB tables, except for logs. System logs are stored in the AWS CloudWatch service.

    Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability. Amazon DynamoDB runs exclusively on solid-state drives (SSDs). SSDs help AWS achieve the design goals of predictable low-latency response times for storing and accessing data at any scale.

    DynamoDB Auto Scaling

    Cumulus deployed tables from the data-persistence module are set to on-demand mode.

    diff --git a/docs/v10.1.0/features/dead_letter_archive/index.html b/docs/v10.1.0/features/dead_letter_archive/index.html
    Version: v10.1.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system Lambda function supported by a DLQ is the sfEventSqsToDbRecords Lambda function, which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
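
    To see what has accumulated in the archive, you can list the objects under that path. A minimal sketch with the AWS CLI, where <system-bucket> and <stackName> are placeholders for your deployment's values:

    aws s3 ls s3://<system-bucket>/<stackName>/dead-letter-archive/sqs/ --recursive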

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.
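
    A sketch of such a request, mirroring the curl conventions used elsewhere in these docs (hostname and token are placeholders):

    $ curl --request POST https://example.com/deadLetterArchive/recoverCumulusMessages \
    --header 'Authorization: Bearer ReplaceWithTheToken'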

    diff --git a/docs/v10.1.0/features/dead_letter_queues/index.html b/docs/v10.1.0/features/dead_letter_queues/index.html
    Version: v10.1.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS dead letter queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failures to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

    receive_wait_time_seconds  = 20
    message_retention_seconds  = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch based on DynamoDB events)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

    For aid in manually troubleshooting, you can utilize the SQS Management Console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes
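
    The same inspection can be done from the command line. A minimal sketch with the AWS CLI, using a hypothetical queue name (substitute your stack's prefix and the DLQ you are investigating):

    aws sqs get-queue-url --queue-name <prefix>-ScheduleSFDeadLetterQueue
    aws sqs receive-message \
    --queue-url <queue-url-from-above> \
    --message-attribute-names All \
    --max-number-of-messages 10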

    diff --git a/docs/v10.1.0/features/distribution-metrics/index.html b/docs/v10.1.0/features/distribution-metrics/index.html
    Version: v10.1.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    diff --git a/docs/v10.1.0/features/execution_payload_retention/index.html b/docs/v10.1.0/features/execution_payload_retention/index.html
    Version: v10.1.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS Step Function API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in DynamoDB and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the configured thresholds.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    diff --git a/docs/v10.1.0/features/logging-esdis-metrics/index.html b/docs/v10.1.0/features/logging-esdis-metrics/index.html
    Version: v10.1.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

    diff --git a/docs/v10.1.0/features/replay-archived-sqs-messages/index.html b/docs/v10.1.0/features/replay-archived-sqs-messages/index.html
    Version: v10.1.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint allows you to start a replay operation that requeues all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    Field     | Type   | Description
    queueName | string | Any valid SQS queue name (not ARN)
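
    A sketch of such a request, following the curl conventions used elsewhere in these docs (hostname, token, and the queue name are placeholders):

    $ curl --request POST https://example.com/replays/sqs \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{"queueName": "<your-queue-name>"}'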

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    diff --git a/docs/v10.1.0/features/replay-kinesis-messages/index.html b/docs/v10.1.0/features/replay-kinesis-messages/index.html
    Version: v10.1.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    Field                          | Type   | Required         | Description
    type                           | string | required         | Currently only accepts kinesis.
    kinesisStream                  | string | for type kinesis | Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp | *      | optional         | Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
    endTimestamp                   | *      | optional         | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp                 | *      | optional         | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
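
    A sketch of such a request (hostname, token, stream name, and timestamps are placeholders):

    $ curl --request POST https://example.com/replays \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "type": "kinesis",
      "kinesisStream": "<your-stream-name>",
      "startTimestamp": "2023-07-01T00:00:00.000Z",
      "endTimestamp": "2023-07-02T00:00:00.000Z"
    }'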

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.
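
    For example, using the asyncOperationId returned above (hostname and token are placeholders):

    $ curl --request GET https://example.com/asyncOperations/<asyncOperationId> \
    --header 'Authorization: Bearer ReplaceWithTheToken'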

    diff --git a/docs/v10.1.0/features/reports/index.html b/docs/v10.1.0/features/reports/index.html
    The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    diff --git a/docs/v10.1.0/getting-started/index.html b/docs/v10.1.0/getting-started/index.html
    Version: v10.1.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects, not limited to NASA, use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    diff --git a/docs/v10.1.0/glossary/index.html b/docs/v10.1.0/glossary/index.html
    Version: v10.1.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collection on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    diff --git a/docs/v10.1.0/index.html b/docs/v10.1.0/index.html
    Version: v10.1.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    diff --git a/docs/v10.1.0/integrator-guide/about-int-guide/index.html b/docs/v10.1.0/integrator-guide/about-int-guide/index.html
    Version: v10.1.0

    About Integrator Guide

    Purpose

    The Integrator Guide is meant to supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    diff --git a/docs/v10.1.0/integrator-guide/int-common-use-cases/index.html b/docs/v10.1.0/integrator-guide/int-common-use-cases/index.html
    diff --git a/docs/v10.1.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v10.1.0/integrator-guide/workflow-add-new-lambda/index.html
    Version: v10.1.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps are to help you along as you write a new Lambda that integrates with a Cumulus workflow. This will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    diff --git a/docs/v10.1.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v10.1.0/integrator-guide/workflow-ts-failed-step/index.html
    Version: v10.1.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    diff --git a/docs/v10.1.0/interfaces/index.html b/docs/v10.1.0/interfaces/index.html
    Version: v10.1.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Note: This diagram is current as of v1.18.0.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    diff --git a/docs/v10.1.0/operator-docs/about-operator-docs/index.html b/docs/v10.1.0/operator-docs/about-operator-docs/index.html
    Version: v10.1.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    diff --git a/docs/v10.1.0/operator-docs/bulk-operations/index.html b/docs/v10.1.0/operator-docs/bulk-operations/index.html
    Version: v10.1.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
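For example, assuming a deployment API at example.com and a previously returned AsyncOperationId, the records could be retrieved with requests like the following sketch (confirm the exact paths against the Cumulus API documentation). The returned record contains a status field (for example RUNNING or SUCCEEDED) along with any output from the operation.

# List all async operations
$ curl --request GET https://example.com/asyncOperations \
  --header 'Authorization: Bearer ReplaceWithTheToken'

# Retrieve a single async operation (the ID below is a placeholder)
$ curl --request GET https://example.com/asyncOperations/0eb8e809-8790-5409-1239-bcd9e8d28b8e \
  --header 'Authorization: Bearer ReplaceWithTheToken'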

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

Version: v10.1.0

CMR Operations

… UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with 'updateCmrAccessConstraints' selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "updateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.
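For illustration, the single-granule applyWorkflow request shown earlier could be expressed as a bulk request roughly as follows. This is a sketch only: the query and index are placeholders, and the assumption that a meta object is passed through to each scheduled workflow should be verified against the Cumulus API documentation before use.

$ curl --request POST https://example.com/granules/bulk \
  --header 'Authorization: Bearer ReplaceWithTheToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "workflowName": "UpdateCmrAccessConstraints",
    "index": "my-granules-index",
    "query": {
      "query": {
        "match": { "collectionId": "MOD11A1___006" }
      }
    },
    "meta": {
      "accessConstraints": {
        "value": 5,
        "description": "sample access constraint"
      }
    }
  }'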

    More information on bulk operations is in the bulk operations operator doc.

    Version: v10.1.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP, Kinesis, etc.), the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.
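For reference, the same kind of rule can also be created via the Cumulus API instead of the dashboard form described in the steps below. A minimal sketch, assuming a Kinesis-type rule with placeholder provider, collection, workflow, and stream ARN; field names should be confirmed against the rule schema in the Cumulus API documentation.

$ curl --request POST https://example.com/rules \
  --header 'Authorization: Bearer ReplaceWithTheToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "mod09gq_kinesis_ingest_rule",
    "workflow": "CNMExampleWorkflow",
    "provider": "example_provider",
    "collection": { "name": "MOD09GQ", "version": "006" },
    "rule": {
      "type": "kinesis",
      "value": "arn:aws:kinesis:us-east-1:000000000000:stream/example-ingest-stream"
    },
    "state": "ENABLED"
  }'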

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

Version: v10.1.0

Discovery Filtering

… directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    Version: v10.1.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use the search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete the selected granules, or select granules to delete via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or scheduled rules, will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    Version: v10.1.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details displays. Once the status is Active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that matches your CNM schema, using Python or the command line (see the sketch after this list).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you created in Step 2.
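A rough sketch of sending a test message from the command line is shown below. The stream name, partition key, and message body are placeholders, and the message fields are illustrative only; the actual message must conform to the CNM schema and to the rule created in Step 2.

# AWS CLI v2 expects --cli-binary-format raw-in-base64-out for inline JSON data
$ aws kinesis put-record \
  --stream-name example-ingest-stream \
  --partition-key example-key \
  --cli-binary-format raw-in-base64-out \
  --data '{
    "version": "1.0",
    "provider": "example_provider",
    "collection": "MOD09GQ",
    "submissionTime": "2023-07-20T00:00:00.000Z",
    "identifier": "example-identifier-0001",
    "product": {
      "name": "MOD09GQ.A2017025.h21v00.006.2017034065104",
      "files": [
        {
          "type": "data",
          "uri": "s3://discovery-bucket/test-data/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
        }
      ]
    }
  }'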
    Version: v10.1.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting, you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
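For example, assuming an internal bucket named my-internal-bucket and a stack prefix of my-stack (both placeholders), the raw logs could be listed with the AWS CLI:

$ aws s3 ls s3://my-internal-bucket/my-stack/ems-distribution/s3-server-access-logs/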

Version: v10.1.0

Naming Executions

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
      ...other rule keys here...
      "meta": {
        "executionNamePrefix": "my-prefix"
      }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
      "QueueGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "FullMessage": true
            },
            "task_config": {
              "queueUrl": "${start_sf_queue_url}",
              "provider": "{$.meta.provider}",
              "internalBucket": "{$.meta.buckets.internal.name}",
              "stackName": "{$.meta.stack}",
              "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
              "executionNamePrefix": "{$.meta.executionNamePrefix}"
            }
          }
        },
        "Type": "Task",
        "Resource": "${queue_granules_task_arn}",
        "Retry": [
          {
            "ErrorEquals": [
              "Lambda.ServiceException",
              "Lambda.AWSLambdaException",
              "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 6,
            "BackoffRate": 2
          }
        ],
        "Catch": [
          {
            "ErrorEquals": [
              "States.ALL"
            ],
            "ResultPath": "$.exception",
            "Next": "WorkflowFailed"
          }
        ],
        "End": true
      }
    }
    Version: v10.1.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata, you can specify a provider_path that tells the workflow where on the provider to discover the data.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
      private = {
        name = "discovery-bucket"
        type = "private"
      },
      protected = {
        name = "archive-protected"
        type = "protected"
      }
      public = {
        name = "archive-public"
        type = "public"
      }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider
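The same provider can be created via the Cumulus API instead of the dashboard. A minimal sketch, assuming the provider is given the id s3_provider; field names should be confirmed against the provider schema in the Cumulus API documentation.

$ curl --request POST https://example.com/providers \
  --header 'Authorization: Bearer ReplaceWithTheToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "id": "s3_provider",
    "protocol": "s3",
    "host": "discovery-bucket"
  }'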

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types map to the archive buckets configured in your Cumulus deployment (see the buckets example above).

    {
      "name": "test-collection",
      "version": "001",
      "granuleId": "^GRANULE\\.A[\\d]{7}$",
      "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
      "reportToEms": false,
      "sampleFileName": "GRANULE.A2017025.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
          "sampleFileName": "GRANULE.A2017025.hdf"
        },
        {
          "bucket": "public",
          "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
          "sampleFileName": "GRANULE.A2017025.jpg"
        }
      ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
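A minimal sketch of an equivalent rule created via the API is shown below, assuming the provider and collection created above, a discovery workflow named DiscoverGranules, and a onetime rule type; confirm the workflow name and field names against your deployment and the Cumulus API documentation.

$ curl --request POST https://example.com/rules \
  --header 'Authorization: Bearer ReplaceWithTheToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "test_collection_discover_rule",
    "workflow": "DiscoverGranules",
    "provider": "s3_provider",
    "collection": { "name": "test-collection", "version": "001" },
    "rule": { "type": "onetime" },
    "state": "ENABLED",
    "meta": { "provider_path": "test-data" }
  }'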

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    Version: v10.1.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    Version: v10.1.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.
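As a sketch, recent API log entries could be tailed with the AWS CLI, assuming the default Lambda log group naming, that the API Lambda is named with your deployment prefix as described above, and a placeholder prefix of my-prefix:

$ aws logs tail "/aws/lambda/my-prefix-ApiEndpoints" --since 1h --follow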

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of lambdas we distribute is in the low teens, which is likely to be easier and faster to reconfigure one-by-one compared to editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

Version: v10.1.0

Reindexing Elasticsearch Guide

… current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: a reindex, and a change-index to switch over to the new index. They can be done in either order, but each order has its trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all of the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This approach is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.
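A sketch of a change-index request, assuming the Cumulus API exposes it at /elasticsearch/change-index as described in the Cumulus API documentation; the index names below are placeholders:

$ curl --request POST https://example.com/elasticsearch/change-index \
  --header 'Authorization: Bearer ReplaceWithTheToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "currentIndex": "cumulus-2020-11-3",
    "newIndex": "cumulus-2021-3-4"
  }'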

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here
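A sketch of triggering a reindex, assuming the /elasticsearch/reindex endpoint described in the Cumulus API documentation; the parameter names and index values below are placeholders and should be confirmed against that documentation:

$ curl --request POST https://example.com/elasticsearch/reindex \
  --header 'Authorization: Bearer ReplaceWithTheToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "sourceIndex": "cumulus-2020-11-3",
    "destIndex": "cumulus-2021-3-4"
  }'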

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
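For example (a sketch, assuming the reindex-status endpoint path from the Cumulus API documentation):

$ curl --request GET https://example.com/elasticsearch/reindex-status \
  --header 'Authorization: Bearer ReplaceWithTheToken'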

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
      "aggs": {
        "count_by_type": {
          "terms": {
            "field": "_type"
          }
        }
      },
      "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete, it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
      "conflicts": "proceed",
      "source": {
        "index": "cumulus-2020-11-3"
      },
      "dest": {
        "index": "cumulus-2021-3-4",
        "op_type": "create"
      }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    Version: v10.1.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the "New execution" button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

Version: v10.1.0

Troubleshooting Deployment

… data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'. This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID, EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, or you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URL to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    - + \ No newline at end of file diff --git a/docs/v10.1.0/upgrade-notes/cumulus_distribution_migration/index.html b/docs/v10.1.0/upgrade-notes/cumulus_distribution_migration/index.html index 8992721b367..37f1cff24ef 100644 --- a/docs/v10.1.0/upgrade-notes/cumulus_distribution_migration/index.html +++ b/docs/v10.1.0/upgrade-notes/cumulus_distribution_migration/index.html @@ -5,14 +5,14 @@ Migrate from TEA deployment to Cumulus Distribution | Cumulus Documentation - +
    Version: v10.1.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Version: v10.1.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait on a new release of Cumulus that incorporated that release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module to your deployment is provided, or you can refer to Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any Cloudfront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
      --table-name DYNAMO_LOCK_TABLE \
      --item '{
        "LockID": {"S": "BUCKET/KEY-md5"},
        "Digest": {"S": "some-digest-value"}
      }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    Version: v10.1.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

    The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, but to instead use an incrementing integer field.

    Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

    The update is not tied to a particular version of Core, however the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

    This update functionally means that Cumulus tasks/activities using the CMA will now record a record that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.

    Version: v10.1.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
Upgrade to RDS release

| Parameter | Type | Description | Default |
| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    Version: v10.1.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

Discover Granules

… included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
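
For illustration only, a hedged sketch of how this might look in a workflow definition, reusing the CMA Parameters/task_config convention shown elsewhere in this documentation. Only concurrency and duplicateGranuleHandling are keys described on this page; the step name and the rest of the configuration are placeholders.

"DiscoverGranules": {
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        ...other config...
        "duplicateGranuleHandling": "skip",
        "concurrency": 10
      }
    }
  }
},
...more configuration...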

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v10.1.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming config.inputGranules and the task input list of s3 URIs along with the rest of the configuration objects to take the list of incoming files and sort them into a list of granule objects.

Please note: files passed in without metadata previously defined for config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.
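
As a purely illustrative sketch (the bucket, prefix, and file names below are hypothetical), the incoming payload might look like an array of staged S3 URIs:

[
  "s3://my-internal-bucket/staging/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "s3://my-internal-bucket/staging/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
]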

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.
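
To make that concrete, here is a minimal, hypothetical sketch of the output shape for a single granule; the file keys size, bucket, key, and fileName are the ones listed above, while the granule ID and values are placeholders.

{
  "granules": [
    {
      "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104",
      "files": [
        {
          "bucket": "my-internal-bucket",
          "key": "staging/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "size": 1908635
        }
      ]
    }
  ]
}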

    Version: v10.1.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.
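
As a rough, hypothetical sketch of that composite shape (the exact structure of backupResults is defined by the schema linked above; the placeholders below only indicate where each part lives):

{
  "granules": [
    ...the input granules...
  ],
  "backupResults": [
    ...results of the LZARDS backup requests...
  ]
}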

    Version: v10.1.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

  • Add it to the appropriate access URL category for the CMR file type, as defined by the granule's CNM file type.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

    Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

• The task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

See the SIPS workflow cookbook for an example of this task in a workflow.

    Version: v10.1.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Output a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine file storage locations based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.
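
As a hedged illustration, a ParsePdr step's task_config might wire these keys up as follows; only provider, bucket, and collection are keys described above, and the URL template values simply mirror conventions used elsewhere in this documentation and may differ in your deployment.

"ParsePdr": {
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "bucket": "{$.meta.buckets.internal.name}",
        "collection": "{$.meta.collection}"
      }
    }
  }
},
...more configuration...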

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

See the SIPS workflow cookbook for an example of this task in a workflow.

    Version: v10.1.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

    The task utilizes a defined collection in concert with a defined provider, either on each granule, or passed in via config to queue up ingest executions for each granule, or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
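
Pulling the keys above together, a hypothetical QueueGranules step's task_config might look like the following. The key names are the ones described in this section; every value (queue URL, workflow name, prefix, and so on) is a placeholder.

"QueueGranules": {
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "internalBucket": "{$.meta.buckets.internal.name}",
        "granuleIngestWorkflow": "IngestGranule",
        "queueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/my-prefix-startSF",
        "preferredQueueBatchSize": 1,
        "concurrency": 3,
        "executionNamePrefix": "my-prefix",
        "childWorkflowMeta": {
          "staticValue": "a-value-for-child-workflows"
        }
      }
    }
  }
},
...more configuration...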

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.
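
Illustratively, the output payload therefore has a shape along these lines; only the running key described above is shown, and the execution ARN is a placeholder.

{
  "running": [
    "arn:aws:states:us-east-1:123456789012:execution:my-prefix-IngestGranule:some-execution-name"
  ]
}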

    Version: v10.1.0

    Cumulus Tasks: Message Flow

Cumulus Tasks make up Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
• exception: A key used to track workflow exceptions; it should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

This step determines which task is currently being executed. Note this is different from which Lambda or activity is being executed, because the same Lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

{
  "input": {},
  "config": {}
}

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

{
  "task_config": {
    "bar": "baz",
    "cumulus_message": {
      "input": "{$.payload.input}",
      "outputs": [
        {
          "source": "{$.input.anykey}",
          "destination": "{$.payload.out}"
        }
      ]
    }
  },
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "input": {
      "anykey": "anyvalue"
    }
  }
}

    Create Next Message Step 2: Store remote event

If the ReplaceConfig parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.
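
For example, here is a hedged sketch reusing the remote message shape shown earlier on this page (the bucket and key values are placeholders): a task whose payload was stored to S3 might emit an outgoing message like the following, with the original payload replaced by an empty object.

{
  "cumulus_meta": { ...unchanged... },
  "meta": { ...unchanged... },
  "payload": {},
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "events/some-event-id",
    "TargetPath": "$.payload"
  }
}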

    Version: v10.1.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
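
As a minimal, hypothetical sketch of such a definition, the Amazon States Language for a one-step workflow could look like the following; the Lambda ARN is a placeholder, and in practice you would reference the ARN output by your task module or the cumulus module.

{
  "Comment": "A minimal single-step workflow",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-prefix-HelloWorld",
      "End": true
    }
  }
}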

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v10.1.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

The source for these tasks is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a docker image. Cumulus provides a docker image (source) for node 12.x+ lambdas on dockerhub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

Custom docker images/runtimes are supported, as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

Dockerizing Data Processing

2) validate the output (in this case just check for existence)
3) use 'ncatted' to update the resulting file to be CF-compliant
4) write out metadata generated for this file

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3, and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

├── cpl
│   ├── input
│   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
│   │   └── HS3_CPL_OP_12203a_20120906.hdf5
│   └── output
│       ├── HS3_CPL_ATB_12203a_20120906.nc
│       ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
│       ├── HS3_CPL_OP_12203a_20120906.nc
│       └── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise an error will be reported.

    Version: v10.1.0

    Workflows

Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about the specifics of what goes into a granule, for example.

    Scalable

The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

    AWS Step functions are described in detail in the AWS documentation but they provide several benefits which are applicable to AWS.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
  • Both meta and task_config, which are used for configuring at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

    Tradeoffs, and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade off. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest specific configuration. This means our representation has a tight coupling with their standard. If they make backwards incompatible changes in the future we will have to deal with existing projects written against that.

We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without requiring changes to the Ingest library code. It is unlikely they will make a backwards incompatible change at this point. One mitigation for this is writing data transformations to a new format if that were to happen.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built that provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the needs of ingest by integrating different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
    • CloudWatch Metrics can be used for monitoring many of the events with the StepFunctions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v10.1.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

    In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers, however you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory in your lambda .zip, for any Lambda runtime that includes a python runtime. python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

    Input to the task application code is a json object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is returned and placed in the payload key by default, but the config key can also be used to return just a portion of the task output.
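
For instance (a purely hypothetical illustration), if the task application code returns {"someKey": "someValue"}, the outgoing Cumulus message would by default carry it as:

{
  ...other Cumulus message keys...
  "payload": {
    "someKey": "someValue"
  }
}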

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, leaving in its place a remote message key that defines how to retrieve it and an empty JSON object {}. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

    Setting the Path/Target path in the ReplaceConfig parameter (and optionally a non-default MaxSize)

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values for definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
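Based on the description above, the message handed to the CMA for the inbound example would look roughly like the following sketch (the keys are taken from the example; the exact ordering is illustrative):

{
  "ConfigKey": {
    "config values": "some config values"
  },
  "cumulus_meta": {},
  "payload": {},
  "meta": {},
  "exception": {}
}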

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
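As a sketch, after resolution the example above would look roughly like the following, where the placeholder value stands in for whatever JSON was stored at s3://some_bucket/events/some-event-id:

"meta": {
  "foo": {
    "anykey": "contents retrieved from s3://some_bucket/events/some-event-id"
  }
}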

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or to reference values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

    {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }
    }
    }
    }

and an incoming message that contains:

    {
    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    }
    }

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given a task that specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

        {
    "task_config": {
    "cumulus_message": {
    "input": "{$.payload.foo}"
    }
    },
    "payload": {
    "foo": {
    "anykey": "anyvalue"
    }
    }
    }

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

        {
    "input" : {
    "anykey": "anyvalue"
    },
    "config": {...}
    }

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also copy values into other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

    {
    "ExampleTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    }
    }
    }
    }
    }

    The corresponding Cumulus Message would be:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Given the response from the task is:

        {
    "output": {
    "anykey": "boo"
    }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output message (post output configuration) from a Cumulus workflow step looks like:

        {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    the resultant output would look like:

    {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
    }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

The Cumulus Message Adapter can validate task input, output, and configuration messages against their schemas. The default location of the schemas is the schemas folder at the top level of the task, and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
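For illustration, a minimal sketch of what a task's schemas/config.json might look like (the bucket and provider properties are hypothetical; a real task defines whatever properties its configuration actually uses):

{
  "title": "ExampleTaskConfig",
  "type": "object",
  "properties": {
    "bucket": { "type": "string" },
    "provider": { "type": "object" }
  },
  "required": ["bucket"]
}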

    - + \ No newline at end of file diff --git a/docs/v10.1.0/workflows/lambda/index.html b/docs/v10.1.0/workflows/lambda/index.html index 8a6c012532a..895ac48b77b 100644 --- a/docs/v10.1.0/workflows/lambda/index.html +++ b/docs/v10.1.0/workflows/lambda/index.html @@ -5,13 +5,13 @@ Develop Lambda Functions | Cumulus Documentation - +
    Version: v10.1.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions: Node.js, Java, and Python (see the deployment sections below).

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source), either as a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

Please note: this example contains the minimum set of required configuration.

Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment. A minimal sketch of the handler file this configuration assumes follows.
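A minimal sketch of the index.js implied by the handler = "index.handler" setting above (the body is illustrative only; real Cumulus tasks usually wrap their business logic with the Cumulus Message Adapter, described below):

// index.js - illustrative handler that simply logs and echoes the event
exports.handler = async (event) => {
  console.log('Received event:', JSON.stringify(event));
  return event;
};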

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas that utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA layer you wish to include, as sketched below. See the input_output docs for more on how to create/use the CMA.
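As a sketch, assuming a hypothetical variable cma_layer_arn holding the ARN of a published CMA layer version, the layers key is added to the aws_lambda_function resource:

resource "aws_lambda_function" "cma_enabled_function" {
  function_name = "${var.prefix}-cma-function"
  filename      = "/path/to/zip/lambda.zip"
  handler       = "index.handler"
  role          = module.cumulus.lambda_processing_role_arn
  runtime       = "nodejs10.x"

  # ARN of the Cumulus Message Adapter Lambda layer version (hypothetical variable)
  layers = [var.cma_layer_arn]
}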

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

CloudWatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    - + \ No newline at end of file diff --git a/docs/v10.1.0/workflows/protocol/index.html b/docs/v10.1.0/workflows/protocol/index.html index 98e789ffa80..89c1e488a5b 100644 --- a/docs/v10.1.0/workflows/protocol/index.html +++ b/docs/v10.1.0/workflows/protocol/index.html @@ -5,13 +5,13 @@ Workflow Protocol | Cumulus Documentation - +
    Version: v10.1.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/v10.1.0/workflows/workflow-configuration-how-to/index.html b/docs/v10.1.0/workflows/workflow-configuration-how-to/index.html index 7d9b4e682d6..77279a50163 100644 --- a/docs/v10.1.0/workflows/workflow-configuration-how-to/index.html +++ b/docs/v10.1.0/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -24,7 +24,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. The cmrMetadata (the CMR granule XML) needs to have been generated and stored on S3 before this step. From there, any field can be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    - + \ No newline at end of file diff --git a/docs/v10.1.0/workflows/workflow-triggers/index.html b/docs/v10.1.0/workflows/workflow-triggers/index.html index 3eaf9dbb10f..c4600b8826e 100644 --- a/docs/v10.1.0/workflows/workflow-triggers/index.html +++ b/docs/v10.1.0/workflows/workflow-triggers/index.html @@ -5,13 +5,13 @@ Workflow Triggers | Cumulus Documentation - +
    Version: v10.1.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda and an execution is started with the body of the message as the input.

When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event which manages triggering the lambdas that start the workflows.

For a scheduled rule, the CloudWatch event is triggered on the given schedule, which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    - + \ No newline at end of file diff --git a/docs/v11.0.0/adding-a-task/index.html b/docs/v11.0.0/adding-a-task/index.html index 6ee46b47c67..0dd4db9103f 100644 --- a/docs/v11.0.0/adding-a-task/index.html +++ b/docs/v11.0.0/adding-a-task/index.html @@ -5,13 +5,13 @@ Contributing a Task | Cumulus Documentation - +
    Version: v11.0.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/api/index.html b/docs/v11.0.0/api/index.html index 1b86f0fa760..cf38d92381c 100644 --- a/docs/v11.0.0/api/index.html +++ b/docs/v11.0.0/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v11.0.0/architecture/index.html b/docs/v11.0.0/architecture/index.html index b2a994b4473..a91fe555f08 100644 --- a/docs/v11.0.0/architecture/index.html +++ b/docs/v11.0.0/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
    Version: v11.0.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it can be broken down into several major components:

    Data Distribution

End users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application. This provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a PostgreSQL-compatible database, and is exported to an Elasticsearch instance to provide non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is also replicated in DynamoDB; this will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency, Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus currently also deploys a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployments, and contain extracted subcomponents of the cumulus module. For more on these components, see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/configuration/cloudwatch-retention/index.html b/docs/v11.0.0/configuration/cloudwatch-retention/index.html index ac5be2f022f..ad6e9222485 100644 --- a/docs/v11.0.0/configuration/cloudwatch-retention/index.html +++ b/docs/v11.0.0/configuration/cloudwatch-retention/index.html @@ -5,13 +5,13 @@ Cloudwatch Retention | Cumulus Documentation - +
    Version: v11.0.0

    Cloudwatch Retention

Our Lambdas write logs to AWS CloudWatch. By default, these logs are retained indefinitely. However, there are ways to specify a retention duration for the logs.

    aws-cli

In addition to getting your aws-cli set up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
2. retention-in-days: the number of days you'd like to retain logs in the specified log group. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.
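To verify that the retention policy took effect, you can describe the log group and check the retentionInDays field in the output:

aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/KinesisInboundLogger"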

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
4. Click on the value in the Expire Events After column.
5. Enter/Select the number of days you'd like to retain logs in that log group.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v11.0.0/configuration/collection-storage-best-practices/index.html b/docs/v11.0.0/configuration/collection-storage-best-practices/index.html index 5b98c0b4bc4..3d90bbcea98 100644 --- a/docs/v11.0.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v11.0.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v11.0.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (e.g. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
    "Datapoints": [
    {
    "Timestamp": "2019-07-31T00:00:00Z",
    "Average": 150996467959.0,
    "Unit": "Bytes"
    }
    ],
    "Label": "BucketSizeBytes"
    }

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets, as in the CLI sketch below. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.
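For example, a bucket could be tagged from the CLI as follows (the bucket name and the Collection tag key/value are placeholders):

aws s3api put-bucket-tagging --bucket sample-protected-bucket \
  --tagging 'TagSet=[{Key=Collection,Value=MOD09GQ___006}]'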

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "MOD09GQ-006-private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "MOD09GQ-006-public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v11.0.0/configuration/data-management-types/index.html b/docs/v11.0.0/configuration/data-management-types/index.html index 45f74049f8c..48b1af63a30 100644 --- a/docs/v11.0.0/configuration/data-management-types/index.html +++ b/docs/v11.0.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v11.0.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, settings that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
host | string | Yes | S3 Bucket to pull data from
http
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be http for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 80
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
https
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be https for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 443
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
ftp
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
port | integer | No | Port to connect to the provider on. Defaults to 21
sftp
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
host | string | Yes | The sftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the sftp server.
password | string | No | Password to use to connect to the sftp server.
port | integer | No | Port to connect to the provider on. Defaults to 22
privateKey | string | No | filename assumed to be in s3://bucketInternal/stackName/crypto
cmKeyId | string | No | AWS KMS Customer Master Key arn or alias
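Putting the S3 fields above together, a provider record might look like the following sketch (the id, host, and connection limit values are placeholders):

{
  "id": "EXAMPLE_S3_PROVIDER",
  "protocol": "s3",
  "host": "example-provider-staging-bucket",
  "globalConnectionLimit": 10
}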

    Collections

Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
Key | Value | Required | Description
name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
version | "006" | Yes | A version tag for the collection
granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

    files-object

Key | Value | Required | Description
regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
bucket | "internal" | Yes | Name of the bucket where the file will be stored
url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
Key | Value | Required | Description
name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
payload | <JSON Object or Array> | No | The payload to be passed to the workflow
meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

    collection-object

Key | Value | Required | Description
name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

    meta-object

Key | Value | Required | Description
retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3.
visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

Key | Value | Required | Description
type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
value | <String> Object | Depends | Discussion of valid values is below

    rule-value

The rule.value entry depends on the rule type; a combined example follows this list:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
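Putting the fields above together, a kinesis rule record might look like the following sketch (the stream ARN and other values are placeholders drawn from the tables above):

{
  "name": "L2_HR_PIXC_kinesisRule",
  "workflow": "CNMExampleWorkflow",
  "provider": "PODAAC_SWOT",
  "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
  },
  "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:us-east-1:000000000000:stream/example-stream"
  },
  "state": "ENABLED"
}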

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times by default, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v11.0.0/configuration/lifecycle-policies/index.html b/docs/v11.0.0/configuration/lifecycle-policies/index.html index 3d82400e1a0..344667272ce 100644 --- a/docs/v11.0.0/configuration/lifecycle-policies/index.html +++ b/docs/v11.0.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v11.0.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
• You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

$ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    - + \ No newline at end of file diff --git a/docs/v11.0.0/configuration/monitoring-readme/index.html b/docs/v11.0.0/configuration/monitoring-readme/index.html index 80f4a51afb7..98a0cabcb72 100644 --- a/docs/v11.0.0/configuration/monitoring-readme/index.html +++ b/docs/v11.0.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v11.0.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

For advanced debugging, we also configure dead letter queues on critical system functions. These allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that workflows are not being started as expected. More information on these can be found in the dead letter queue documentation.

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
      // CloudWatch Logs delivers log events as base64-encoded, gzipped JSON
      const payload = Buffer.from(event.awslogs.data, 'base64');
      const decompressedData = await gunzip(payload);
      const logData = JSON.parse(decompressedData.toString('ascii'));
      return await Promise.all(logData.logEvents.map(async (logEvent) => {
        // @cumulus/logger messages are JSON strings with a "level" field
        const logMessage = JSON.parse(logEvent.message);
        if (['error', 'fatal'].includes(logMessage.level)) {
          // Forward error/fatal messages to the SNS topic backing the email subscription
          return sns.publish({
            TopicArn: process.env.EmailReportingTopicArn,
            Message: logEvent.message
          }).promise();
        }
        return Promise.resolve();
      }));
    };

    After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
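    As a sketch of what such automation does under the hood, the equivalent AWS CLI calls look roughly like the following; the function name <prefix>-logForwarder is a hypothetical name for the Lambda above, and the region, account ID, and log group name are placeholders:

    # Allow CloudWatch Logs to invoke the forwarding Lambda
    aws lambda add-permission \
      --function-name <prefix>-logForwarder \
      --statement-id cloudwatch-logs-invoke \
      --principal logs.amazonaws.com \
      --action lambda:InvokeFunction \
      --source-arn "arn:aws:logs:<region>:<account-id>:log-group:<log-group-name>:*"

    # Subscribe the Lambda to the log group; the filter pattern can be left empty
    # since the Lambda itself filters on log level
    aws logs put-subscription-filter \
      --log-group-name <log-group-name> \
      --filter-name error-report-to-sns \
      --filter-pattern "" \
      --destination-arn "arn:aws:lambda:<region>:<account-id>:function:<prefix>-logForwarder"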

diff --git a/docs/v11.0.0/configuration/server_access_logging/index.html b/docs/v11.0.0/configuration/server_access_logging/index.html
    Version: v11.0.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
diff --git a/docs/v11.0.0/configuration/task-configuration/index.html b/docs/v11.0.0/configuration/task-configuration/index.html
    Version: v11.0.0

    Configuration of Tasks

    The cumulus module exposes configuration values for some of the provided archive and ingest tasks. Currently, the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

    <lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

    More information about the CMR limit and CMR page_size parameters can be found in the @cumulus/cmr-client documentation and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }
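    Taken together, these variables are set on the cumulus module in your deployment's Terraform configuration. A minimal sketch, with all other required module arguments elided, might look like:

    module "cumulus" {
      source = "..."

      # ... other required cumulus module variables ...

      cmr_search_client_config = {
        create_reconciliation_report_cmr_limit     = 2500
        create_reconciliation_report_cmr_page_size = 250
      }

      elasticsearch_client_config = {
        create_reconciliation_report_es_scroll_duration = "15m"
        create_reconciliation_report_es_scroll_size     = 2000
      }

      lambda_timeouts = {
        discover_granules_task_timeout = 300
      }
    }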
diff --git a/docs/v11.0.0/data-cookbooks/about-cookbooks/index.html b/docs/v11.0.0/data-cookbooks/about-cookbooks/index.html
    Version: v11.0.0

    About Cookbooks

    Introduction

    The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should serve to help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

diff --git a/docs/v11.0.0/data-cookbooks/browse-generation/index.html b/docs/v11.0.0/data-cookbooks/browse-generation/index.html

    Ingest Browse Generation

    ...provider keys with the previously entered values). Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") where you've staged your mock/test data:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has been ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of "Show output" button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

    You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and confirm the CMR entry shown in the payload above.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

    You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.
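    A minimal sketch of such a resource is shown below. The function name, deployment package, runtime, and resource sizing are illustrative placeholders (not part of this example's source), and the role shown assumes you want to reuse the Cumulus-provided processing role:

    resource "aws_lambda_function" "browse_processing_task" {
      function_name    = "${var.prefix}-BrowseProcessing"
      filename         = "./browse-processing.zip"
      source_code_hash = filebase64sha256("./browse-processing.zip")
      handler          = "index.handler"
      runtime          = "nodejs14.x"
      role             = module.cumulus.lambda_processing_role_arn
      timeout          = 300
      memory_size      = 512
    }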

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

    The example script provided goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

    • Create a browse image file based on the input data, and stage it to a location in an S3 bucket accessible to both this task and the FilesToGranules and MoveGranules tasks.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

    The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

    In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, the files output is mapped to the payload and the granules output to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

    The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

    This list is the list of granules FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

    The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
diff --git a/docs/v11.0.0/data-cookbooks/choice-states/index.html b/docs/v11.0.0/data-cookbooks/choice-states/index.html
    Version: v11.0.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.
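    For instance, a single Choice Rule using a numeric comparison operator might look like the following; the variable and state name here are illustrative and are not part of the ParsePdr example below:

    {
      "Variable": "$.meta.retryCount",
      "NumericLessThan": 3,
      "Next": "RetryIngest"
    }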

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

    In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: while CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

diff --git a/docs/v11.0.0/data-cookbooks/cnm-workflow/index.html b/docs/v11.0.0/data-cookbooks/cnm-workflow/index.html
    Version: v11.0.0

    CNM Workflow

    This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

    This entry assumes you already have two Kinesis data streams created for use as the CNM notification and response data streams.

    If you do not have two streams set up, please take a moment to review the Kinesis documentation and set up two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue to set up streams similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream
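    Alternatively, the same single-shard streams can be created with the AWS CLI; the stream names below are placeholders you would substitute with names of your choosing:

    aws kinesis create-stream --stream-name <prefix>-cnm-notification-stream --shard-count 1
    aws kinesis create-stream --stream-name <prefix>-cnm-response-stream --shard-count 1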

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

    If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

    In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

    state_machine_definition = <<JSON
    {
    "CNMExampleWorkflow": {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    }
    JSON
    }

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

    • The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you setup in the Prerequisites section.

    Once you've clicked on 'submit', a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously set up. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

    As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output and add a "SUCCESS" record to the Kinesis stream configured as the response-endpoint.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

    which should result in output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
    • Re-trigger the workflow by using the put-record command from above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

    Note that the data encoding is not human readable and would need to be parsed/converted to be interpretable. There are many options for building a Kinesis consumer, such as the KCL.
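    For example, assuming the AWS CLI and jq (1.6 or later, for the @base64d filter) are available, the Data fields can be decoded with something like:

    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
      | jq -r '.Records[].Data | @base64d'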

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

    At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

    This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus' control.

    The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - comes with the API package and does not need to be configured by the operator.

    To examine records that could not be processed at any step, inspect the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
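    If you'd rather poll the queue from the command line, a short sketch using the AWS CLI (substituting your deployment prefix):

    QUEUE_URL=$(aws sqs get-queue-url --queue-name <prefix>-kinesisFailure --query QueueUrl --output text)
    aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10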

    Note: these are only records that failed while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
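    If you'd rather not use the console, an event source mapping can also be created with the AWS CLI; the function name and stream ARN below are placeholders following the naming conventions above:

    aws lambda create-event-source-mapping \
      --function-name <prefix>-KinesisOutboundEventLogger \
      --event-source-arn arn:aws:kinesis:<region>:<account-id>:stream/<your-response-stream-name> \
      --starting-position LATEST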

diff --git a/docs/v11.0.0/data-cookbooks/error-handling/index.html b/docs/v11.0.0/data-cookbooks/error-handling/index.html

    Error Handling in Workflows

    ...Service Exception. See this documentation on configuring your workflow to handle transient Lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

diff --git a/docs/v11.0.0/data-cookbooks/hello-world/index.html b/docs/v11.0.0/data-cookbooks/hello-world/index.html
    Version: v11.0.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

    The HelloWorld task is provided for you as part of the cumulus terraform module, so no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow (executed workflow as seen in the AWS Console)

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

diff --git a/docs/v11.0.0/data-cookbooks/ingest-notifications/index.html b/docs/v11.0.0/data-cookbooks/ingest-notifications/index.html
    Version: v11.0.0

    Ingest Notification in Workflows

    On deployment, an SQS queue and three SNS topics, one each for executions, granules, and PDRs, are created and used for handling notification messages related to the workflow.

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

    If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and the status of those granules/PDRs being visible on the Cumulus dashboard. The queue does have certain expectations of the message format, however: it expects a Cumulus Message nested within a CloudWatch Step Function Event object.

    Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the topic ARNs) by adding them to outputs.tf for your Terraform deployment as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
    value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

    Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
    report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, the Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
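
For illustration only, the following is a minimal TypeScript sketch of publishing such a message with the AWS SDK for JavaScript (v3). The queue URL, the Cumulus message contents, and the exact set of event fields shown are placeholder assumptions; consult the record schemas for the fields your records actually require.

import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

async function reportGranuleStatus(): Promise<void> {
  // Placeholder queue URL; use the stepfunction_event_reporter_queue_url output from above.
  const queueUrl =
    "https://sqs.us-east-1.amazonaws.com/123456789012/example-sfEventSqsToDbRecordsInputQueue";

  // The Cumulus message MUST be a JSON string, not an object.
  const cumulusMessage = JSON.stringify({
    cumulus_meta: { execution_name: "example-execution-name" },
    meta: { status: "completed" },
    payload: { granules: [] },
  });

  // A Cloudwatch Step Function event wrapping the Cumulus message.
  // For SUCCEEDED events the message goes in detail.output; for all other
  // statuses it goes in detail.input.
  const event = {
    source: "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    detail: {
      status: "SUCCEEDED",
      output: cumulusMessage,
    },
  };

  const sqs = new SQSClient({});
  await sqs.send(
    new SendMessageCommand({ QueueUrl: queueUrl, MessageBody: JSON.stringify(event) })
  );
}

As with any direct publishing, if the resulting records do not appear, check the consumer Lambda logs listed below.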

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with the ingest notification record in the record property and the event type in the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
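
As an illustration, a listener Lambda (such as the test_lambda above, if written in TypeScript) might unpack the notification as in the sketch below. The field names inside the parsed record are assumptions for the example; refer to the data model schemas for the authoritative shapes.

import type { SNSEvent } from "aws-lambda";

export const handler = async (event: SNSEvent): Promise<void> => {
  for (const record of event.Records) {
    // The published message is a JSON string at Records[n].Sns.Message.
    const message = JSON.parse(record.Sns.Message);

    // For the granules topic the message is { event, record }; for executions
    // and PDRs the message is the stringified record itself.
    if (message.event && message.record) {
      console.log(`Granule notification (${message.event}):`, message.record);
    } else {
      console.log("Execution/PDR notification:", message);
    }
  }
};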

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v11.0.0

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v11.0.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

AWS Step Function tasks can run on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.
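
To make the activity mechanics concrete, here is a minimal, hypothetical TypeScript sketch of what any activity worker does in general (long-poll for a task, run it, report the result). It is not the cumulus-ecs-task implementation, and the activity ARN is a placeholder.

import {
  SFNClient,
  GetActivityTaskCommand,
  SendTaskSuccessCommand,
  SendTaskFailureCommand,
} from "@aws-sdk/client-sfn";

const sfn = new SFNClient({});
// Placeholder ARN; cumulus-ecs-task receives its activity ARN via --activityArn.
const activityArn = "arn:aws:states:us-east-1:123456789012:activity:example-QueueGranules";

export async function pollOnce(): Promise<void> {
  // Long-poll Step Functions for a pending activity task.
  const task = await sfn.send(new GetActivityTaskCommand({ activityArn }));
  if (!task.taskToken) return; // no work available right now

  try {
    const input = JSON.parse(task.input ?? "{}");
    // cumulus-ecs-task would invoke the configured Lambda package here;
    // this sketch simply echoes the input back as the task output.
    await sfn.send(
      new SendTaskSuccessCommand({ taskToken: task.taskToken, output: JSON.stringify(input) })
    );
  } catch (error) {
    await sfn.send(
      new SendTaskFailureCommand({
        taskToken: task.taskToken,
        error: "WorkerError",
        cause: String(error),
      })
    );
  }
}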

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

If you discover that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either by adding a new .tf file or updating an existing one):

• An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

• An updated Discover Granules workflow to utilize the new resource, with the Resource key in the QueueGranules step updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}"

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Science Investigator-led Processing Systems (SIPS)

...we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

Note: A list of configured workflows exists under "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective statuses in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

1. SyncGranule - source
    2. CmrStep - source

Additionally, this workflow requires a processing step that you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v11.0.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that, before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows (a sketch of this check appears after the list below):

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started
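
The following TypeScript sketch illustrates this increment-or-reject idea with a DynamoDB conditional update. It is only a sketch of the concept, not the actual sqs2sfThrottle implementation; the table name, key, and attribute names are hypothetical.

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Hypothetical semaphore table: one item per queue URL with a numeric count.
export async function tryStartExecution(queueUrl: string, maximum: number): Promise<boolean> {
  try {
    // Atomically increment the count only if it is still below the maximum.
    await docClient.send(
      new UpdateCommand({
        TableName: "example-semaphores-table", // hypothetical
        Key: { key: queueUrl },
        UpdateExpression: "ADD #count :one",
        ConditionExpression: "attribute_not_exists(#count) OR #count < :max",
        ExpressionAttributeNames: { "#count": "count" },
        ExpressionAttributeValues: { ":one": 1, ":max": maximum },
      })
    );
    return true; // below the maximum, so an execution is started
  } catch (error) {
    if ((error as { name?: string }).name === "ConditionalCheckFailedException") {
      return false; // already at the maximum, so no execution is started
    }
    throw error;
  }
}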

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Tracking Ancillary Files

The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type  | UMM-G RelatedUrl.Type                                            | ECHO10 Location
ancillary | 'VIEW RELATED INFORMATION'                                       | OnlineResource
data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)  | OnlineAccessURL
browse    | 'GET RELATED VISUALIZATION'                                      | AssociatedBrowseImage
linkage   | 'EXTENDED METADATA'                                              | OnlineResource
metadata  | 'EXTENDED METADATA'                                              | OnlineResource
qa        | 'EXTENDED METADATA'                                              | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v11.0.0

    API Gateway Logging

    Enabling API Gateway logging

In order to enable distribution API access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

    Version: v11.0.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map whose keys are descriptive labels and whose values are Cloudwatch log group names.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
Component-based Cumulus Deployment

...Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v11.0.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

Using the AWS CLI s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

Please note that security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v11.0.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

127.0.0.1 <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.
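
For example, using Node's built-in fetch (Node 18+, in an ES module), the request described above might look like the following TypeScript sketch. Every value shown is a placeholder that you would replace with your own Terraform variable values.

// Placeholder values standing in for your Terraform variables.
const csdapHostUrl = "https://example-cognito-host";            // csdap_host_url
const csdapClientId = "example-client-id";                      // csdap_client_id
const csdapClientPassword = "example-client-password";          // csdap_client_password
const cumulusDistributionUrl = "https://abc123.execute-api.us-east-1.amazonaws.com:9000/dev";

const credentials = Buffer.from(`${csdapClientId}:${csdapClientPassword}`).toString("base64");

const response = await fetch(`${csdapHostUrl}/authclient/updateRedirectUri`, {
  method: "POST",
  headers: {
    Authorization: `Basic ${credentials}`,
    "Content-Type": "application/x-www-form-urlencoded",
  },
  // Note the /login path appended to the cumulus_distribution_url value.
  body: new URLSearchParams({ redirect_uri: `${cumulusDistributionUrl}/login` }).toString(),
});

console.log(response.status, await response.text());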

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

To determine your ec2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

How to Deploy Cumulus

...for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init [1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution url
  • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [2].
    5. You may delete the placeholder url you used to create the application.

If you've lost track of the needed redirect URIs, they can be located on the API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [3].

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com (or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint") and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

    The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.
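
    A rough, illustrative sketch of where these settings live is shown below; the exact field names for elasticsearch_config are assumptions and should be confirmed against the data persistence module's variable definitions:

    # data-persistence-tf: illustrative Elasticsearch sizing (field names and values are assumptions)
    elasticsearch_config = {
      domain_name    = "es"
      instance_count = 2                          # more nodes improves failure tolerance
      instance_type  = "t3.medium.elasticsearch"  # size the instance type to your archive and load
      version        = "5.3"
      volume_size    = 30                         # GB per data node
    }

    # cumulus-tf: number of shards for the Elasticsearch index
    es_index_shards = 2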

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster, consider the following (an example terraform.tfvars sketch follows this list):

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
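
    For illustration, a terraform.tfvars sketch using the variables above might look like the following; the values are placeholders, not recommendations:

    # cumulus-tf/terraform.tfvars -- illustrative ECS cluster sizing (placeholder values)
    ecs_cluster_instance_type               = "t3.medium"
    ecs_cluster_instance_docker_volume_size = 50   # GB
    ecs_cluster_min_size                    = 1
    ecs_cluster_desired_size                = 1
    ecs_cluster_max_size                    = 2
    ecs_cluster_scale_in_adjustment_percent  = -5
    ecs_cluster_scale_out_adjustment_percent = 10
    ecs_cluster_scale_in_threshold_percent   = 25
    ecs_cluster_scale_out_threshold_percent  = 75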

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", use the edit icon for your application, then Manage -> Redirect URIs.

    3. The API root can be found in a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, by reading the URL at the top after "Invoke this API at".

    PostgreSQL Database Deployment

    Cumulus provides a Terraform module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

    To deploy this module, make sure that you have completed the corresponding steps from the Cumulus deployment instructions (e.g. preparing the AWS configuration) in a similar fashion for this module.

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike the Cumulus deployment, this module should be deployed once to create the database cluster, and thereafter only to make changes to that configuration (upgrades, etc.). This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for the following (an example terraform.tf sketch follows this list):

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
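
    For example, a terraform.tf using an S3 backend might look like the following sketch; the bucket, key, and table values are placeholders:

    # rds-cluster-tf/terraform.tf -- remote state configuration (placeholder values)
    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "PREFIX-tf-state"                         # your Terraform state bucket
        key            = "PREFIX/rds-cluster-tf/terraform.tfstate"
        dynamodb_table = "PREFIX-tf-locks"                         # state locking table
      }
    }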

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input (see the sketch after the end of this list).
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.
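
    As a sketch of the random_string approach mentioned for db_admin_password above (the module source and most other variables are omitted, and all names shown are illustrative):

    # main.tf -- generate the admin password rather than hard-coding it in terraform.tfvars
    resource "random_string" "db_admin_pass" {
      length  = 50
      special = false
    }

    module "rds_cluster" {
      source            = "./path-to-rds-cluster-module"   # placeholder module source
      db_admin_username = "postgres"
      db_admin_password = random_string.db_admin_pass.result
      # ... remaining required variables ...
    }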

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required (an example terraform.tfvars sketch follows this list):

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles for access that the provisioning lambda will need. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.
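
    A terraform.tfvars sketch for these keys might look like the following; every value shown is a placeholder:

    # terraform.tfvars -- user/database provisioning (placeholder values)
    provision_user_database  = true
    permissions_boundary_arn = "arn:aws:iam::123456789012:policy/SomePermissionsBoundary"
    rds_user_password        = "choose-a-secure-password"
    prefix                   = "my-cumulus-prefix"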

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets is in the form:

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    Version: v11.0.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }
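
    If, as the module block above suggests, these values are passed through a single s3_replicator_config variable, the corresponding terraform.tfvars entry might look like the following sketch; the bucket names and prefixes are placeholders and the authoritative variable layout is in s3-replicator/README.md:

    s3_replicator_config = {
      source_bucket = "my-internal-bucket"             # bucket receiving your S3 Server Access Logs
      source_prefix = "s3-access-logs"
      target_bucket = "example-target-bucket"          # destination bucket
      target_prefix = "cumulus-access-logs/my-prefix"
    }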

    The terraform source package can be found on the Cumulus github release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Terraform Best Practices

    To check for any remaining resources associated with your deployment, run the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    • Configuring the Cumulus deployment: link
    • Restoring a previous version: link

    Version: v11.0.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app module for distribution.

    The TEA module provides these instructions showing how to add it to your deployment and the following are instructions to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.
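
    A minimal sketch of that approach, assuming placeholder module sources and an example stage name of "DEV":

    locals {
      tea_stage_name = "DEV"   # single source of truth for the API Gateway stage name
    }

    module "thin_egress_app" {
      source     = "./path-to-tea-module"      # placeholder source
      stage_name = local.tea_stage_name
      # ... other TEA variables ...
    }

    module "cumulus" {
      source                = "./path-to-cumulus-module"   # placeholder source
      tea_api_gateway_stage = local.tea_stage_name
      # ... other Cumulus variables ...
    }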

    Upgrading Cumulus

    Confirm that your deployment functions correctly. Please refer to the recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: v11.0.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve failures as appropriate.

    Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    Versioning and Releases

    It's useful to use the search feature of your code editor or grep to see if there are any references to the old package versions. In a bash shell you can run:

    find . -name package.json -exec grep -nH "@cumulus/.*MAJOR\.MINOR\.PATCH.*" {} \;

    Verify that each of those is updated to the new MAJOR.MINOR.PATCH version you are trying to release.

    A similar search for alpha and beta versions should be run on the release version and any problems should be fixed.

    find . -name package.json -exec grep -nHE "MAJOR\.MINOR\.PATCH.*(alpha|beta)" {} \;

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

    If there exists a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the Cumulus Core to-be-released-version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

      Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch (release-1.2.x) now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

        git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

    • Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important: Deselect Enable Branch - if you do not do this, it will immediately fire off a build.

    • Immediately, on the Branch Details page, enable Change trigger. Set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan; if not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

    • Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: you should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on github

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
    • An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    Version: v11.0.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

    Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both of these values to exist - DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to gh-pages branch. The gh-pages branch is served by Github Pages. Do not make edits to the gh-pages branch.

    Version: v11.0.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    Version: v11.0.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to setup Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components that you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus maintained modules or steps on how to create a module go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    Version: v11.0.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    Cumulus Backup and Restore

  • Set the snapshot_identifier variable to the snapshot you wish to create, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments - You should not need to reconfigure either, as the secret ARN and the security group should not change, however double-check the configured values are as expected

    Version: v11.0.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to an external outage and there is no essential malformation of each Cumulus message.

    Version: v11.0.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are setup with the following configuration:

    receive_wait_time_seconds  = 20
    message_retention_seconds  = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are setup with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

    For aid in manually troubleshooting, you can utilize the SQS Management console to view messages available in the queues setup for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID which can be cross referenced to the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    Version: v11.0.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    Version: v11.0.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options (an example sketch follows the flag descriptions below):

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold, in days, for executions with a 'completed' status. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold, in days, for executions with a status other than 'completed'. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.
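
    As a sketch, overriding these values in your deployment's instance of the cumulus module might look like the following. The module source and the many other required cumulus module variables are omitted here for brevity, and the values shown are the defaults described above:

    module "cumulus" {
    # ...source and other required cumulus module variables omitted...

    # Run the cleanExecutions Lambda daily at 04:00 UTC
    daily_execution_payload_cleanup_schedule_expression = "cron(0 4 * * ? *)"

    # Remove payloads from 'completed' executions older than 10 days
    complete_execution_payload_timeout_disable = false
    complete_execution_payload_timeout         = 10

    # Remove payloads from non-completed executions older than 30 days
    non_complete_execution_payload_timeout_disable = false
    non_complete_execution_payload_timeout         = 30
    }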

    - + \ No newline at end of file diff --git a/docs/v11.0.0/features/logging-esdis-metrics/index.html b/docs/v11.0.0/features/logging-esdis-metrics/index.html index 4b790a6bf55..f07a9f0ccec 100644 --- a/docs/v11.0.0/features/logging-esdis-metrics/index.html +++ b/docs/v11.0.0/features/logging-esdis-metrics/index.html @@ -5,13 +5,13 @@ Writing logs for ESDIS Metrics | Cumulus Documentation - +
    Version: v11.0.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination that conform to the expected format will be automatically ingested and parsed, enabling helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.
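
    For example, a minimal sketch of writing ESDIS-metrics-compatible logs with @cumulus/logger is shown below. The constructor options and log methods shown are assumptions based on the log properties described above; consult the @cumulus/logger package documentation for the exact API:

    const Logger = require('@cumulus/logger');

    // Contextual fields set here are included in every log line this logger writes.
    const log = new Logger({
    sender: 'greetingFunction',
    stackName: 'myCumulus'
    });

    // Writes a JSON log line containing level, message, timestamp, and the fields above.
    log.info('hello world');
    log.error('something went wrong');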

    - + \ No newline at end of file diff --git a/docs/v11.0.0/features/replay-archived-sqs-messages/index.html b/docs/v11.0.0/features/replay-archived-sqs-messages/index.html index 742f8daa452..8ce21b56651 100644 --- a/docs/v11.0.0/features/replay-archived-sqs-messages/index.html +++ b/docs/v11.0.0/features/replay-archived-sqs-messages/index.html @@ -5,14 +5,14 @@ How to replay SQS messages archived in S3 | Cumulus Documentation - +
    Version: v11.0.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    Field | Type | Description
    queueName | string | Any valid SQS queue name (not ARN)
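
    For example, a request sketch using curl, following the convention used elsewhere in these docs and assuming a hypothetical API root of https://example.com and queue name my-ingest-queue:

    $ curl --request POST https://example.com/replays/sqs \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "queueName": "my-ingest-queue"
    }'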

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/features/replay-kinesis-messages/index.html b/docs/v11.0.0/features/replay-kinesis-messages/index.html index f795e7d2fdb..2c52f5216c9 100644 --- a/docs/v11.0.0/features/replay-kinesis-messages/index.html +++ b/docs/v11.0.0/features/replay-kinesis-messages/index.html @@ -5,7 +5,7 @@ How to replay Kinesis messages after an outage | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v11.0.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    Field | Type | Required | Description
    type | string | required | Currently only accepts kinesis.
    kinesisStream | string | for type kinesis | Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp | * | optional | Any input valid for a JS Date constructor. For reasons to use this field, see AWS documentation on StreamCreationTimestamp.
    endTimestamp | * | optional | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp | * | optional | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
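
    For example, a request sketch using curl; the stream name and timestamps here are hypothetical placeholders:

    $ curl --request POST https://example.com/replays \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "type": "kinesis",
    "kinesisStream": "my-kinesis-stream",
    "startTimestamp": "2018-10-19T00:00:00.000Z",
    "endTimestamp": "2018-10-20T00:00:00.000Z"
    }'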

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/features/reports/index.html b/docs/v11.0.0/features/reports/index.html index ee10cad0927..e17b1701e39 100644 --- a/docs/v11.0.0/features/reports/index.html +++ b/docs/v11.0.0/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -19,7 +19,7 @@ report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    - + \ No newline at end of file diff --git a/docs/v11.0.0/getting-started/index.html b/docs/v11.0.0/getting-started/index.html index 640f6047d78..8321abd8d82 100644 --- a/docs/v11.0.0/getting-started/index.html +++ b/docs/v11.0.0/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: v11.0.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution, and management systems designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects (not limited to NASA) use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/v11.0.0/glossary/index.html b/docs/v11.0.0/glossary/index.html index 60fe2bd7511..fb74d42977e 100644 --- a/docs/v11.0.0/glossary/index.html +++ b/docs/v11.0.0/glossary/index.html @@ -5,13 +5,13 @@ Glossary | Cumulus Documentation - +
    Version: v11.0.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    An AWS service that allows logging and metrics collection on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/index.html b/docs/v11.0.0/index.html index 12fe3326b46..8b7b1da4a34 100644 --- a/docs/v11.0.0/index.html +++ b/docs/v11.0.0/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v11.0.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/integrator-guide/about-int-guide/index.html b/docs/v11.0.0/integrator-guide/about-int-guide/index.html index c2d1cb8c51e..08b0c9db076 100644 --- a/docs/v11.0.0/integrator-guide/about-int-guide/index.html +++ b/docs/v11.0.0/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v11.0.0

    About Integrator Guide

    Purpose

    The Integrator Guide is to help supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/v11.0.0/integrator-guide/int-common-use-cases/index.html b/docs/v11.0.0/integrator-guide/int-common-use-cases/index.html index d9ee56fdf0d..a961134467e 100644 --- a/docs/v11.0.0/integrator-guide/int-common-use-cases/index.html +++ b/docs/v11.0.0/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v11.0.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v11.0.0/integrator-guide/workflow-add-new-lambda/index.html index e39ba52fdb3..d6969e47cd6 100644 --- a/docs/v11.0.0/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/v11.0.0/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v11.0.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps are to help you along as you write a new Lambda that integrates with a Cumulus workflow. They will also help you understand the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v11.0.0/integrator-guide/workflow-ts-failed-step/index.html index 1bc9722e9f3..d24d5d9ac7b 100644 --- a/docs/v11.0.0/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/v11.0.0/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: v11.0.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    - + \ No newline at end of file diff --git a/docs/v11.0.0/interfaces/index.html b/docs/v11.0.0/interfaces/index.html index ab88eb29a41..9311159df13 100644 --- a/docs/v11.0.0/interfaces/index.html +++ b/docs/v11.0.0/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: v11.0.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.
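
    A heavily simplified sketch of such a message is shown below. The field values are hypothetical, and the Cumulus message format documentation remains the authoritative schema:

    {
    "cumulus_meta": {
    "state_machine": "arn:aws:states:us-east-1:000000000000:stateMachine:MySfn",
    "execution_name": "abcd1234"
    },
    "meta": {
    "stack": "myCumulus",
    "provider": {},
    "collection": {}
    },
    "payload": {}
    }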

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/about-operator-docs/index.html b/docs/v11.0.0/operator-docs/about-operator-docs/index.html index f642ef3dc67..77f8af3354c 100644 --- a/docs/v11.0.0/operator-docs/about-operator-docs/index.html +++ b/docs/v11.0.0/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v11.0.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/bulk-operations/index.html b/docs/v11.0.0/operator-docs/bulk-operations/index.html index df7fd29038d..4d973cbf809 100644 --- a/docs/v11.0.0/operator-docs/bulk-operations/index.html +++ b/docs/v11.0.0/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v11.0.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload (see the example payload sketched after these steps). The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
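
    As a sketch, the completed bulk granules request payload could look similar to the following. The index name and workflow name are placeholders, and the inner query object is whatever you copied from the Kibana Request tab (a simple match query is shown here purely for illustration):

    {
    "index": "<your-granule-index>",
    "workflowName": "<your-workflow-name>",
    "query": {
    "query": {
    "match": {
    "granuleId": "MOD11A1.A2017137.h19v16.006.2017138085750"
    }
    }
    }
    }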

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
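
    For example, a status check sketch using curl, assuming https://example.com is your API root; replace <asyncOperationId> with the ID returned from your bulk request:

    $ curl --request GET https://example.com/asyncOperations/<asyncOperationId> \
    --header 'Authorization: Bearer ReplaceWithTheToken'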

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/cmr-operations/index.html b/docs/v11.0.0/operator-docs/cmr-operations/index.html index 7c2d95b068a..72d8f19c64a 100644 --- a/docs/v11.0.0/operator-docs/cmr-operations/index.html +++ b/docs/v11.0.0/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "action": "applyWorkflow",
    "workflow": "updateCmrAccessConstraints",
    "meta": {
    "accessConstraints": {
    "value": 5,
    "description": "sample access constraint"
    }
    }
    }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/create-rule-in-cumulus/index.html b/docs/v11.0.0/operator-docs/create-rule-in-cumulus/index.html index 1729a130a9d..d61dbc017b2 100644 --- a/docs/v11.0.0/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/v11.0.0/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v11.0.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/discovery-filtering/index.html b/docs/v11.0.0/operator-docs/discovery-filtering/index.html index ebd3e9222dc..9b205001f7b 100644 --- a/docs/v11.0.0/operator-docs/discovery-filtering/index.html +++ b/docs/v11.0.0/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/granule-workflows/index.html b/docs/v11.0.0/operator-docs/granule-workflows/index.html index 2704f0115f6..6caedcdad44 100644 --- a/docs/v11.0.0/operator-docs/granule-workflows/index.html +++ b/docs/v11.0.0/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v11.0.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and reingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules or scheduled rules will re-discover and reingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/kinesis-stream-for-ingest/index.html b/docs/v11.0.0/operator-docs/kinesis-stream-for-ingest/index.html index a38b4566da6..c114a2cb7c2 100644 --- a/docs/v11.0.0/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/v11.0.0/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v11.0.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
    • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

    • Send a message that matches your schema, using Python or the command line.
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/locating-access-logs/index.html b/docs/v11.0.0/operator-docs/locating-access-logs/index.html index ff527063733..4f72ee80a91 100644 --- a/docs/v11.0.0/operator-docs/locating-access-logs/index.html +++ b/docs/v11.0.0/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v11.0.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
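
    For example, assuming the AWS CLI is configured with credentials for your deployment's account, you could list the raw access logs with something like the following (substitute your internal bucket name and stack name for the placeholders):

    $ aws s3 ls s3://<internal bucket name>/<stack>/ems-distribution/s3-server-access-logs/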

    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/naming-executions/index.html b/docs/v11.0.0/operator-docs/naming-executions/index.html index e0ba578675e..cf815e4c660 100644 --- a/docs/v11.0.0/operator-docs/naming-executions/index.html +++ b/docs/v11.0.0/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    }
    }
    - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/ops-common-use-cases/index.html b/docs/v11.0.0/operator-docs/ops-common-use-cases/index.html index 11d10018e6b..8586ac730d1 100644 --- a/docs/v11.0.0/operator-docs/ops-common-use-cases/index.html +++ b/docs/v11.0.0/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v11.0.0/operator-docs/trigger-workflow/index.html b/docs/v11.0.0/operator-docs/trigger-workflow/index.html index 2470ee72e7b..3222b5bb0ff 100644 --- a/docs/v11.0.0/operator-docs/trigger-workflow/index.html +++ b/docs/v11.0.0/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v11.0.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types map to the buckets defined in your Cumulus deployment, as shown in the example above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.
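
    As a sketch, the rule's meta could look like the following (other rule fields omitted):

    {
    ...other rule keys here...
    "meta": {
    "provider_path": "test-data"
    }
    }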

    Screenshot of adding a Discover Granules rule

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/v11.0.0/tasks/index.html b/docs/v11.0.0/tasks/index.html index 9caa952facd..e7005e41a63 100644 --- a/docs/v11.0.0/tasks/index.html +++ b/docs/v11.0.0/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v11.0.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/v11.0.0/team/index.html b/docs/v11.0.0/team/index.html index c236e267baf..ef2cd741b53 100644 --- a/docs/v11.0.0/team/index.html +++ b/docs/v11.0.0/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v11.0.0/troubleshooting/index.html b/docs/v11.0.0/troubleshooting/index.html index 0854620e425..e3ad31ab4b2 100644 --- a/docs/v11.0.0/troubleshooting/index.html +++ b/docs/v11.0.0/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v11.0.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
2. Run terraform apply to delete the lambdas.
3. Un-comment the definitions.
4. Run terraform apply to recreate the lambdas.

If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of lambdas we distribute is in the low teens and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

resource "aws_lambda_function" "discover_granules_task" {
  function_name = "${var.prefix}-DiscoverGranules"
  filename      = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
  handler       = "index.handler"
}

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.
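One way to ensure the dependency ends up in the package is to add it to the task's dependencies and rebuild the zip. The sketch below assumes a typical Node.js task layout; adjust paths to match your task.

# Run from the task's source directory (paths are assumptions about your layout)
npm install --save @cumulus/cumulus-message-adapter-js

# Repackage the lambda so node_modules (including the adapter) is inside the zip
zip -r dist/lambda.zip index.js node_modules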

Reindexing Elasticsearch Guide

… current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

Two operations are needed to switch over to a new index: reindex and change-index. They can be done in either order, but each order has its trade-offs.

If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This approach is recommended.

If you decide to reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data sent to the old index while reindexing will show up in the new index. If you prefer this approach, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.
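As a concrete sketch of invoking these operations against the Cumulus archive API, the requests might look like the following; the API host, token, and index names are placeholders, and the exact endpoint paths and body parameters should be confirmed against the Cumulus API documentation.

# Placeholders for your deployment's archive API and an access token
API=https://example.execute-api.us-east-1.amazonaws.com/dev
TOKEN=<access-token>

# Copy data from the current index into a new index
curl -X POST "$API/elasticsearch/reindex" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"destIndex": "cumulus-2021-3-4"}'

# Point Cumulus at the new index
curl -X POST "$API/elasticsearch/change-index" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4"}'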

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
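Using the same placeholders as the sketch above, the status can be polled like so:

# Check reindex progress (endpoint path per the Cumulus API documentation)
curl -H "Authorization: Bearer $TOKEN" "$API/elasticsearch/reindex-status"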

    Index from database

If you want to index data directly from the database, you can perform an Index from Database operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so that any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command, replacing <index> with your index name.

GET <index>/_search
{
  "aggs": {
    "count_by_type": {
      "terms": {
        "field": "_type"
      }
    }
  },
  "size": 0
}

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

If a reindex operation did not fully complete, it can be resumed using the following command run from the Kibana instance.

POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "cumulus-2020-11-3"
  },
  "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
  }
}

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    Version: v11.0.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

Troubleshooting Deployment

… data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

{
  domain_name    = "es"
  instance_count = 2
  instance_type  = "t2.small.elasticsearch"
  version        = "5.3"
  volume_size    = 10
}

    Install dashboard

    Dashboard configuration

    Issues:

• Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default': this probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.
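If you would rather script that edit than make it by hand, a minimal sketch using GNU sed (with the repository path shown above and a <prefix> of your choosing) is:

# Rewrite the gulp-cache directory name; <prefix> is a placeholder you choose
sed -i "s/cacheDirName: 'gulp-cache'/cacheDirName: '<prefix>-cache'/" \
  cumulus-dashboard/node_modules/gulp-cache/index.js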

    Dashboard deployment

    Issues:

• If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    Version: v11.0.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Version: v11.0.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that version.

In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module, and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to TEA deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

aws dynamodb put-item \
  --table-name DYNAMO_LOCK_TABLE \
  --item '{
    "LockID": {"S": "BUCKET/KEY-md5"},
    "Digest": {"S": "some-digest-value"}
  }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    Version: v11.0.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, but instead use an incrementing integer field.

Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.
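A minimal redeploy sketch, assuming the standard cumulus-tf root module layout used elsewhere in these docs:

# Re-run the deployment so the new CMA layer and task versions take effect
cd cumulus-tf
terraform init
terraform apply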

    Version: v11.0.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

For any workflows using the post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
Upgrade to RDS release

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    Version: v11.0.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using the Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required (as shown in the sketch below).
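For example, installing and selecting the pinned version with tfenv might look like:

# Optional: install and select Terraform 0.13.6 with tfenv
tfenv install 0.13.6
tfenv use 0.13.6
terraform --version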

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
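As a sketch, running the upgrade tool over the two root modules referenced in these docs might look like the following; repeat until the tool reports no further changes.

# -yes skips the interactive confirmation prompt
terraform 0.13upgrade -yes data-persistence-tf
terraform 0.13upgrade -yes cumulus-tf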

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

Discover Granules

… included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v11.0.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

This task utilizes the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to take the list of incoming files and sort them into a list of granule objects.

Please note: files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v11.0.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

variable "lzards_launchpad_passphrase" {
  type    = string
  default = ""
}

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional Terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Version: v11.0.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file

    • Task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v11.0.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Output a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v11.0.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task utilizes a defined collection in concert with a defined provider, either set on each granule or passed in via config, to queue up ingest executions for each granule or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    Version: v11.0.0

    Cumulus Tasks: Message Flow

    Cumulus Tasks comprise Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
• exception: A key used to track workflow exceptions; it should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration, other portions of the message may be present; however, the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

This step determines which task is currently being executed. Note this is different from which Lambda or activity is being executed, because the same Lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's task_config configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
    "input": {},
    "config": {}
    }
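For example, given the simple Cumulus Message shown earlier (with meta.foo set to "bar" and the templated task_config), the message handed to the task application code would look approximately like:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
      "foo": "bar"
    }
  }
}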

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
    "task_config": {
    "bar": "baz",
    "cumulus_message": {
    "input": "{$.payload.input}",
    "outputs": [
    {
    "source": "{$.input.anykey}",
    "destination": "{$.payload.out}"
    }
    ]
    }
    },
    "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "input": {
    "anykey": "anyvalue"
    }
    }
    }

    Create Next Message Step 2: Store remote event

If the ReplaceConfig configuration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    Version: v11.0.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v11.0.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

The source code for these tasks is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node.js 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

Version: v11.0.0

Dockerizing Data Processing

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will pass; otherwise, an error will be reported.

    Version: v11.0.0

    Workflows

Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

Provider data ingest and GIBS have a set of common needs in getting data from a source system into the cloud, where it can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

The high-level view of the architecture and many of the individual steps are the same, but the details of ingesting each type of collection differ. Different collection types and different providers have different needs. Not only are the individual boxes of a workflow different; the branching, error handling, and multiplicity of the arrows connecting the boxes also differ. Some collections need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

AWS and other cloud vendors provide an ideal solution for parts of these problems, but a higher-level solution is needed to allow the composition of AWS components into a full-featured whole. The Ingest Workflow Architecture is designed to meet the needs of Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearrangement of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about the specifics of what goes into a granule, for example.

    Scalable

The ingest architecture needs to be scalable, both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and the details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits that are directly applicable here:

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
  • Both meta and task_config, which are used for configuration at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used to extend the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration, we'd be able to add all of these. In exchange for this tradeoff, we save significant development effort and gain all the features of Step Functions. One workaround is providing a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation is tightly coupled to their standard. If they make backwards-incompatible changes in the future, we will have to deal with existing projects written against the older standard.

We avoid having to develop our own standard and the code to process it. The design can support new features in AWS Step Functions without requiring changes to the Ingest library code. It is unlikely AWS will make a backwards-incompatible change at this point; if that were to happen, one mitigation is writing data transformations to the new format.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools such as a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the need for ingest to integrate different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years, and all of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a fixed number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used for monitoring many of the events within Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v11.0.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

Input to the task application code is a JSON object with the following keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is returned and placed in the payload key by default, but the config key can also be used to return just a portion of the task output.

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

    {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
    }
    }

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, leaving in its place an empty JSON object {} and a Remote Message key that defines how to retrieve it. If the portion of the message targeted exceeds the configured MaxSize (which defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize):

    {
    "DiscoverGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "MaxSize": 1,
    "Path": "$.payload",
    "TargetPath": "$.payload"
    }
    }
    }
    }
    }

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

    {
    "DiscoverGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "MaxSize": 0,
    "Path": "$",
    "TargetPath": "$"
    }
    }
    }
    }
    }

    Cumulus Message example

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    task_config

This configuration key contains the input/output configuration values for definition of inputs/outputs via URL paths. Important: these values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

    {
    "FunctionName": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "other_cma_configuration": "<config object>",
    "task_config": "<task config>"
    }
    }
    }
    }

    Example:

    {
    "StepFunction": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "sfnEnd": true,
    "stack": "{$.meta.stack}",
    "bucket": "{$.meta.buckets.internal.name}",
    "stateMachine": "{$.cumulus_meta.state_machine}",
    "executionName": "{$.cumulus_meta.execution_name}",
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    }
    }
    }

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

    {
    "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
    "input": {
    "Other Parameter": {},
    "cma": {
    "ConfigKey": {
    "config values": "some config values"
    },
    "event": {
    "cumulus_meta": {},
    "payload": {},
    "meta": {},
    "exception": {}
    }
    }
    }
    }

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
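Applied to the inbound message above, this reformatting produces an event of approximately the following shape, with ConfigKey merged alongside the Cumulus Message keys:

{
  "ConfigKey": {
    "config values": "some config values"
  },
  "cumulus_meta": {},
  "meta": {},
  "payload": {},
  "exception": {}
}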

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
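For example, assuming the object stored at s3://some_bucket/events/some-event-id contains {"bar": "baz"} (a value chosen purely for illustration), the resolved portion of the message would look approximately like:

{
  "meta": {
    "foo": {
      "bar": "baz"
    }
  }
}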

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use a URL template as a value to achieve simplicity or to reference values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

    {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }
    }
    }
    }

and an incoming message that contains:

    {
    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    }
    }

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, if a task specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

        {
    "task_config": {
    "cumulus_message": {
    "input": "{$.payload.foo}"
    }
    },
    "payload": {
    "foo": {
    "anykey": "anyvalue"
    }
    }
    }

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as the task input, the task input would be:

        {
    "input" : {
    "anykey": "anyvalue"
    },
    "config": {...}
    }

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also augment values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, if a task specifies cumulus_message.outputs in its workflow configuration as follows:

    {
    "ExampleTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    }
    }
    }
    }
    }

    The corresponding Cumulus Message would be:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Given the response from the task is:

        {
    "output": {
    "anykey": "boo"
    }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output message (after the output configuration is applied) looks like:

        {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    the resultant output would look like:

    {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
    }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
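As a hedged illustration only (the granules and granuleId fields are placeholders rather than a requirement of any particular task), a minimal schemas/input.json might look like:

{
  "title": "ExampleTask Input",
  "type": "object",
  "properties": {
    "granules": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "granuleId": { "type": "string" }
        }
      }
    }
  },
  "required": ["granules"]
}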

    Version: v11.0.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source), in either a new .tf file or an existing one:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided the cumulus module if intending to integrate the lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    Version: v11.0.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
Version: v11.0.0

Workflow Configuration How To's

To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. The cmrMetadata, or CMR granule XML, needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

    {
    "bucket": "sample-protected-bucket",
    "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
    }

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    Version: v11.0.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.
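As a rough, hedged sketch (the names and values below are placeholders; consult the rule configuration documentation for the authoritative set of fields), a scheduled rule might look something like:

{
  "name": "nightly_mod09gq_ingest",
  "workflow": "IngestGranuleWorkflow",
  "provider": "my-provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "rate(1 day)"
  },
  "state": "ENABLED"
}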

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one-time rule is created, the schedule SF lambda is triggered. Rules that are not one-time are associated with a CloudWatch event, which will manage the triggering of the lambdas that trigger the workflows.

For a scheduled rule, the CloudWatch event is triggered on the given schedule, which calls the schedule SF lambda directly.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

For an SNS rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule, and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    Version: v11.1.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    Version: v11.1.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

    While the diagram can feel complex, it can easily be digested in several major components:

    Data Distribution

End users can access data via Cumulus's distribution submodule, which includes ASF's Thin Egress Application; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a set of PostgreSQL-compatible databases and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is replicated in DynamoDB; this will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

    Other modules are provided as artifacts on the release page for use in users configuring their own deployment and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, how to deploy, and more, please see the deployment docs as well as the cumulus-template-deploy repo.

    Version: v11.1.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

In addition to getting your aws-cli set up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v11.1.0/configuration/collection-storage-best-practices/index.html b/docs/v11.1.0/configuration/collection-storage-best-practices/index.html index 92e39f47611..1d8833f7bba 100644 --- a/docs/v11.1.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v11.1.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v11.1.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (e.g. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

Alternatively, you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
    "Datapoints": [
    {
    "Timestamp": "2019-07-31T00:00:00Z",
    "Average": 150996467959.0,
    "Unit": "Bytes"
    }
    ],
    "Label": "BucketSizeBytes"
    }

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.
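As an alternative sketch, you can have the S3 API sum object sizes under a prefix for you; this still lists every object behind the scenes, so it is also slow on large buckets (the bucket and prefix below are placeholders):

# Returns the total size of example-folder in bytes
aws s3api list-objects-v2 \
  --bucket sample-bucket \
  --prefix example-folder/ \
  --query "sum(Contents[].Size)" \
  --output text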

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.
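One way to build that estimate is to prorate the bucket's monthly cost by the fraction of storage the prefix uses. A minimal sketch, assuming you have already looked up the prefix size, bucket size, and monthly bucket cost (the numbers below are placeholders):

# Placeholder inputs: 200 GB prefix, 2 TB bucket, $50/month bucket cost
prefix_bytes=214748364800
bucket_bytes=2199023255552
bucket_monthly_cost=50
# Prints the estimated share of the bucket cost attributable to the prefix
echo "$prefix_bytes $bucket_bytes $bucket_monthly_cost" | awk '{ printf "%.2f\n", $1 / $2 * $3 }'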

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "MOD09GQ-006-private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "MOD09GQ-006-public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v11.1.0/configuration/data-management-types/index.html b/docs/v11.1.0/configuration/data-management-types/index.html index 1adbbad251b..29e6f89a480 100644 --- a/docs/v11.1.0/configuration/data-management-types/index.html +++ b/docs/v11.1.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v11.1.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

• Providers tell Cumulus where to get new data - e.g. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

An execution record is created for each workflow execution triggered; it can be viewed in the dashboard, or its data can be retrieved via the API (see the sketch below).
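For example, once you have a token for the archive API, granule and execution records can be listed with simple HTTP calls. This is a hypothetical sketch: $CUMULUS_API_URL and $TOKEN are placeholders, and the full set of query parameters is described in the Cumulus API documentation.

# List recent granule and execution records from the archive API
curl -s -H "Authorization: Bearer $TOKEN" "$CUMULUS_API_URL/granules?limit=10"
curl -s -H "Authorization: Bearer $TOKEN" "$CUMULUS_API_URL/executions?limit=10"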

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, items that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various provider types; a minimal example follows the tables below.

    Configuration by provider type
    S3
• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be s3 for this provider type.
• host (string, required): S3 Bucket to pull data from
    http
• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be http for this provider type
• host (string, required): The host to pull data from (e.g. nasa.gov)
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional): Port to connect to the provider on. Defaults to 80
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
• certificateUri (string, optional): SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
    https
• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be https for this provider type
• host (string, required): The host to pull data from (e.g. nasa.gov)
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional): Port to connect to the provider on. Defaults to 443
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
• certificateUri (string, optional): SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
    ftp
• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be ftp for this provider type
• host (string, required): The ftp host to pull data from (e.g. nasa.gov)
• username (string, optional): Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
• password (string, optional): Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
• port (integer, optional): Port to connect to the provider on. Defaults to 21
    sftp
• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be sftp for this provider type
• host (string, required): The sftp host to pull data from (e.g. nasa.gov)
• username (string, optional): Username to use to connect to the sftp server.
• password (string, optional): Password to use to connect to the sftp server.
• port (integer, optional): Port to connect to the provider on. Defaults to 22
• privateKey (string, optional): filename assumed to be in s3://bucketInternal/stackName/crypto
• cmKeyId (string, optional): AWS KMS Customer Master Key ARN or alias
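To tie the S3 provider keys above together, here is a minimal provider definition submitted to the archive API. This is a sketch only: the id, host, $CUMULUS_API_URL, and $TOKEN values are placeholders, and you can just as easily paste the JSON body into the dashboard's Add Provider form.

# Create a minimal S3 provider via the archive API (placeholder values throughout)
curl -s -X POST "$CUMULUS_API_URL/providers" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "MY_S3_PROVIDER",
    "protocol": "s3",
    "host": "my-staging-bucket",
    "globalConnectionLimit": 10
  }'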

    Collections

Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
• name ("MOD09GQ", required): The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
• version ("006", required): A version tag for the collection
• granuleId ("^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", required): The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
• granuleIdExtraction ("(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)", required): The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
• sampleFileName ("MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", required): An example filename belonging to this collection
• files (<JSON Object> of files defined here, required): Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
• dataType ("MOD09GQ", optional): Can be specified, but this value will default to the collection_name if not
• duplicateHandling ("replace", optional): ("replace"|"version"|"skip") determines granule duplicate handling scheme
• ignoreFilesConfigForDiscovery (false (default), optional): By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
• process ("modis", optional): Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
• meta (<JSON Object> of MetaData for the collection, optional): MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
• url_path ("{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}", optional): Filename without extension

    files-object

• regex ("^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$", required): Regular expression used to identify the file
• sampleFileName ("MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", required): Filename used to validate the provided regex
• type ("data", optional): Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
• bucket ("internal", required): Name of the bucket where the file will be stored
• url_path ("${collectionShortName}/{substring(file.fileName, 0, 3)}", optional): Folder used to save the granule in the bucket. Defaults to the collection url_path
• checksumFor ("^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$", optional): If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by either events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
• name ("L2_HR_PIXC_kinesisRule", required): Name of the rule. This is the name under which the rule will be listed on the dashboard
• workflow ("CNMExampleWorkflow", required): Name of the workflow to be run. A list of available workflows can be found on the Workflows page
• provider ("PODAAC_SWOT", optional): Configured provider's ID. This can be found on the Providers dashboard page
• collection (<JSON Object> collection object shown below, required): Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
• payload (<JSON Object or Array>, optional): The payload to be passed to the workflow
• meta (<JSON Object> of MetaData for the rule, optional): MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
• rule (<JSON Object> rule type and associated values - discussed below, required): Object defining the type and subsequent attributes of the rule
• state ("ENABLED", optional): ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
• queueUrl (https://sqs.us-east-1.amazonaws.com/1234567890/queue-name, optional): URL for SQS queue that will be used to schedule workflows for this rule
• tags (["kinesis", "podaac"], optional): An array of strings that can be used to simplify search

    collection-object

• name ("L2_HR_PIXC", required): Name of a collection defined/configured in the Collections dashboard page
• version ("000", required): Version number of a collection defined/configured in the Collections dashboard page

    meta-object

• retries (3, optional): Number of retries on errors, for sqs-type rule only. Defaults to 3.
• visibilityTimeout (900, optional): VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

• type ("kinesis", required): ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
• value (<String> Object, required depending on type): Discussion of valid values is below

    rule-value

The rule.value entry depends on the rule's type (see the example rule submission after this list):

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
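For instance, a Kinesis rule tying together the provider and collection examples above could be submitted to the archive API as follows. This is a sketch: the stream ARN, $CUMULUS_API_URL, and $TOKEN are placeholders for values from your own deployment, and the same JSON body can be pasted into the dashboard's Add Rule form.

# Create a Kinesis-triggered rule via the archive API (placeholder values throughout)
curl -s -X POST "$CUMULUS_API_URL/rules" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "L2_HR_PIXC_kinesisRule",
    "workflow": "CNMExampleWorkflow",
    "provider": "PODAAC_SWOT",
    "collection": { "name": "L2_HR_PIXC", "version": "000" },
    "rule": {
      "type": "kinesis",
      "value": "arn:aws:kinesis:us-east-1:111111111111:stream/placeholder-stream"
    },
    "state": "ENABLED"
  }'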

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times by default, or the configured number of times.
• Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
• After the configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v11.1.0/configuration/lifecycle-policies/index.html b/docs/v11.1.0/configuration/lifecycle-policies/index.html index 177c5c1e177..455230c4cf7 100644 --- a/docs/v11.1.0/configuration/lifecycle-policies/index.html +++ b/docs/v11.1.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v11.1.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
• You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
"ID": "90DayS3IAExample"
    }
    ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
"ID": "90DayS3IAExample"
    }
    ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.
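Once the 90 day window has passed for an object, you can spot-check that the transition actually happened by looking at its storage class; a quick sketch (the key shown is a placeholder):

# StorageClass is only returned for non-STANDARD classes, so a transitioned object shows STANDARD_IA
aws s3api head-object \
  --bucket yourBucketNameHere \
  --key path/to/some-object.dat \
  --query "StorageClass"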

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    - + \ No newline at end of file diff --git a/docs/v11.1.0/configuration/monitoring-readme/index.html b/docs/v11.1.0/configuration/monitoring-readme/index.html index aab03f43558..384ea0d763f 100644 --- a/docs/v11.1.0/configuration/monitoring-readme/index.html +++ b/docs/v11.1.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v11.1.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on GitHub and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation.

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

const zlib = require('zlib');
const aws = require('aws-sdk');
const { promisify } = require('util');

const gunzip = promisify(zlib.gunzip);
const sns = new aws.SNS();

// Triggered by a CloudWatch Logs subscription: decode and decompress the
// incoming batch, then publish any error/fatal level messages to SNS.
exports.handler = async (event) => {
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const decompressedData = await gunzip(payload);
  const logData = JSON.parse(decompressedData.toString('ascii'));
  return await Promise.all(logData.logEvents.map(async (logEvent) => {
    // @cumulus/logger writes structured JSON messages with a "level" field
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
      return sns.publish({
        TopicArn: process.env.EmailReportingTopicArn,
        Message: logEvent.message
      }).promise();
    }
    return Promise.resolve();
  }));
};

After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
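If you want to wire up a single log group by hand before automating, the CLI equivalent is a subscription filter plus a permission allowing CloudWatch Logs to invoke the function. A sketch, assuming a deployed Lambda named cloudwatch-to-sns-email (the function name, region, account id, and log group are placeholders):

# Allow CloudWatch Logs in this region/account to invoke the forwarding Lambda
aws lambda add-permission \
  --function-name cloudwatch-to-sns-email \
  --statement-id cloudwatch-logs-invoke \
  --action lambda:InvokeFunction \
  --principal logs.us-east-1.amazonaws.com

# Send everything from the log group to the Lambda; the Lambda itself filters for error/fatal
aws logs put-subscription-filter \
  --log-group-name "/aws/lambda/KinesisInboundLogger" \
  --filter-name error-email-forwarder \
  --filter-pattern "" \
  --destination-arn arn:aws:lambda:us-east-1:111111111111:function:cloudwatch-to-sns-email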

    - + \ No newline at end of file diff --git a/docs/v11.1.0/configuration/server_access_logging/index.html b/docs/v11.1.0/configuration/server_access_logging/index.html index fd7debe37e6..708a0fbae74 100644 --- a/docs/v11.1.0/configuration/server_access_logging/index.html +++ b/docs/v11.1.0/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v11.1.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket (see the loop sketch after this list).

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
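If you have more than a couple of buckets, a small loop saves the repetition; the bucket names below are placeholders for your own protected and public buckets:

for bucket in my-stack-protected my-stack-public; do
  aws s3api put-bucket-logging \
    --bucket "$bucket" \
    --bucket-logging-status file://logging.json
done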
    - + \ No newline at end of file diff --git a/docs/v11.1.0/configuration/task-configuration/index.html b/docs/v11.1.0/configuration/task-configuration/index.html index 64578991ad2..a3077ddb371 100644 --- a/docs/v11.1.0/configuration/task-configuration/index.html +++ b/docs/v11.1.0/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
    Version: v11.1.0

    Configuration of Tasks

    The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

<lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

More information about cmr_limit and cmr_page_size can be found in the @cumulus/cmr-client package and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }
    - + \ No newline at end of file diff --git a/docs/v11.1.0/data-cookbooks/about-cookbooks/index.html b/docs/v11.1.0/data-cookbooks/about-cookbooks/index.html index feb00638dda..58aae3dd304 100644 --- a/docs/v11.1.0/data-cookbooks/about-cookbooks/index.html +++ b/docs/v11.1.0/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: v11.1.0

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should serve to help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/v11.1.0/data-cookbooks/browse-generation/index.html b/docs/v11.1.0/data-cookbooks/browse-generation/index.html index 0fd78558045..4f259adbb7d 100644 --- a/docs/v11.1.0/data-cookbooks/browse-generation/index.html +++ b/docs/v11.1.0/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the above CMR entry is accessible.
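Two quick spot checks from the command line; this is a sketch that reuses the bucket name and CMR concept id from the example output above:

# Confirm the moved files landed under the collection prefix in the protected bucket
aws s3 ls --recursive s3://cumulus-test-sandbox-protected/MOD09GQ___006/

# Confirm the granule is searchable in CMR UAT using the concept id from the payload
curl -s "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS"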


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The example script goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

• Create a browse image file based on the input data, and stage it to a location in an S3 bucket accessible to both this task and the FilesToGranules and MoveGranules tasks.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, the files output is mapped to the payload and the granules output is mapped to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file, if it exists, with the updated granule locations and the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the output meta.input_granules in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    - + \ No newline at end of file diff --git a/docs/v11.1.0/data-cookbooks/choice-states/index.html b/docs/v11.1.0/data-cookbooks/choice-states/index.html index a412b1a07a5..b8b54dbdd24 100644 --- a/docs/v11.1.0/data-cookbooks/choice-states/index.html +++ b/docs/v11.1.0/data-cookbooks/choice-states/index.html @@ -5,13 +5,13 @@ Choice States | Cumulus Documentation - +
    Version: v11.1.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: given CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/data-cookbooks/cnm-workflow/index.html b/docs/v11.1.0/data-cookbooks/cnm-workflow/index.html index 0acaa8fa8d0..2b77e054128 100644 --- a/docs/v11.1.0/data-cookbooks/cnm-workflow/index.html +++ b/docs/v11.1.0/data-cookbooks/cnm-workflow/index.html @@ -5,7 +5,7 @@ CNM Workflow | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v11.1.0

    CNM Workflow

This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using, however your environment/security policies may require an approach specific to your deployment environment.

    In operational environments it's likely science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.
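As a preview of the trigger step, publishing a CNM message to the notification stream from the CLI looks roughly like the following. This is a sketch: the stream name and message file are placeholders, and the binary-format flag shown is what AWS CLI v2 needs to accept plain-text data:

# Put a CNM message (stored locally in cnm_message.json) onto the notification stream
aws kinesis put-record \
  --cli-binary-format raw-in-base64-out \
  --stream-name my-cnm-notification-stream \
  --partition-key example-partition-key \
  --data file://cnm_message.json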

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

The following steps are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

Add the following to the new Terraform file in your deployment directory, making these updates:

• Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
• Update the source key of the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
"Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
}
}
}
JSON
}

Again, please make sure the response-endpoint value matches the stream name (not the ARN) of your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

• The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you setup in the Prerequisites section.

Once you've clicked 'Submit', a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.
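Please note: if you are using AWS CLI v2, the --data parameter is treated as base64-encoded by default, so passing the raw JSON file typically also requires the --cli-binary-format flag:

aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --cli-binary-format raw-in-base64-out --data file:///path/to/file.json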

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

SyncGranule

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output and add a "SUCCESS" record to the response Kinesis stream (the configured response-endpoint).

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.
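One way to spot-check this from the CLI is to list the staging prefix described in the SyncGranule section above (a sketch; the bucket, deployment name, and collection values are placeholders for your own):

# List the staging area for your collection.
aws s3 ls s3://<deployment-private-bucket>/file-staging/<deployment-name>/<COLLECTION>/ --recursive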

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

You should be able to validate that the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations documentation is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow by using the put-record command from above
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note the data encoding is not human readable and would need to be parsed/converted to be interpretable. There are many options for building a Kinesis consumer, such as the KCL.
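If you just want a quick look at the decoded payloads from the CLI, something along these lines should work (a sketch; the --decode flag spelling can vary slightly between GNU and BSD/macOS versions of base64):

# Extract only the Data fields and base64-decode each record.
aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
--query 'Records[].Data' --output text | \
tr '\t' '\n' | while read -r record; do
echo "$record" | base64 --decode
echo
done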

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS topic. The kinesisFallback SNS topic broadcasts the record, and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

    The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - come with the API package and do not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
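You can also poll the dead letter queue from the CLI instead of the console (a sketch assuming the standard {{prefix}}-kinesisFailure queue name; receive-message reads messages without deleting them):

# Look up the queue URL, then read up to 10 messages for inspection.
QUEUE_URL=$(aws sqs get-queue-url --queue-name <prefix>-kinesisFailure --query 'QueueUrl' --output text)
aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10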

Note: these are only failures that occurred while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
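If you prefer not to use the console, an equivalent trigger can be created as an event source mapping from the CLI (a sketch; the function name and stream ARN below are placeholders for your deployment's values):

# Map the response stream onto the outbound event logger Lambda.
aws lambda create-event-source-mapping \
--function-name <prefix>-KinesisOutboundEventLogger \
--event-source-arn arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{responseStreamName}} \
--starting-position LATEST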

diff --git a/docs/v11.1.0/data-cookbooks/error-handling/index.html b/docs/v11.1.0/data-cookbooks/error-handling/index.html

Error Handling in Workflows | Cumulus Documentation

Service Exception. See this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

diff --git a/docs/v11.1.0/data-cookbooks/hello-world/index.html b/docs/v11.1.0/data-cookbooks/hello-world/index.html
    Version: v11.1.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

Screenshot of AWS Step Function execution graph for the HelloWorld workflow (executed workflow as seen in the AWS Console)

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

diff --git a/docs/v11.1.0/data-cookbooks/ingest-notifications/index.html b/docs/v11.1.0/data-cookbooks/ingest-notifications/index.html
    Version: v11.1.0

    Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to workflows.

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

    If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and having the status of those granules/PDRs being visible on the Cumulus dashboard. The queue does have certain expectations as it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

    Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL by adding them to outputs.tf for your Terraform deployment as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the topic. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
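As an illustration of the mechanics only (not the full message schema), publishing a prepared message file to the reporting queue from the CLI could look like the following sketch; report-message.json is a hypothetical file you create containing a Cloudwatch Step Function event whose detail.output (for SUCCEEDED) or detail.input holds your Cumulus message as a JSON string, and the queue URL is read from the Terraform output defined above (assuming a Terraform version that supports output -raw):

# report-message.json is a placeholder you create; it must contain a Cloudwatch
# Step Function event wrapping your Cumulus message (as a JSON string).
aws sqs send-message \
--queue-url "$(terraform output -raw stepfunction_event_reporter_queue_url)" \
--message-body file://report-message.json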

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords
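With AWS CLI v2, for example, you can tail that log group directly (replace <prefix> with your deployment prefix):

aws logs tail /aws/lambda/<prefix>-sfEventSqsToDbRecords --follow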

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

diff --git a/docs/v11.1.0/data-cookbooks/queue-post-to-cmr/index.html b/docs/v11.1.0/data-cookbooks/queue-post-to-cmr/index.html
    Version: v11.1.0

    Queue PostToCmr

    In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow setup.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
diff --git a/docs/v11.1.0/data-cookbooks/run-tasks-in-lambda-or-docker/index.html b/docs/v11.1.0/data-cookbooks/run-tasks-in-lambda-or-docker/index.html
    Version: v11.1.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides access to the full range of AWS EC2 resources, with the flexibility to execute arbitrary code on any EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.
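If you are unsure how close an existing task Lambda is to these limits, you can check its current settings from the CLI (a sketch; the function name is a placeholder):

# Show the memory, timeout, and runtime currently configured for a task Lambda.
aws lambda get-function-configuration \
--function-name <prefix>-QueueGranules \
--query '{MemorySize: MemorySize, Timeout: Timeout, Runtime: Runtime}'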

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

If you discover this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (by either adding a new .tf file or updating an existing one):

    • A aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page), configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.
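One way to perform that restart is to force a new deployment of the ECS service from the CLI (a sketch; the cluster and service names are placeholders for the values created by your deployment):

# Replace the running tasks so they pick up the updated Lambda code.
aws ecs update-service \
--cluster <prefix>-CumulusECSCluster \
--service <prefix>-QueueGranules-ECSService \
--force-new-deployment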

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}")

If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a Lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

diff --git a/docs/v11.1.0/data-cookbooks/sips-workflow/index.html b/docs/v11.1.0/data-cookbooks/sips-workflow/index.html

Science Investigator-led Processing Systems (SIPS) | Cumulus Documentation

we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    Note: A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

diff --git a/docs/v11.1.0/data-cookbooks/throttling-queued-executions/index.html b/docs/v11.1.0/data-cookbooks/throttling-queued-executions/index.html
    Version: v11.1.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

{
  "QueuePdrs": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "FullMessage": true
        },
        "task_config": {
          "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "stackName": "{$.meta.stack}",
          "parsePdrWorkflow": "${parse_pdr_workflow_name}"
        }
      }
    }
  }
}

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

{
  "name": "s3_provider_rule",
  "workflow": "DiscoverAndQueuePdrs",
  "provider": "s3_provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED",
  "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
}

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.

Version: v11.1.0

Tracking Ancillary Files

The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type  | UMM-G RelatedUrl.Type                                           | ECHO10 Location
--------- | --------------------------------------------------------------- | ---------------------
ancillary | 'VIEW RELATED INFORMATION'                                      | OnlineResource
data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI) | OnlineAccessURL
browse    | 'GET RELATED VISUALIZATION'                                     | AssociatedBrowseImage
linkage   | 'EXTENDED METADATA'                                             | OnlineResource
metadata  | 'EXTENDED METADATA'                                             | OnlineResource
qa        | 'EXTENDED METADATA'                                             | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v11.1.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
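
    For reference, a minimal sketch of how this might look on the TEA module in your cumulus-tf configuration; the module source and the other arguments shown are placeholders for your existing settings:

    module "thin_egress_app" {
      source = "<path to the Thin Egress App module>"  # placeholder; keep your existing source

      # ... your existing TEA configuration ...

      # Send distribution API access and execution logs to CloudWatch
      log_api_gateway_to_cloudwatch = true
    }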

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

    This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "apigateway.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

Version: v11.1.0

Choosing and configuring your RDS database

Using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more, as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Version: v11.1.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn = "arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]"
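
    For example, assuming a hypothetical CloudWatch Logs destination named logDestinationName owned by account 123456789012, the value would look like:

    log_destination_arn = "arn:aws:logs:us-east-1:123456789012:destination:logDestinationName"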

    Logs Sent

    By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value with the Cloudwatch log group name.

additional_log_groups_to_elk = {
  "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
  "MyCustomTask"   = "my-custom-task-log-group"
}

Version: v11.1.0

Component-based Cumulus Deployment

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.
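
    As a rough illustration only (the bucket and table names below are placeholders; the walk-through linked above covers creating the real resources), an S3 backend with DynamoDB locking is configured along these lines:

    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "my-tf-remote-state"       # placeholder state bucket
        key            = "cumulus/terraform.tfstate"
        dynamodb_table = "my-tf-locks"              # placeholder locking table
      }
    }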

    Version: v11.1.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

    Using the AWS CLI s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v11.1.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file (a sketch follows this list):

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service
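
    A minimal terraform.tfvars sketch of these additions (all values are placeholders that you must obtain as described above):

    csdap_host_url        = "<Cognito service URL>"
    csdap_client_id       = "<client id>"
    csdap_client_password = "<client password>"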

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    127.0.0.1 <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

    To determine the EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = "<cloudfront_url>"

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

{
  "daac-public-data-bucket": "/path/to/this/kind/of/data"
}

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

Version: v11.1.0

How to Deploy Cumulus

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

    Follow the above instructions to initialize Terraform using terraform init [1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution url
      • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [2].
    5. You may delete the placeholder url you used to create the application.

    If you've lost track of the needed redirect URIs, they can be located in the API Gateway console. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

    To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [3].

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or find the url <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint" and login with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

    The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size you'll want to consider the number of nodes which determine how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster, consider the following (an illustrative sketch follows this list):

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
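
    As an illustrative sketch only, these variables might appear in your cumulus-tf terraform.tfvars as follows; the values shown are placeholders, not sizing recommendations:

    ecs_cluster_instance_type                = "t3.medium"  # placeholder
    ecs_cluster_instance_docker_volume_size  = 100          # GB, placeholder
    ecs_cluster_min_size                     = 1
    ecs_cluster_desired_size                 = 1
    ecs_cluster_max_size                     = 2
    ecs_cluster_scale_in_threshold_percent   = 25
    ecs_cluster_scale_in_adjustment_percent  = -5
    ecs_cluster_scale_out_threshold_percent  = 75
    ecs_cluster_scale_out_adjustment_percent = 10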

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications". Scroll down to "Application Administration" and use the edit icon for your application. Then go to Manage -> Redirect URIs.

    3. The API root can be found in a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at"

Version: v11.1.0

PostgreSQL Database Deployment

Cumulus provides a Terraform module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

    To deploy this module, make sure that you have completed the corresponding steps from the Prepare AWS configuration section of the Cumulus deployment instructions in similar fashion for this module.

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike Cumulus deployment, this module should be deployed once to create the database cluster and only thereafter to make changes to that configuration/upgrade/etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for the following (a filled-in sketch follows this list):

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
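
    For illustration, a filled-in backend block might look roughly like the following; the bucket, table, and key values are placeholders, and you should keep the key path provided by terraform.tf.example:

    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "my-tf-remote-state"                      # your state bucket
        key            = "PREFIX/rds-cluster-tf/terraform.tfstate" # illustrative key path
        dynamodb_table = "my-tf-locks"                             # your DynamoDB locking table
      }
    }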

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input (see the sketch after this list).
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.
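
    As a hypothetical sketch of the random_string approach mentioned for db_admin_password above (the module label and source are placeholders):

    resource "random_string" "db_admin_pass" {
      length  = 50
      special = false
    }

    module "rds_cluster" {
      source = "<path to the cumulus-rds-tf module>"  # placeholder

      # ... other variables ...

      db_admin_password = random_string.db_admin_pass.result
    }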

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required (a terraform.tfvars sketch follows this list):

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need for access. In most use cases this should be the same boundary used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.
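
    A rough terraform.tfvars sketch of these keys (all values are placeholders):

    provision_user_database  = true
    permissions_boundary_arn = "arn:aws:iam::123456789012:policy/<your permissions boundary>"
    rds_user_password        = "<user database password>"
    prefix                   = "my-prefix"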

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier ) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets is of the form:

{
  "database": "postgres",
  "dbClusterIdentifier": "clusterName",
  "engine": "postgres",
  "host": "xxx",
  "password": "defaultPassword",
  "port": 5432,
  "username": "xxx"
}
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    Version: v11.1.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The terraform source package can be found on the Cumulus github release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Version: v11.1.0

Terraform Best Practices

To check whether any tagged resources remain, run the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    See also: Configuring the Cumulus deployment, and Restoring a previous version.

    Version: v11.1.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository's cumulus-tf/main.tf contains a thin_egress_app module for distribution.

    The TEA module provides instructions showing how to add it to your deployment; the sections below describe how to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

{
  "daac-public-data-bucket": "/path/to/this/kind/of/data"
}

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.

    Upgrading Cumulus

    ...deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: v11.1.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub (see the example after this list)

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author, or resolve failures as appropriate.
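
    For step 2, assuming your origin remote points at the repository where the review branch should live, the push might look like:

    # Push the newly created review branch so the forked PR can be re-targeted to it.
    git push -u origin from-<name-of-the-branch>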

    Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    Versioning and Releases

    ...It's useful to use the search feature of your code editor or grep to see if there are any references to the old package versions. In a bash shell you can run

    find . -name package.json -exec grep -nH "@cumulus/.*MAJOR\.MINOR\.PATCH.*" {} \;

    Verify that each of those is updated to the new MAJOR.MINOR.PATCH version you are trying to release.

    A similar search for alpha and beta versions should be run on the release version and any problems should be fixed.

    find . -name package.json -exec grep -nHE "MAJOR\.MINOR\.PATCH.*(alpha|beta)" {} \;

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

    If there exists a PR in the cumulus-dashboard repo with a name containing: "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the Cumulus Core to-be-released-version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

      Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch (release-1.2.x) now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.
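
    For example, for the 1.2.x series used above, that might look like:

    git checkout release-1.2.x
    git pull origin release-1.2.x
    git log -1   # confirm you are on the expected latest commit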

    Create and push a new git tag:

        git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

    • Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important: Deselect Enable Branch - if you do not do this, it will immediately fire off a build.

    • Immediately: On the Branch Details page, enable Change trigger and set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

    • Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: You should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on github

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
    • An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.
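
    One non-authoritative way to check the uploaded assets from the command line, assuming the GitHub CLI (gh) is installed and authenticated, is:

    # List the release and its uploaded artifacts for the new version tag.
    gh release view vMAJOR.MINOR.PATCH --repo nasa/cumulus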

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.
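
    A sketch of that branch flow (branch names here are illustrative) could be:

    # Create a throwaway branch off the release branch and open a PR from it to master.
    git checkout release-1.2.x
    git pull origin release-1.2.x
    git checkout -b merge-1.2.x-to-master
    git push -u origin merge-1.2.x-to-master
    # Open a PR from merge-1.2.x-to-master into master; master may be merged into
    # this branch as needed, and the branch can be deleted once it is merged.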

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    - + \ No newline at end of file diff --git a/docs/v11.1.0/docs-how-to/index.html b/docs/v11.1.0/docs-how-to/index.html index c6d0b25fefa..7616504138f 100644 --- a/docs/v11.1.0/docs-how-to/index.html +++ b/docs/v11.1.0/docs-how-to/index.html @@ -5,13 +5,13 @@ Cumulus Documentation: How To's | Cumulus Documentation - +
    Version: v11.1.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).
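
    For example, before serving the site locally you might export the values you were provided (placeholders shown here):

    export DOCSEARCH_API_KEY=<your-docsearch-api-key>
    export DOCSEARCH_INDEX_NAME=<your-docsearch-index-name>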

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

    Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both of these values to exist - DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

    Version: v11.1.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    Version: v11.1.0

    Frequently Asked Questions

    Below are answers to some commonly asked questions that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to set up Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus-maintained modules or steps on how to create a module, go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    Version: v11.1.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    Cumulus Backup and Restore

    ...writing to the old cluster.

  • Set the snapshot_identifier variable to the snapshot you wish to create, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments - You should not need to reconfigure either, as the secret ARN and the security group should not change, however double-check the configured values are as expected

    Version: v11.1.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
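
    To see what has accumulated in the archive, you can list that prefix (bucket and stack name are placeholders):

    # List archived dead letter messages for your stack.
    aws s3 ls "s3://<systemBucket>/<stackName>/dead-letter-archive/sqs/" --recursive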

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.
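
    A request to start the recovery might look like the following (the API base URL and token are placeholders; authenticate however your Cumulus API deployment requires):

    curl -X POST \
      -H "Authorization: Bearer $CUMULUS_API_TOKEN" \
      "https://<cumulus-api-base-url>/deadLetterArchive/recoverCumulusMessages"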

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to an external outage and there is no essential malformation of each Cumulus message.

    Version: v11.1.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
    message_retention_seconds = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

    To aid in manual troubleshooting, you can utilize the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced to the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    Version: v11.1.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    Version: v11.1.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    Version: v11.1.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

    Version: v11.1.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    The request body fields are:

    • queueName (string): Any valid SQS queue name (not ARN)
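
    For example (API URL, token, and queue name are placeholders):

    curl -X POST \
      -H "Authorization: Bearer $CUMULUS_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"queueName": "<your-queue-name>"}' \
      "https://<cumulus-api-base-url>/replays/sqs"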

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    Version: v11.1.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp although it is used differently and less vulnerable to discrepancies since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    The request body fields are:

    • type (string, required): Currently only accepts kinesis.
    • kinesisStream (string, required for type kinesis): Any valid kinesis stream name (not ARN)
    • kinesisStreamCreationTimestamp (*, optional): Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
    • endTimestamp (*, optional): Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    • startTimestamp (*, optional): Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
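
    For example, a time-sliced Kinesis replay request might look like the following (API URL, token, stream name, and timestamps are placeholders):

    curl -X POST \
      -H "Authorization: Bearer $CUMULUS_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"type": "kinesis", "kinesisStream": "<your-stream-name>", "startTimestamp": "2023-07-01T00:00:00Z", "endTimestamp": "2023-07-02T00:00:00Z"}' \
      "https://<cumulus-api-base-url>/replays"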

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    Reconciliation Reports

    ...report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.

    Version: v11.1.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects not limited to NASA use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    Version: v11.1.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS Cloud Formation.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collections on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Version: v11.1.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    Version: v11.1.0

    About Integrator Guide

    Purpose

    The Integrator Guide supplements the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    Version: v11.1.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps will help you as you write a new Lambda that integrates with a Cumulus workflow and will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    Version: v11.1.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in CloudWatch
    • Go to CloudWatch
    • Locate the Lambda
    • Search the CloudWatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    Version: v11.1.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    Version: v11.1.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    Version: v11.1.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.
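
    For example, a bulk granule request can be submitted directly with curl. This is a minimal sketch: the granule IDs and workflow name are placeholders, and the exact payload fields should be confirmed against the Cumulus API documentation.

    $ curl --request POST https://example.com/granules/bulk \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{
            "ids": ["<granule-id-1>", "<granule-id-2>"],
            "workflowName": "<your-workflow-name>"
        }'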

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible (a sketch of the resulting request payload is shown after this list):

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
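
    If you are submitting the same bulk request through the API rather than the dashboard modal, the payload assembled in steps 7 and 8 would look roughly like the following sketch. The index name and workflow name are placeholders, and match_all stands in for the query contents copied from Kibana.

    $ curl --request POST https://example.com/granules/bulk \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{
            "index": "<your-granules-index>",
            "workflowName": "<your-workflow-name>",
            "query": {
                "query": { "match_all": {} }
            }
        }'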

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
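
    For example, the status of a single operation can be retrieved with a request like the following (the AsyncOperationId is the value returned when the operation was started):

    $ curl --request GET https://example.com/asyncOperations/<AsyncOperationId> \
        --header 'Authorization: Bearer ReplaceWithTheToken'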

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    CMR Operations

    UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "action": "applyWorkflow",
    "workflow": "updateCmrAccessConstraints",
    "meta": {
    "accessConstraints": {
    "value": 5,
    "description": "sample access constraint"
    }
    }
    }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is in the bulk operations operator doc.

    Version: v11.1.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule
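
    If you prefer the API to the dashboard form, an equivalent rule can be created by POSTing its JSON to the Cumulus API's rules endpoint. The snippet below is a minimal sketch only; the rule, provider, collection, and workflow names are placeholders, and the full set of rule fields is described in the Data Cookbooks and the Cumulus API documentation.

    $ curl --request POST https://example.com/rules \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{
            "name": "my_ingest_rule",
            "workflow": "<your-workflow-name>",
            "provider": "<your-provider-id>",
            "collection": { "name": "<your-collection-name>", "version": "001" },
            "rule": { "type": "onetime" },
            "state": "ENABLED"
        }'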

    Discovery Filtering

    …directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    Version: v11.1.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow
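
    Reingest can also be requested through the Cumulus API using the granule reingest action, following the same pattern as other granule actions (a sketch; the granule ID is a placeholder):

    $ curl --request PUT https://example.com/granules/<granuleId> \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{ "action": "reingest" }'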

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules or scheduled rules will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    Version: v11.1.0

    Setup Kinesis Stream & CNM Message

    Note: You should only have to set this up once per ingest stream. Kinesis pricing is based on the number of shards, not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details will display. Once the status is Active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that matches your schema using Python or the command line (see the sketch below).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
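
      Below is a sketch of sending such a message with the AWS CLI (v2). The CNM field names shown are illustrative only and must be adjusted to match the schema your provider and deployment expect; the stream name, provider, collection, and file URI are placeholders.

      # Publish a CNM-style notification to the stream created in Step 1.
      aws kinesis put-record \
          --stream-name <your-stream-name> \
          --partition-key GRANULE.A2017025 \
          --cli-binary-format raw-in-base64-out \
          --data '{
              "version": "1.0",
              "provider": "<your-provider-id>",
              "collection": "<your-collection-name>",
              "identifier": "GRANULE.A2017025",
              "product": {
                  "name": "GRANULE.A2017025",
                  "files": [
                      {
                          "type": "data",
                          "name": "GRANULE.A2017025.hdf",
                          "uri": "s3://<source-bucket>/<path>/GRANULE.A2017025.hdf"
                      }
                  ]
              }
          }'
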
    Version: v11.1.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting, you configured a TargetBucket and TargetPrefix. The raw S3 access logs can be found in the TargetBucket under the TargetPrefix.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
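
    For example, the logs can be listed with the AWS CLI; the bucket name and stack prefix below are placeholders for your deployment's values.

    aws s3 ls s3://<internal-bucket-name>/<stack>/ems-distribution/s3-server-access-logs/
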

    Naming Executions

    …QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },
    }
    Version: v11.1.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider
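
    A provider with these values can also be created through the Cumulus API. This is a sketch; the provider id is a placeholder and the full provider schema is described in the Cumulus API documentation.

    $ curl --request POST https://example.com/providers \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{
            "id": "s3_provider",
            "protocol": "s3",
            "host": "discovery-bucket"
        }'
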

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket names used in the files configuration refer to the bucket types defined in the buckets variable of the Cumulus deployment, as shown in the Setup section above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    Version: v11.1.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    Version: v11.1.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services, and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard, or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.
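
    One way to follow these logs is with the AWS CLI (v2). The log group name below assumes the Lambda is named <prefix>-ApiEndpoints in your deployment; adjust it to match your stack.

    aws logs tail "/aws/lambda/<prefix>-ApiEndpoints" --since 1h --follow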

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was thrown when invoking a Cumulus Lambda function. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of Lambdas we distribute is in the low teens and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    Reindexing Elasticsearch Guide

    …current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then Reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is you do not have to turn off any ingest operations. This approach is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
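
    A sketch of driving these operations through the Cumulus API with curl is shown below. The endpoint paths and body fields are illustrative and should be confirmed against the Cumulus API documentation; the index names are taken from the example later on this page.

    # Point Cumulus at the new index (creates it if it does not exist yet).
    $ curl --request POST https://example.com/elasticsearch/change-index \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{ "currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4" }'

    # Copy data from the old index into the new one.
    $ curl --request POST https://example.com/elasticsearch/reindex \
        --header 'Authorization: Bearer ReplaceWithTheToken' \
        --header 'Content-Type: application/json' \
        --data '{ "sourceIndex": "cumulus-2020-11-3", "destIndex": "cumulus-2021-3-4" }'

    # Check progress of the reindex.
    $ curl --request GET https://example.com/elasticsearch/reindex-status \
        --header 'Authorization: Bearer ReplaceWithTheToken'
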

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
    "aggs": {
    "count_by_type": {
    "terms": {
    "field": "_type"
    }
    }
    },
    "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
    "conflicts": "proceed",
    "source": {
    "index": "cumulus-2020-11-3"
    },
    "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
    }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    Version: v11.1.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button
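
    The same re-run can be performed with the AWS CLI instead of the console. This is a sketch; the execution and state machine ARNs are placeholders, and cumulus_meta.execution_name in the input must be updated before starting the new execution.

    # Save the input of the original execution.
    aws stepfunctions describe-execution \
        --execution-arn <original-execution-arn> \
        --query 'input' --output text > input.json

    # Edit input.json so cumulus_meta.execution_name matches the new execution name, then:
    aws stepfunctions start-execution \
        --state-machine-arn <state-machine-arn> \
        --name <new-execution-name> \
        --input file://input.json
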

    Troubleshooting Deployment

    …data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
    domain_name = "es"
    instance_count = 2
    instance_type = "t2.small.elasticsearch"
    version = "5.3"
    volume_size = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'", this probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to workaround this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and alter the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID, EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, or you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URL to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    Version: v11.1.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Version: v11.1.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that version.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any Cloudfront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
    --table-name DYNAMO_LOCK_TABLE \
    --item '{
    "LockID": {"S": "BUCKET/KEY-md5"},
    "Digest": {"S": "some-digest-value"}
    }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    Version: v11.1.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

    The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS step function API to look up the defined name of a step function task for population in meta.workflow_tasks, but instead use an incrementing integer field.

    Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

    The update is not tied to a particular version of Core, however the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

    This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.
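
    For reference, a minimal sketch of that last item, assuming you set the variable in your cumulus-tf terraform.tfvars file (the ARN below is a placeholder for the layer version you actually deploy):

    # terraform.tfvars (ARN is a placeholder for your deployed CMA v2.0.2 layer)
    cumulus_message_adapter_lambda_layer_version_arn = "arn:aws:lambda:us-east-1:123456789012:layer:Cumulus_Message_Adapter:4"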

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/upgrade-notes/update-task-file-schemas/index.html b/docs/v11.1.0/upgrade-notes/update-task-file-schemas/index.html index d8d1c6e526f..15f1540e399 100644 --- a/docs/v11.1.0/upgrade-notes/update-task-file-schemas/index.html +++ b/docs/v11.1.0/upgrade-notes/update-task-file-schemas/index.html @@ -5,13 +5,13 @@ Updates to task granule file schemas | Cumulus Documentation - +
    Version: v11.1.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.
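
    As a point of reference, a file object under the updated schema looks roughly like the following (values are illustrative, and additional optional properties such as checksum information are defined in the schema itself):

    {
      "bucket": "my-protected-bucket",
      "key": "MOD09GQ/006/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "size": 1908635
    }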

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using the post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.
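
    Purely for illustration, based on the description above, the granuleDuplicates output is assumed to be shaped along these lines (the exact file properties are governed by the task's output schema, not this sketch):

    {
      "MOD09GQ.A2017025.h21v00.006.2017034065104": {
        "files": [
          {
            "bucket": "my-staging-bucket",
            "key": "file-staging/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
          }
        ]
      }
    }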

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
    - + \ No newline at end of file diff --git a/docs/v11.1.0/upgrade-notes/upgrade-rds/index.html b/docs/v11.1.0/upgrade-notes/upgrade-rds/index.html index c9e576b641e..4e475eb55e4 100644 --- a/docs/v11.1.0/upgrade-notes/upgrade-rds/index.html +++ b/docs/v11.1.0/upgrade-notes/upgrade-rds/index.html @@ -5,7 +5,7 @@ Upgrade to RDS release | Cumulus Documentation - + @@ -21,7 +21,7 @@ | cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 | | dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 | | dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    - + \ No newline at end of file diff --git a/docs/v11.1.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/v11.1.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html index 83413ddfacd..2d63a8a5f8b 100644 --- a/docs/v11.1.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html +++ b/docs/v11.1.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html @@ -5,13 +5,13 @@ Upgrade to TF version 0.13.6 | Cumulus Documentation - +
    Version: v11.1.0

    Upgrade to TF version 0.13.6

    Background

    Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
    • Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
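
    As a rough sketch of a batch run (adapt the paths to your own repository layout; 0.13upgrade will prompt for confirmation in each directory):

    # Run 0.13upgrade in every directory containing .tf files, skipping .terraform caches
    find . -name '*.tf' -not -path '*/.terraform/*' -exec dirname {} \; | sort -u | while read -r dir; do
      terraform 0.13upgrade "$dir"
    done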

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflow_tasks/discover_granules/index.html b/docs/v11.1.0/workflow_tasks/discover_granules/index.html index ac2feb24895..8fcda6c72d1 100644 --- a/docs/v11.1.0/workflow_tasks/discover_granules/index.html +++ b/docs/v11.1.0/workflow_tasks/discover_granules/index.html @@ -5,7 +5,7 @@ Discover Granules | Cumulus Documentation - + @@ -21,7 +21,7 @@ included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
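
    A hedged sketch of setting this in a workflow step definition follows; the config keys other than concurrency and duplicateGranuleHandling are illustrative placeholders, and the task's config schema remains authoritative:

    "DiscoverGranules": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            "provider": "{$.meta.provider}",
            "collection": "{$.meta.collection}",
            "duplicateGranuleHandling": "skip",
            "concurrency": 3
          }
        }
      }
    }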

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflow_tasks/files_to_granules/index.html b/docs/v11.1.0/workflow_tasks/files_to_granules/index.html index 4a9958fd8b7..18b31cae2f2 100644 --- a/docs/v11.1.0/workflow_tasks/files_to_granules/index.html +++ b/docs/v11.1.0/workflow_tasks/files_to_granules/index.html @@ -5,13 +5,13 @@ Files To Granules | Cumulus Documentation - +
    Version: v11.1.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming config.inputGranules and the task input list of s3 URIs along with the rest of the configuration objects to take the list of incoming files and sort them into a list of granule objects.

    Please note: files passed in without metadata previously defined in config.inputGranules will have the following keys added:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.
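
    As a loose illustration of how the pieces fit together (shapes are simplified and inferred from the summary above; the linked schemas are authoritative), the task might receive input and config along these lines:

    {
      "input": [
        "s3://my-staging-bucket/file-staging/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
      ],
      "config": {
        "inputGranules": [
          {
            "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104",
            "files": []
          }
        ]
      }
    }

    It would then fold the staged file into the matching granule's files array, adding the size, bucket, key, and fileName properties described above.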

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflow_tasks/lzards_backup/index.html b/docs/v11.1.0/workflow_tasks/lzards_backup/index.html index dbbbc02123b..872ee9fa845 100644 --- a/docs/v11.1.0/workflow_tasks/lzards_backup/index.html +++ b/docs/v11.1.0/workflow_tasks/lzards_backup/index.html @@ -5,13 +5,13 @@ LZARDS Backup | Cumulus Documentation - +
    Version: v11.1.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

    lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

    Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflow_tasks/move_granules/index.html b/docs/v11.1.0/workflow_tasks/move_granules/index.html index b647d41094b..e5665dd1982 100644 --- a/docs/v11.1.0/workflow_tasks/move_granules/index.html +++ b/docs/v11.1.0/workflow_tasks/move_granules/index.html @@ -5,13 +5,13 @@ Move Granules | Cumulus Documentation - +
    Version: v11.1.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

    • If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

    • The task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflow_tasks/parse_pdr/index.html b/docs/v11.1.0/workflow_tasks/parse_pdr/index.html index 18a55ee83a1..008adfd6158 100644 --- a/docs/v11.1.0/workflow_tasks/parse_pdr/index.html +++ b/docs/v11.1.0/workflow_tasks/parse_pdr/index.html @@ -5,13 +5,13 @@ Parse PDR | Cumulus Documentation - +
    Version: v11.1.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

    • Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

    The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

    Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

    This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.
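
    Purely as an illustration of that description (field names beyond those mentioned above are not guaranteed; the output schema on the Cumulus Tasks page is authoritative), the output payload has roughly this shape:

    {
      "filesCount": 2,
      "totalSize": 1234567,
      "pdr": {
        "name": "MY_COLLECTION_0001.PDR",
        "path": "/pdrs"
      },
      "granules": []
    }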

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflow_tasks/queue_granules/index.html b/docs/v11.1.0/workflow_tasks/queue_granules/index.html index 1e7c453ab83..6509f3ff86b 100644 --- a/docs/v11.1.0/workflow_tasks/queue_granules/index.html +++ b/docs/v11.1.0/workflow_tasks/queue_granules/index.html @@ -5,14 +5,14 @@ Queue Granules | Cumulus Documentation - +
    Version: v11.1.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

    The task utilizes a defined collection in concert with a defined provider (specified either on each granule or passed in via config) to queue up ingest executions for each granule or for batches of granules.

    The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

    Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
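
    Pulling the above keys together, a hedged sketch of a QueueGranules step definition might look like the following; the JSON-path values and workflow name are illustrative placeholders, and your deployment's values will differ:

    "QueueGranules": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            "provider": "{$.meta.provider}",
            "internalBucket": "{$.meta.buckets.internal.name}",
            "granuleIngestWorkflow": "IngestGranule",
            "queueUrl": "{$.meta.queues.startSF}",
            "preferredQueueBatchSize": 1,
            "concurrency": 3,
            "executionNamePrefix": "queued",
            "childWorkflowMeta": {
              "note": "merged into each scheduled execution's meta"
            }
          }
        }
      }
    }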

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflows/cumulus-task-message-flow/index.html b/docs/v11.1.0/workflows/cumulus-task-message-flow/index.html index f2709318f47..3f4a1a88e46 100644 --- a/docs/v11.1.0/workflows/cumulus-task-message-flow/index.html +++ b/docs/v11.1.0/workflows/cumulus-task-message-flow/index.html @@ -5,14 +5,14 @@ Cumulus Tasks: Message Flow | Cumulus Documentation - +
    Version: v11.1.0

    Cumulus Tasks: Message Flow

    Cumulus Tasks make up Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message Adapter. It contains the location on S3 of a message payload and a target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions; it should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
      "task_config": {
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      },
      "cumulus_meta": {
        "message_source": "sfn",
        "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
        "execution_name": "MyExecution__id-1234",
        "id": "id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {
        "anykey": "anyvalue"
      }
    }

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
      "replace": {
        "Bucket": "cumulus-bucket",
        "Key": "my-large-event.json",
        "TargetPath": "$"
      },
      "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines which task is currently being executed. Note this is different from which lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
      "input": {},
      "config": {}
    }

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the step function config parsing step (Preparation Step 2 above) may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
      "task_config": {
        "bar": "baz",
        "cumulus_message": {
          "input": "{$.payload.input}",
          "outputs": [
            {
              "source": "{$.input.anykey}",
              "destination": "{$.payload.out}"
            }
          ]
        }
      },
      "cumulus_meta": {
        "task": "Example",
        "message_source": "local",
        "id": "id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {
        "input": {
          "anykey": "anyvalue"
        }
      }
    }

    Create Next Message Step 2: Store remote event

    If the ReplaceConfig parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.
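
    For example, a minimal sketch using the same ReplaceConfig form that appears in the step definitions earlier in this document (the step name and empty task_config are placeholders):

    "MyTaskStep": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "ReplaceConfig": {
            "FullMessage": true
          },
          "task_config": {}
        }
      }
    }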

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflows/developing-a-cumulus-workflow/index.html b/docs/v11.1.0/workflows/developing-a-cumulus-workflow/index.html index d0884286d2f..757dbeecd90 100644 --- a/docs/v11.1.0/workflows/developing-a-cumulus-workflow/index.html +++ b/docs/v11.1.0/workflows/developing-a-cumulus-workflow/index.html @@ -5,13 +5,13 @@ Creating a Cumulus Workflow | Cumulus Documentation - +
    Version: v11.1.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

    To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
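
    For instance, a hedged sketch of a minimal state_machine_definition (the Resource ARN is a placeholder for whichever Lambda or Cumulus core task you actually reference):

    {
      "Comment": "Minimal single-step workflow",
      "StartAt": "HelloWorld",
      "States": {
        "HelloWorld": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-prefix-HelloWorld",
          "End": true
        }
      }
    }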

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflows/developing-workflow-tasks/index.html b/docs/v11.1.0/workflows/developing-workflow-tasks/index.html index 50653e347b2..0131eeee9d9 100644 --- a/docs/v11.1.0/workflows/developing-workflow-tasks/index.html +++ b/docs/v11.1.0/workflows/developing-workflow-tasks/index.html @@ -5,13 +5,13 @@ Developing Workflow Tasks | Cumulus Documentation - +
    Version: v11.1.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

    ECS activities require a docker image. Cumulus provides a docker image (source) for node 12.x+ lambdas on dockerhub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflows/docker/index.html b/docs/v11.1.0/workflows/docker/index.html index 4e19857057a..909375cf973 100644 --- a/docs/v11.1.0/workflows/docker/index.html +++ b/docs/v11.1.0/workflows/docker/index.html @@ -5,7 +5,7 @@ Dockerizing Data Processing | Cumulus Documentation - + @@ -14,7 +14,7 @@ 2) validate the output (in this case just check for existence) 3) use 'ncatted' to update the resulting file to be CF-compliant 4) write out metadata generated for this file

    Process Testing

    It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

    This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

    If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise, an error will be reported.

    - + \ No newline at end of file diff --git a/docs/v11.1.0/workflows/index.html b/docs/v11.1.0/workflows/index.html index 493f3470298..08a344ca5ba 100644 --- a/docs/v11.1.0/workflows/index.html +++ b/docs/v11.1.0/workflows/index.html @@ -5,13 +5,13 @@ Workflows | Cumulus Documentation - +
    Version: v11.1.0

    Workflows

    Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

    The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
    • Not make assumptions of specifics of what goes into a granule for example.

    Scalable

    The ingest architecture needs to be scalable, both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
    • We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

    AWS Step Functions are described in detail in the AWS documentation, but in short they provide several benefits which are applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
      • Step functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
      • Both meta and task_config which are used for configuring at the collection and task levels do not dictate the fields and structure of the configuration. Additional task specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

    Tradeoffs, and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

    This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

    If we implemented our own orchestration, we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade off. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest specific configuration. This means our representation has a tight coupling with their standard. If they make backwards incompatible changes in the future we will have to deal with existing projects written against that.

    We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without needing changes to the Ingest library code. It is unlikely they will make a backwards incompatible change at this point. One mitigation for this is writing data transformations to a new format if that were to happen.

    Collection Configuration Flexibility vs Complexity

    The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

    The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built that provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

    The design aims to satisfy the needs for ingest integrating different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

    AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
    • CloudWatch Metrics can be used for monitoring many of the events with the StepFunctions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

    This shows a visualization of an execution of the ingest granules workflow in step functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v11.1.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
• If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions (see the sketch below). Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.
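For illustration, a minimal Terraform sketch of the second option might look like the following. The function name, file paths, runtime, and the variable holding the layer ARN are hypothetical placeholders; adjust them to match your deployment.

resource "aws_lambda_function" "my_cma_task" {
  function_name    = "${var.prefix}-MyCmaTask"
  filename         = "/path/to/zip/lambda.zip"
  source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
  handler          = "index.handler"
  role             = module.cumulus.lambda_processing_role_arn
  runtime          = "nodejs14.x"

  # Attach the deployed CMA layer (ARN supplied via a hypothetical variable)
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]

  environment {
    variables = {
      # Point the task at the layer's mount location
      CUMULUS_MESSAGE_ADAPTER_DIR = "/opt"
    }
  }
}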

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a Python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.
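As a rough sketch of the manual approach (the archive name, its internal layout, and the paths below are assumptions, not a prescribed procedure), packaging might look like:

# Sketch only: unpack a CMA release archive into a cumulus-message-adapter/ sub-directory
mkdir -p build
unzip cumulus-message-adapter.zip -d build/cumulus-message-adapter
# Add your task code alongside it and zip the whole bundle for the Lambda
cp -r my-task/* build/
(cd build && zip -r ../lambda.zip .)

With a layout like this, CUMULUS_MESSAGE_ADAPTER_DIR would point at the cumulus-message-adapter directory inside the deployed package.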

    CMA Input/Output

Input to the task application code is a JSON object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is placed in the payload key by default, but the config key can also be used to return just a portion of the task output.

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, with a remote message key that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes) it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize):

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values for definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
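Continuing the example above, after these two actions the message handed to the CMA would look roughly like:

{
  "ConfigKey": {
    "config values": "some config values"
  },
  "cumulus_meta": {},
  "payload": {},
  "meta": {},
  "exception": {}
}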

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
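After this step, the example above would look roughly like the following, with the retrieved object substituted at the TargetPath and the replace key removed. The substituted content shown here is only a placeholder:

"meta": {
  "foo": {
    "some_stored_key": "contents retrieved from s3://some_bucket/events/some-event-id"
  }
}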

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given a task that specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also write values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Given the response from the task is:

{
  "output": {
    "anykey": "boo"
  }
}

    The Cumulus Message Adapter would output the following Cumulus Message:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please Note: the non user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output message (after output configuration is applied) from a Cumulus task looks like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "ReplaceConfig": {
    "FullMessage": true
  },
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    the resultant output would look like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
  }
}

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
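As an illustration only, a task's schemas/input.json might be a standard JSON Schema along these lines (the field names here are hypothetical, not part of any particular task):

{
  "title": "ExampleTask Input",
  "type": "object",
  "properties": {
    "granules": {
      "type": "array",
      "items": { "type": "object" }
    }
  },
  "required": ["granules"]
}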

    Version: v11.1.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

Additionally, you may choose to include any of the other languages AWS supports as a resource with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source), either in a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided the cumulus module if intending to integrate the lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.
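A hedged Terraform sketch for such a Java Lambda might look like the following; the handler string, runtime, and file paths are assumptions based on the folder layout described above, not prescribed values:

resource "aws_lambda_function" "my_java_task" {
  function_name    = "${var.prefix}-JavaTask"
  filename         = "/path/to/zip/deploy.zip"               # zip of the deploy folder described above
  source_code_hash = filebase64sha256("/path/to/zip/deploy.zip")
  handler          = "test_task.task.Task::handleRequest"    # hypothetical package/class/method
  runtime          = "java11"
  role             = module.cumulus.lambda_processing_role_arn
}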

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.
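For example, extending the aws_lambda_function.myfunction example above might look like the sketch below; the variable holding the layer ARN is an assumption, so use whatever your deployment defines:

resource "aws_lambda_function" "myfunction" {
  # ... configuration from the example above ...

  # Attach the CMA layer and tell the task where to find it
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]

  environment {
    variables = {
      CUMULUS_MESSAGE_ADAPTER_DIR = "/opt"
    }
  }
}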

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable Cloudwatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    Version: v11.1.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
Workflow Configuration How To's

To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

<Granule>
  <Temporal>
    <RangeDateTime>
      <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
      <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
  </Temporal>
</Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    Version: v11.1.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda and an execution is started with the body of the message as the input.

    When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event which will manage the trigger of the lambdas that trigger the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    Version: v12.0.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    Version: v12.0.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it can be broken down into several major components:

    Data Distribution

End Users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a PostgreSQL-compatible database and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is replicated in DynamoDB; this will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployment, and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    Version: v12.0.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.
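To confirm the policy was applied, you can, for example, describe the log group and check its retentionInDays field:

aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/KinesisInboundLogger"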

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    Version: v12.0.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

    In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

{
  "Datapoints": [
    {
      "Timestamp": "2019-07-31T00:00:00Z",
      "Average": 150996467959.0,
      "Unit": "Bytes"
    }
  ],
  "Label": "BucketSizeBytes"
}

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
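For reference, a buckets variable in the .tfvars file commonly looks something like the sketch below. The bucket names are placeholders and the exact shape should match your deployment's variable definition:

buckets = {
  internal = {
    name = "sample-internal-bucket"
    type = "internal"
  }
  private = {
    name = "sample-private-bucket"
    type = "private"
  }
  protected = {
    name = "sample-protected-bucket"
    type = "protected"
  }
  public = {
    name = "sample-public-bucket"
    type = "public"
  }
}

With a mapping like this, a file configuration with "bucket": "protected" would resolve to sample-protected-bucket.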

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "protected-2",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
  ]
}
    Version: v12.0.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

    • While connection configuration is defined here, things that are more specific to a specific ingest setup (e.g. 'What target directory should we be pulling from' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
    • There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
• id (string, required): Unique identifier for the provider.
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited.
• protocol (string, required): The protocol for this provider. Must be s3 for this provider type.
• host (string, required): S3 Bucket to pull data from.
http
• id (string, required): Unique identifier for the provider.
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited.
• protocol (string, required): The protocol for this provider. Must be http for this provider type.
• host (string, required): The host to pull data from (e.g. nasa.gov).
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication.
• password (string, required only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication.
• port (integer, optional): Port to connect to the provider on. Defaults to 80.
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if the redirect port is different than the provider port.
• certificateUri (string, optional): SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate.
https
• id (string, required): Unique identifier for the provider.
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited.
• protocol (string, required): The protocol for this provider. Must be https for this provider type.
• host (string, required): The host to pull data from (e.g. nasa.gov).
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication.
• password (string, required only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication.
• port (integer, optional): Port to connect to the provider on. Defaults to 443.
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if the redirect port is different than the provider port.
• certificateUri (string, optional): SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate.
ftp
• id (string, required): Unique identifier for the provider.
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited.
• protocol (string, required): The protocol for this provider. Must be ftp for this provider type.
• host (string, required): The ftp host to pull data from (e.g. nasa.gov).
• username (string, optional): Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined.
• password (string, optional): Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined.
• port (integer, optional): Port to connect to the provider on. Defaults to 21.
sftp
• id (string, required): Unique identifier for the provider.
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited.
• protocol (string, required): The protocol for this provider. Must be sftp for this provider type.
• host (string, required): The sftp host to pull data from (e.g. nasa.gov).
• username (string, optional): Username to use to connect to the sftp server.
• password (string, optional): Password to use to connect to the sftp server.
• port (integer, optional): Port to connect to the provider on. Defaults to 22.
• privateKey (string, optional): filename assumed to be in s3://bucketInternal/stackName/crypto.
• cmKeyId (string, optional): AWS KMS Customer Master Key arn or alias.

    Collections

    Break down of [s3_MOD09GQ_006.json](https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
• name (required, e.g. "MOD09GQ"): The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard.
• version (required, e.g. "006"): A version tag for the collection.
• granuleId (required, e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$"): The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction.
• granuleIdExtraction (required, e.g. "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)"): The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
• sampleFileName (required, e.g. "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"): An example filename belonging to this collection.
• files (required, <JSON Object> of files as defined in the files-object section below): Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.).
• dataType (optional, e.g. "MOD09GQ"): Can be specified, but this value will default to the collection_name if not.
• duplicateHandling (optional, e.g. "replace"): ("replace"|"version"|"skip") determines granule duplicate handling scheme.
• ignoreFilesConfigForDiscovery (optional, default false): By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
• process (optional, e.g. "modis"): Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition.
• meta (optional, <JSON Object> of MetaData for the collection): MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
• url_path (optional, e.g. "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"): Filename without extension.

    files-object

• regex (required, e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"): Regular expression used to identify the file.
• sampleFileName (required, e.g. "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"): Filename used to validate the provided regex.
• type (optional, e.g. "data"): Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task.
• bucket (required, e.g. "internal"): Name of the bucket where the file will be stored.
• url_path (optional, e.g. "${collectionShortName}/{substring(file.fileName, 0, 3)}"): Folder used to save the granule in the bucket. Defaults to the collection url_path.
• checksumFor (optional, e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"): If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by either events in Kinesis, SNS messages or SQS messages.

    Rule configuration
• name (required, e.g. "L2_HR_PIXC_kinesisRule"): Name of the rule. This is the name under which the rule will be listed on the dashboard.
• workflow (required, e.g. "CNMExampleWorkflow"): Name of the workflow to be run. A list of available workflows can be found on the Workflows page.
• provider (optional, e.g. "PODAAC_SWOT"): Configured provider's ID. This can be found on the Providers dashboard page.
• collection (required, <JSON Object> collection object shown below): Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page.
• payload (optional, <JSON Object or Array>): The payload to be passed to the workflow.
• meta (optional, <JSON Object> of MetaData for the rule): MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
• rule (required, <JSON Object> rule type and associated values, discussed below): Object defining the type and subsequent attributes of the rule.
• state (optional, e.g. "ENABLED"): ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
• queueUrl (optional, e.g. https://sqs.us-east-1.amazonaws.com/1234567890/queue-name): URL for the SQS queue that will be used to schedule workflows for this rule.
• tags (optional, e.g. ["kinesis", "podaac"]): An array of strings that can be used to simplify search.

    collection-object

• name (required, e.g. "L2_HR_PIXC"): Name of a collection defined/configured in the Collections dashboard page.
• version (required, e.g. "000"): Version number of a collection defined/configured in the Collections dashboard page.

    meta-object

• retries (optional, e.g. 3): Number of retries on errors, for sqs-type rule only. Defaults to 3.
• visibilityTimeout (optional, e.g. 900): VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

• type (required, e.g. "kinesis"): ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired.
• value (string, required depending on type): Discussion of valid values is below.

    rule-value

The rule.value entry depends on the rule type (an illustrative example follows this list):

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
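As an illustration, a complete kinesis rule record assembled from the fields described above might look like this sketch; the stream ARN is a placeholder, not a real resource:

{
  "name": "L2_HR_PIXC_kinesisRule",
  "workflow": "CNMExampleWorkflow",
  "provider": "PODAAC_SWOT",
  "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
  },
  "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:us-east-1:111122223333:stream/my-kinesis-stream"
  },
  "state": "ENABLED"
}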

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
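
    As a sketch combining the tables above, an sqs-type rule might look like the following. The names, queue URL, and timeout values are illustrative only, and the queue is assumed to already have a dead-letter queue configured:

    {
      "name": "example_sqs_rule",
      "workflow": "CNMExampleWorkflow",
      "collection": {
        "name": "L2_HR_PIXC",
        "version": "000"
      },
      "meta": {
        "retries": 1,
        "visibilityTimeout": 3600
      },
      "rule": {
        "type": "sqs",
        "value": "https://sqs.us-east-1.amazonaws.com/1234567890/queue-name"
      },
      "state": "ENABLED"
    }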

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    1. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v12.0.0/configuration/lifecycle-policies/index.html b/docs/v12.0.0/configuration/lifecycle-policies/index.html index 3f99658ed31..8085d3b25a8 100644 --- a/docs/v12.0.0/configuration/lifecycle-policies/index.html +++ b/docs/v12.0.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v12.0.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
    • You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

    From any directory you choose, open an editor and add the following to a file named exampleRule.json:

    {
      "Rules": [
        {
          "Status": "Enabled",
          "Filter": {
            "Prefix": ""
          },
          "Transitions": [
            {
              "Days": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "NoncurrentVersionTransitions": [
            {
              "NoncurrentDays": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "ID": "90DayS3IAExample"
        }
      ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

    $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
      "Rules": [
        {
          "Status": "Enabled",
          "Filter": {
            "Prefix": ""
          },
          "Transitions": [
            {
              "Days": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "NoncurrentVersionTransitions": [
            {
              "NoncurrentDays": 90,
              "StorageClass": "STANDARD_IA"
            }
          ],
          "ID": "90DayS3IAExample"
        }
      ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

    Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the "Management" tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are fairly basic cases. By using object tags, filters, and other configuration options, you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page, see:
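
    For example, a rule that only transitions objects carrying a particular tag might look similar to the following (the tag key and value are placeholders):

    {
      "Rules": [
        {
          "ID": "Tagged90DayS3IAExample",
          "Status": "Enabled",
          "Filter": {
            "Tag": {
              "Key": "archive",
              "Value": "true"
            }
          },
          "Transitions": [
            {
              "Days": 90,
              "StorageClass": "STANDARD_IA"
            }
          ]
        }
      ]
    }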

    - + \ No newline at end of file diff --git a/docs/v12.0.0/configuration/monitoring-readme/index.html b/docs/v12.0.0/configuration/monitoring-readme/index.html index bc25d7a4bcb..2d0457d95d7 100644 --- a/docs/v12.0.0/configuration/monitoring-readme/index.html +++ b/docs/v12.0.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v12.0.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

    Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on GitHub and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.
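
    If you prefer the command line, the AWS CLI can be used to locate and tail these log groups. The group names below are illustrative and will vary with your deployment prefix:

    # List log groups created for your deployment prefix
    aws logs describe-log-groups --log-group-name-prefix "<prefix>"

    # Tail a specific Lambda's log group (AWS CLI v2)
    aws logs tail "/aws/lambda/<prefix>-DiscoverGranules" --since 1h --follow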

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation.

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
      // CloudWatch Logs delivers subscription data as base64-encoded, gzipped JSON
      const payload = Buffer.from(event.awslogs.data, 'base64');
      const decompressedData = await gunzip(payload);
      const logData = JSON.parse(decompressedData.toString('ascii'));
      return await Promise.all(logData.logEvents.map(async (logEvent) => {
        const logMessage = JSON.parse(logEvent.message);
        // Forward only error/fatal messages produced by @cumulus/logger to the SNS topic
        if (['error', 'fatal'].includes(logMessage.level)) {
          return sns.publish({
            TopicArn: process.env.EmailReportingTopicArn,
            Message: logEvent.message
          }).promise();
        }
        return Promise.resolve();
      }));
    };

    After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function.
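
    A hedged sketch of that deployment command follows; the function name, role ARN, runtime, and topic ARN are all placeholders for your own values:

    aws lambda create-function \
      --function-name log-email-forwarder \
      --runtime nodejs16.x \
      --handler index.handler \
      --zip-file fileb://function.zip \
      --role arn:aws:iam::<account-id>:role/<lambda-execution-role> \
      --environment "Variables={EmailReportingTopicArn=arn:aws:sns:us-east-1:<account-id>:<email-topic>}"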

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
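
    As one possible Terraform sketch (the resource names and the referenced aws_lambda_function.log_email_forwarder are hypothetical, not resources Cumulus provides):

    # Allow CloudWatch Logs to invoke the forwarding Lambda
    resource "aws_lambda_permission" "allow_log_subscription" {
      statement_id  = "AllowCloudWatchLogsInvoke"
      action        = "lambda:InvokeFunction"
      function_name = aws_lambda_function.log_email_forwarder.function_name
      principal     = "logs.amazonaws.com"
    }

    # Subscribe the Lambda to a log group of interest
    resource "aws_cloudwatch_log_subscription_filter" "error_email_filter" {
      name            = "error-email-filter"
      log_group_name  = "/aws/lambda/<prefix>-DiscoverGranules"
      filter_pattern  = ""
      destination_arn = aws_lambda_function.log_email_forwarder.arn
      depends_on      = [aws_lambda_permission.allow_log_subscription]
    }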

    - + \ No newline at end of file diff --git a/docs/v12.0.0/configuration/server_access_logging/index.html b/docs/v12.0.0/configuration/server_access_logging/index.html index ccd843f4e04..07bc0b0829f 100644 --- a/docs/v12.0.0/configuration/server_access_logging/index.html +++ b/docs/v12.0.0/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v12.0.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
        "LoggingEnabled": {
          "TargetBucket": "<stack-internal-bucket>",
          "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
        }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket (a loop example for multiple buckets follows this list).

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
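
    To apply the same policy from step 2 to several buckets at once, a small shell loop could be used; the bucket names below are placeholders for your own protected and public buckets:

    for bucket in my-stack-protected my-stack-public; do
      aws s3api put-bucket-logging --bucket "$bucket" --bucket-logging-status file://logging.json
    done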
    - + \ No newline at end of file diff --git a/docs/v12.0.0/configuration/task-configuration/index.html b/docs/v12.0.0/configuration/task-configuration/index.html index 997dc141f4c..ec408bc13fc 100644 --- a/docs/v12.0.0/configuration/task-configuration/index.html +++ b/docs/v12.0.0/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
    Version: v12.0.0

    Configuration of Tasks

    The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

    <lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

    More information about the CMR limit and page_size parameters can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
      create_reconciliation_report_cmr_limit     = 2500
      create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
      create_reconciliation_report_es_scroll_duration = "15m"
      create_reconciliation_report_es_scroll_size     = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
      discover_granules_task_timeout = 300
    }
    - + \ No newline at end of file diff --git a/docs/v12.0.0/data-cookbooks/about-cookbooks/index.html b/docs/v12.0.0/data-cookbooks/about-cookbooks/index.html index 9839bd76c72..aee62744de7 100644 --- a/docs/v12.0.0/data-cookbooks/about-cookbooks/index.html +++ b/docs/v12.0.0/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: v12.0.0

    About Cookbooks

    Introduction

    The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should serve to help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/v12.0.0/data-cookbooks/browse-generation/index.html b/docs/v12.0.0/data-cookbooks/browse-generation/index.html index 09c808dbddc..535eb9c1cfc 100644 --- a/docs/v12.0.0/data-cookbooks/browse-generation/index.html +++ b/docs/v12.0.0/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

    You can verify the granules exist within your cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the CMR entry shown in the output above was created.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

    You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

    The provided example script goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following (a minimal sketch follows this list):

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
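
    A minimal sketch of such a handler is shown below. It assumes the CMA has populated event.input and event.config as described above; the file naming and placeholder browse "generation" are purely illustrative and are not the example repository's actual code:

    'use strict';

    const path = require('path');
    const { S3 } = require('aws-sdk');

    const s3 = new S3();

    exports.handler = async (event) => {
      const granules = event.input.granules;

      // For each granule, stage a browse image next to the already-staged data,
      // then record it on the granule with type "browse".
      const updatedGranules = await Promise.all(granules.map(async (granule) => {
        const dataFile = granule.files[0];
        const browseKey = dataFile.key.replace(path.extname(dataFile.key), '.jpg');

        // Placeholder "browse generation": a real task would render imagery from the data file.
        await s3.putObject({
          Bucket: dataFile.bucket,
          Key: browseKey,
          Body: Buffer.from('placeholder browse image'),
        }).promise();

        return {
          ...granule,
          files: [
            ...granule.files,
            {
              fileName: path.basename(browseKey),
              bucket: dataFile.bucket,
              key: browseKey,
              type: 'browse',
            },
          ],
        };
      }));

      // Return the shape the cookbook expects: "files" (s3 URIs for FilesToGranules)
      // and "granules" (to be mapped to meta.input_granules).
      const files = updatedGranules.flatMap(
        (granule) => granule.files.map((file) => `s3://${file.bucket}/${file.key}`)
      );

      return { files, granules: updatedGranules };
    };

    The returned files and granules keys would then be mapped to the payload and meta.input_granules via the step's cumulus_message outputs, as discussed in the following sections.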

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

    The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

    In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, files is mapped to the payload and granules to meta.input_granules; the downstream FilesToGranules step then consumes them via its task_config:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }
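
    For reference, the processing step's own cumulus_message configuration to produce those mappings might be sketched as follows, assuming the task returns files and granules keys as described above:

    "cumulus_message": {
      "outputs": [
        {
          "source": "{$.files}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.granules}",
          "destination": "{$.meta.input_granules}"
        }
      ]
    }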

    Their expected values from the example above may be useful in constructing a processing task:

    payload

    The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

    This is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

    The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the output meta.input_granules in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    - + \ No newline at end of file diff --git a/docs/v12.0.0/data-cookbooks/choice-states/index.html b/docs/v12.0.0/data-cookbooks/choice-states/index.html index d3021f60279..48569bc9d6f 100644 --- a/docs/v12.0.0/data-cookbooks/choice-states/index.html +++ b/docs/v12.0.0/data-cookbooks/choice-states/index.html @@ -5,13 +5,13 @@ Choice States | Cumulus Documentation - +
    Version: v12.0.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

    In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: Given CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice is followed by PdrStatusReport is followed by WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/data-cookbooks/cnm-workflow/index.html b/docs/v12.0.0/data-cookbooks/cnm-workflow/index.html index 767c0cc0f43..b01e7be47dd 100644 --- a/docs/v12.0.0/data-cookbooks/cnm-workflow/index.html +++ b/docs/v12.0.0/data-cookbooks/cnm-workflow/index.html @@ -5,7 +5,7 @@ CNM Workflow | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v12.0.0

    CNM Workflow

    This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

    This entry assumes you already have two Kinesis data streams created for use as the CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

    If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

    In operational environments it's likely science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

      state_machine_definition = <<JSON
    {
    "CNMExampleWorkflow": {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    }
    JSON
    }

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

    • The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you setup in the Prerequisites section.

    Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

    As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output and add a "SUCCESS" record to the response Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.
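
    For example, adapting the record JSON from above, a payload like the following (identical except that product.name is omitted) should cause CnmResponse to emit a FAILURE record:

    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "dataVersion": "006"
    },
    "identifier ": "testFailureIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }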


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.
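
    One way to confirm this is to list the staging prefix described in the SyncGranules section above; the bucket and deployment names here are placeholders:

    aws s3 ls "s3://<deployment-private-bucket>/file-staging/<deployment-name>/" --recursive | grep "${TEST_DATA_FILE_NAME}"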

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

    which should result in output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
    • Re-trigger the workflow by using the put-record command from the Add Record to Kinesis Data Stream step above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

    Note the data encoding is not human readable and would need to be parsed/converted to be interpretable. There are many options for building a Kinesis consumer, such as the KCL.
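
    For a quick check, the Data field can be base64-decoded directly on the command line; for example, using the first record above (on macOS, base64 -D may be required instead of --decode):

    echo "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9" | base64 --decode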

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

    At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

    This system ensures when messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

    The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - come with the API package and do not need to be configured by the operator.

    To examine records that were unable to be processed at any step, inspect the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
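
    If you prefer the CLI over the console, an equivalent check might look like this (the prefix is a placeholder for your deployment prefix):

    QUEUE_URL=$(aws sqs get-queue-url --queue-name "<prefix>-kinesisFailure" --query QueueUrl --output text)
    aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10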

    Note, these are only records that occurred when processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
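
    Creating that event source mapping can also be scripted; a sketch with placeholder names follows (the deployed function name will follow your deployment's prefix):

    aws lambda create-event-source-mapping \
      --function-name "<prefix>-KinesisOutboundEventLogger" \
      --event-source-arn "arn:aws:kinesis:us-east-1:<account-id>:stream/<response-stream-name>" \
      --starting-position LATEST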

    - + \ No newline at end of file diff --git a/docs/v12.0.0/data-cookbooks/error-handling/index.html b/docs/v12.0.0/data-cookbooks/error-handling/index.html index 41eb8ea29ea..81cbd50b5a8 100644 --- a/docs/v12.0.0/data-cookbooks/error-handling/index.html +++ b/docs/v12.0.0/data-cookbooks/error-handling/index.html @@ -5,7 +5,7 @@ Error Handling in Workflows | Cumulus Documentation - + @@ -45,7 +45,7 @@ Service Exception. See this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    Version: v12.0.0

    HelloWorld Workflow

This example task is meant to be a sanity check/introduction to Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    Version: v12.0.0

    Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to the workflow.

The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions, and the sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module, the Cloudwatch rule is included in the workflow module, and all are part of a default Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Messages published to this queue are stored as granule/PDR records in the Cumulus database, making the status of those granules/PDRs visible on the Cumulus dashboard. Note that the queue expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue requires knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the topic ARNs) by adding them to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus message is expected in detail.input. The Cumulus message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
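As a rough illustration, the following AWS CLI call publishes a schematic message to the reporting queue. The queue URL and message fields shown are placeholders, and the elided Cumulus message must be a JSON string that satisfies the record schemas referenced above:

# Placeholder queue URL; use the stepfunction_event_reporter_queue_url output from above.
QUEUE_URL="https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue"

# detail.input holds the Cumulus message as a JSON *string* (use detail.output for SUCCEEDED events).
aws sqs send-message \
--queue-url "$QUEUE_URL" \
--message-body '{
"source": "aws.states",
"detail-type": "Step Functions Execution Status Change",
"detail": {
"status": "RUNNING",
"input": "{\"cumulus_meta\": { ... }, \"meta\": { ... }, \"payload\": { ... }}"
}
}'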

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords
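With AWS CLI v2, you can tail that log group directly; the log group name below assumes the default naming with your deployment prefix:

aws logs tail "/aws/lambda/<prefix>-sfEventSqsToDbRecords" --since 1h --follow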

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }
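After applying this configuration, a quick sanity check (assuming you exposed the topic ARN as a Terraform output, as shown earlier on this page) is to list the topic's subscriptions and confirm your Lambda endpoint appears:

aws sns list-subscriptions-by-topic \
--topic-arn "$(terraform output -raw report_executions_sns_topic_arn)"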

    SNS message format

Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with the ingest notification record in the record property and the event type in the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
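Because the record is double-encoded (a JSON string inside the SNS envelope), subscribers need to parse it twice. As a small illustration, given a sample SNS event saved to event.json, jq can extract and decode the notification record:

# .Records[0].Sns.Message is a JSON string; fromjson parses it into an object.
jq -r '.Records[0].Sns.Message | fromjson' event.json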

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v12.0.0

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task, and if a CMR error occurs during PostToCmr, the publish workflow adds itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v12.0.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources, offering the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

Suppose you discover that this task can no longer run in AWS Lambda. You can instead run it on the Cumulus ECS cluster by adding the following resources to your terraform deployment (by either adding a new .tf file or updating an existing one):

• An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.
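One way to perform that restart is to force a new deployment of the ECS service with the AWS CLI; the cluster and service names below are placeholders for the values in your deployment:

aws ecs update-service \
--cluster <ecs-cluster-name-or-arn> \
--service <queue-granules-service-name> \
--force-new-deployment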

• An updated Discover Granules workflow that utilizes the new resource (the Resource key in the QueueGranules step has been updated to the following):

"Resource": "${aws_sfn_activity.queue_granules.id}"

If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step will run as an ECS task instead of a Lambda function.
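To confirm the worker is actually running on the cluster, you can describe the ECS service created by the cumulus_ecs_service module. The cluster and service names below are placeholders; check the ECS console for the exact names in your deployment:

aws ecs describe-services \
--cluster <ecs-cluster-name-or-arn> \
--services <queue-granules-service-name> \
--query "services[].{name:serviceName,running:runningCount,desired:desiredCount}"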

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Version: v12.0.0

Science Investigator-led Processing Systems (SIPS)

... we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    Note: A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

Additionally, this workflow requires a processing step that you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v12.0.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }
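If you want to verify this without starting an execution, you can inspect the deployed workflow template directly. The S3 key below is an assumption about where your deployment stores the template (it can differ between Cumulus versions), so adjust it to match your stack:

aws s3 cp s3://<system-bucket>/<prefix>/workflow_template.json - | jq '.cumulus_meta.queueExecutionLimits'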

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started
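Conceptually, this increment is a conditional update against a DynamoDB-backed semaphore. The sketch below is illustrative only: the table and attribute names are assumptions that may not match your Cumulus version, and sqs2sfThrottle performs the equivalent operation internally via the AWS SDK rather than the CLI.

# Illustrative only: table name, key, and attribute names are assumptions.
aws dynamodb update-item \
--table-name "<prefix>-SemaphoresTable" \
--key '{"key": {"S": "<backgroundJobQueue_SQS_URL>"}}' \
--update-expression "ADD semvalue :one" \
--condition-expression "attribute_not_exists(semvalue) OR semvalue < :max" \
--expression-attribute-values '{":one": {"N": "1"}, ":max": {"N": "5"}}'

If the condition fails, DynamoDB rejects the update, which corresponds to the case above where no execution is started.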

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Version: v12.0.0

Tracking Ancillary Files

... The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type  | UMM-G RelatedUrl.Type                                            | ECHO10 Location
--------- | ---------------------------------------------------------------- | ---------------------
ancillary | 'VIEW RELATED INFORMATION'                                       | OnlineResource
data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)  | OnlineAccessURL
browse    | 'GET RELATED VISUALIZATION'                                      | AssociatedBrowseImage
linkage   | 'EXTENDED METADATA'                                              | OnlineResource
metadata  | 'EXTENDED METADATA'                                              | OnlineResource
qa        | 'EXTENDED METADATA'                                              | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v12.0.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
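To confirm logs are arriving, you can look up the log group and tail it with AWS CLI v2; substitute your API's REST API ID and stage:

aws logs describe-log-groups --log-group-name-prefix "API-Gateway-Execution-Logs_"
aws logs tail "API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>" --follow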

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

    This is a one time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'
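You can verify that the setting took effect by reading the account settings back; the returned cloudwatchRoleArn should match the role ARN you created above:

aws apigateway get-account --query cloudwatchRoleArn --output text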

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

Version: v12.0.0

Choosing and Configuring Your RDS Database

... using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more, as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Version: v12.0.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map where each key is a descriptor and each value is the Cloudwatch log group name.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
Version: v12.0.0

Component-based Cumulus Deployment

... Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v12.0.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

    Using the AWS command line tool create-bucket s3api subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.
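If any of the buckets you create is a protected bucket, you can also enable the server access logging mentioned above from the command line. The bucket names and prefix below are examples only, following the foobar-internal naming used above; point the target at whichever bucket receives your access logs:

aws s3api put-bucket-logging \
--bucket foobar-protected \
--bucket-logging-status '{
"LoggingEnabled": {
"TargetBucket": "foobar-internal",
"TargetPrefix": "s3_access_logs/"
}
}'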

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v12.0.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

127.0.0.1 <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.
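Uploading the sample file can be as simple as the following, where the bucket name is a placeholder for one of the protected buckets in your buckets Terraform variable:

echo "cumulus distribution test" > sample.txt
aws s3 cp sample.txt s3://<protected-bucket>/test/sample.txt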

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

To determine your ec2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.
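
    If cumulus_distribution_url is not already set in your cumulus-tf/terraform.tfvars, one way to add it is shown below; the CloudFront hostname is a hypothetical placeholder:

    echo 'cumulus_distribution_url = "https://d111111abcdef8.cloudfront.net"' >> cumulus-tf/terraform.tfvars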

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
      "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

    diff --git a/docs/v12.0.0/deployment/index.html b/docs/v12.0.0/deployment/index.html

    How to Deploy Cumulus

    …for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

    Follow the above instructions to initialize Terraform using terraform init. [1]

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> Redirect URIs, add the Archive API URL returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution URL
      • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login. [2]
    5. You may delete the placeholder URL you used to create the application.

    If you've lost track of the needed redirect URIs, they can be located on the API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, go to the Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the Thin Egress App URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket
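
    For example, the command-line route might look like the following; the prefix and region are hypothetical placeholders:

    aws s3 mb s3://my-prefix-dashboard --region us-east-1
    aws s3 website s3://my-prefix-dashboard --index-document index.html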

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

    To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root. [3]

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

    The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (e.g., generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster consider:

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
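
    As a sketch, the corresponding entries in cumulus-tf/terraform.tfvars might look like the following (appended here with a shell command, assuming these variables are not already set); the sizing values are hypothetical and should be tuned to your workloads:

    printf '%s\n' \
      'ecs_cluster_instance_type               = "t3.medium"' \
      'ecs_cluster_instance_docker_volume_size = 50' \
      'ecs_cluster_min_size                    = 1' \
      'ecs_cluster_desired_size                = 1' \
      'ecs_cluster_max_size                    = 2' \
      >> cumulus-tf/terraform.tfvars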

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration" and use the edit icon for your application, then go to Manage -> Redirect URIs.

    3. The API root can be found in a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at".

    diff --git a/docs/v12.0.0/deployment/postgres_database_deployment/index.html b/docs/v12.0.0/deployment/postgres_database_deployment/index.html

    PostgreSQL Database Deployment

    …cumulus-rds-tf that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

    To deploy this module, make sure that you have completed the following steps from the Cumulus deployment instructions, in a similar fashion, for this module:

    --

    Configure and deploy the module

    When configuring this module, please keep in mind that, unlike the Cumulus deployment, this module should be deployed once to create the database cluster and redeployed thereafter only to make changes to that configuration, to upgrade, etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
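
    As a sketch, the resulting backend configuration might look like the following, written here as a shell command that regenerates terraform.tf (equivalent to editing the copied file by hand); the region, bucket, key, and table names are hypothetical placeholders:

    printf '%s\n' \
      'terraform {' \
      '  backend "s3" {' \
      '    region         = "us-east-1"' \
      '    bucket         = "my-prefix-tf-state"' \
      '    key            = "rds-cluster/terraform.tfstate"' \
      '    dynamodb_table = "my-prefix-tf-locks"' \
      '  }' \
      '}' \
      > terraform.tf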

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need for access. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.
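
    As a sketch, the corresponding entries in rds-cluster-tf/terraform.tfvars might look like the following (appended with a shell command, assuming they are not already present); the boundary ARN, password, and prefix are hypothetical placeholders:

    printf '%s\n' \
      'provision_user_database  = true' \
      'permissions_boundary_arn = "arn:aws:iam::123456789012:policy/ExamplePermissionsBoundary"' \
      'rds_user_password        = "change-me-to-a-real-password"' \
      'prefix                   = "my-prefix"' \
      >> terraform.tfvars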

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier ) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.
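
    If you want a closer look before approving, one optional approach is to save the plan to a file, inspect it, and then apply exactly that plan (the interactive terraform apply shown below works just as well):

    terraform plan -out=rds-cluster.tfplan
    terraform show rds-cluster.tfplan | less    # review what will be created, changed, or destroyed
    terraform apply rds-cluster.tfplan          # applies the saved plan without the interactive prompt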

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.
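
    If you want to inspect the generated credentials, a secret's value can be retrieved with the AWS CLI; the ARN below is a placeholder for the value from your own Terraform output:

    aws secretsmanager get-secret-value \
      --secret-id "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-AbCdEf" \
      --query SecretString --output text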

    The content of each of these secrets is of the form:

    {
      "database": "postgres",
      "dbClusterIdentifier": "clusterName",
      "engine": "postgres",
      "host": "xxx",
      "password": "defaultPassword",
      "port": 5432,
      "username": "xxx"
    }

    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    diff --git a/docs/v12.0.0/deployment/share-s3-access-logs/index.html b/docs/v12.0.0/deployment/share-s3-access-logs/index.html
    Version: v12.0.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs with other applications by using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The Terraform source package can be found on the Cumulus GitHub release page under the asset tab as terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    diff --git a/docs/v12.0.0/deployment/terraform-best-practices/index.html b/docs/v12.0.0/deployment/terraform-best-practices/index.html

    Terraform Best Practices

    …AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link
    Restoring a previous version: link

    diff --git a/docs/v12.0.0/deployment/thin_egress_app/index.html b/docs/v12.0.0/deployment/thin_egress_app/index.html
    Version: v12.0.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app for distribution.

    The TEA module provides these instructions showing how to add it to your deployment; the following are instructions for configuring the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
      "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
      my-protected: my-protected
      my-public: my-public

    PUBLIC_BUCKETS:
      - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.

    diff --git a/docs/v12.0.0/deployment/upgrade-readme/index.html b/docs/v12.0.0/deployment/upgrade-readme/index.html

    Upgrading Cumulus

    …deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    diff --git a/docs/v12.0.0/development/forked-pr/index.html b/docs/v12.0.0/development/forked-pr/index.html
    Version: v12.0.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author, or resolve failures as appropriate.

    diff --git a/docs/v12.0.0/development/integration-tests/index.html b/docs/v12.0.0/development/integration-tests/index.html

    Integration Tests

    …in the commit message.

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    diff --git a/docs/v12.0.0/development/quality-and-coverage/index.html b/docs/v12.0.0/development/quality-and-coverage/index.html

    Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    diff --git a/docs/v12.0.0/development/release/index.html b/docs/v12.0.0/development/release/index.html

    Versioning and Releases

    …It's useful to use the search feature of your code editor or grep to see if there are any references to the old package versions. In bash shell you can run

    find . -name package.json -exec grep -nH "@cumulus/.*MAJOR\.MINOR\.PATCH.*" {} \;

    Verify that each of those is updated to the new MAJOR.MINOR.PATCH version you are trying to release.

    A similar search for alpha and beta versions should be run on the release version and any problems should be fixed.

    find . -name package.json -exec grep -nHE "MAJOR\.MINOR\.PATCH.*(alpha|beta)" {} \;

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

    If there exists a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the Cumulus Core to-be-released-version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

      Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of &quot;true&quot;

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch (release-1.2.x) now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

        git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

    • Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important Deselect Enable Branch - if you do not do this, it will immediately fire off a build.

    • Do this immediately: on the Branch Details page, enable Change trigger. Set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

    • Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: you should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on github

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
    • An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    diff --git a/docs/v12.0.0/docs-how-to/index.html b/docs/v12.0.0/docs-how-to/index.html
    Version: v12.0.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

    Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both these values to exist - DOCSEARCH_API_KEY and DOCSEARCH_NAME_INDEX.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to gh-pages branch. The gh-pages branch is served by Github Pages. Do not make edits to the gh-pages branch.

    diff --git a/docs/v12.0.0/external-contributions/index.html b/docs/v12.0.0/external-contributions/index.html
    Version: v12.0.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    diff --git a/docs/v12.0.0/faqs/index.html b/docs/v12.0.0/faqs/index.html
    Version: v12.0.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to setup Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components that you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus-maintained modules or steps on how to create a module, go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    diff --git a/docs/v12.0.0/features/ancillary_metadata/index.html b/docs/v12.0.0/features/ancillary_metadata/index.html
    Version: v12.0.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/backup_and_restore/index.html b/docs/v12.0.0/features/backup_and_restore/index.html index 2fb65cc7f9b..b868fc78aa0 100644 --- a/docs/v12.0.0/features/backup_and_restore/index.html +++ b/docs/v12.0.0/features/backup_and_restore/index.html @@ -5,7 +5,7 @@ Cumulus Backup and Restore | Cumulus Documentation - + @@ -52,7 +52,7 @@ writing to the old cluster.

  • Set the snapshot_identifier variable to the snapshot you wish to create the new cluster from, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments. You should not need to reconfigure either, as the secret ARN and the security group should not change; however, double-check that the configured values are as expected

  • - + \ No newline at end of file diff --git a/docs/v12.0.0/features/dead_letter_archive/index.html b/docs/v12.0.0/features/dead_letter_archive/index.html index b89f4b12c0b..abda5954b65 100644 --- a/docs/v12.0.0/features/dead_letter_archive/index.html +++ b/docs/v12.0.0/features/dead_letter_archive/index.html @@ -5,13 +5,13 @@ Cumulus Dead Letter Archive | Cumulus Documentation - +
    Version: v12.0.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
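
    For example, a minimal Node.js sketch (using the AWS SDK for JavaScript v3; the bucket name, stack name, and region below are placeholders) that lists the archived messages for inspection might look like this:

    // Sketch: list Cumulus messages archived from the sfEventSqsToDbRecords dead letter queue.
    const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');

    const s3 = new S3Client({ region: 'us-east-1' });

    async function listDeadLetterArchive(bucket, stackName) {
      const { Contents = [] } = await s3.send(new ListObjectsV2Command({
        Bucket: bucket,
        Prefix: `${stackName}/dead-letter-archive/sqs/`,
      }));
      // Each key is one archived Cumulus message that failed to produce database records.
      Contents.forEach(({ Key, LastModified }) => console.log(Key, LastModified));
    }

    listDeadLetterArchive('my-internal-bucket', 'my-stack').catch(console.error);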

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.
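
    As a sketch, assuming your Cumulus API is served at example.com and you already have a valid bearer token (both placeholders), the recovery operation could be triggered like this:

    // Sketch: trigger reprocessing of the dead letter archive via the Cumulus API.
    // Requires Node 18+ for the global fetch; the URL and token are placeholders.
    (async () => {
      const response = await fetch('https://example.com/deadLetterArchive/recoverCumulusMessages', {
        method: 'POST',
        headers: { Authorization: 'Bearer ReplaceWithTheToken' },
      });
      // The response is expected to include the id of the AsyncOperation that was started.
      console.log(await response.json());
    })().catch(console.error);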

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/dead_letter_queues/index.html b/docs/v12.0.0/features/dead_letter_queues/index.html index 88a8c3ad0f8..8fed28d1d8e 100644 --- a/docs/v12.0.0/features/dead_letter_queues/index.html +++ b/docs/v12.0.0/features/dead_letter_queues/index.html @@ -5,13 +5,13 @@ Dead Letter Queues | Cumulus Documentation - +
    Version: v12.0.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failures to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
      message_retention_seconds  = 1209600
      visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally, an automated process should be configured to poll a dead letter queue and process its messages.

    For aid in manually troubleshooting, you can utilize the SQS Management Console to view the messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes
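
    If you would rather script that inspection than use the console, a minimal sketch with the AWS SDK for JavaScript v3 (the queue URL is a placeholder) could look like this:

    // Sketch: peek at messages on a named Lambda dead letter queue, printing the
    // message body (the Lambda payload) and the message attributes (error, RequestID).
    const { SQSClient, ReceiveMessageCommand } = require('@aws-sdk/client-sqs');

    const sqs = new SQSClient({ region: 'us-east-1' });

    async function peekDlq(queueUrl) {
      const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
        QueueUrl: queueUrl,
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 20,
        MessageAttributeNames: ['All'],
      }));
      for (const message of Messages) {
        console.log(message.MessageAttributes);
        console.log(message.Body);
      }
    }

    peekDlq('https://sqs.us-east-1.amazonaws.com/123456789012/my-stack-ScheduleSFDeadLetterQueue')
      .catch(console.error);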

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/distribution-metrics/index.html b/docs/v12.0.0/features/distribution-metrics/index.html index 4acb5ee67b7..6e70b272782 100644 --- a/docs/v12.0.0/features/distribution-metrics/index.html +++ b/docs/v12.0.0/features/distribution-metrics/index.html @@ -5,13 +5,13 @@ Cumulus Distribution Metrics | Cumulus Documentation - +
    Version: v12.0.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/execution_payload_retention/index.html b/docs/v12.0.0/features/execution_payload_retention/index.html index b6edd7dfd1c..ba87c4e3d69 100644 --- a/docs/v12.0.0/features/execution_payload_retention/index.html +++ b/docs/v12.0.0/features/execution_payload_retention/index.html @@ -5,13 +5,13 @@ Execution Payload Retention | Cumulus Documentation - +
    Version: v12.0.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the configured timeouts.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/logging-esdis-metrics/index.html b/docs/v12.0.0/features/logging-esdis-metrics/index.html index 1667910fc7c..fe2d7668be0 100644 --- a/docs/v12.0.0/features/logging-esdis-metrics/index.html +++ b/docs/v12.0.0/features/logging-esdis-metrics/index.html @@ -5,13 +5,13 @@ Writing logs for ESDIS Metrics | Cumulus Documentation - +
    Version: v12.0.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.
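
    For example, a minimal Node.js sketch that writes a conforming log line to stdout (the sender, stack name, and granule IDs are illustrative) might look like this:

    // Sketch: emit a single log line in the format expected by the ESDIS metrics pipeline.
    // CloudWatch captures stdout from Lambda, so a plain console.log is sufficient.
    const logLine = {
      level: 'info',
      message: 'hello world',
      sender: 'greetingFunction',
      stackName: 'myCumulus',
      granules: JSON.stringify(['granule-1', 'granule-2']),
      timestamp: new Date().toISOString(),
    };

    console.log(JSON.stringify(logLine));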

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.
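
    A minimal usage sketch is shown below; the exact constructor options may vary by version, so check the package README for your installed release.

    // Sketch: use @cumulus/logger to emit ESDIS-metrics-compatible log lines.
    // The sender value is a placeholder.
    const Logger = require('@cumulus/logger');

    const log = new Logger({ sender: '@my-daac/my-task' });

    log.info('hello world'); // writes a JSON log line with level "info"
    log.error('something went wrong'); // writes a JSON log line with level "error"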

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/replay-archived-sqs-messages/index.html b/docs/v12.0.0/features/replay-archived-sqs-messages/index.html index 090f1c509c1..b0d1ccd8cd6 100644 --- a/docs/v12.0.0/features/replay-archived-sqs-messages/index.html +++ b/docs/v12.0.0/features/replay-archived-sqs-messages/index.html @@ -5,14 +5,14 @@ How to replay SQS messages archived in S3 | Cumulus Documentation - +
    Version: v12.0.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    • queueName (string): Any valid SQS queue name (not ARN)
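
    For example, assuming the Cumulus API is served at example.com and you have a valid bearer token (both placeholders), a replay could be started like this:

    // Sketch: requeue all archived SQS messages for a given queue name.
    // Requires Node 18+ for the global fetch; the URL, token, and queue name are placeholders.
    (async () => {
      const response = await fetch('https://example.com/replays/sqs', {
        method: 'POST',
        headers: {
          Authorization: 'Bearer ReplaceWithTheToken',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ queueName: 'my-stack-startSF' }),
      });
      const { asyncOperationId } = await response.json();
      console.log(`Track this replay at /asyncOperations/${asyncOperationId}`);
    })().catch(console.error);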

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/replay-kinesis-messages/index.html b/docs/v12.0.0/features/replay-kinesis-messages/index.html index adca7de57cd..bcd485e9681 100644 --- a/docs/v12.0.0/features/replay-kinesis-messages/index.html +++ b/docs/v12.0.0/features/replay-kinesis-messages/index.html @@ -5,7 +5,7 @@ How to replay Kinesis messages after an outage | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v12.0.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    • type (string, required): Currently only accepts kinesis.
    • kinesisStream (string, required for type kinesis): Any valid Kinesis stream name (not ARN).
    • kinesisStreamCreationTimestamp (optional): Any input valid for a JS Date constructor. For reasons to use this field, see the AWS documentation on StreamCreationTimestamp.
    • endTimestamp (optional): Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    • startTimestamp (optional): Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
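
    As a sketch (the URL, token, stream name, and timestamps are placeholders), a time-sliced Kinesis replay could be requested like this:

    // Sketch: replay Kinesis messages that arrived within a time slice.
    // Requires Node 18+ for the global fetch; note the buffer recommended around the timestamps.
    (async () => {
      const response = await fetch('https://example.com/replays', {
        method: 'POST',
        headers: {
          Authorization: 'Bearer ReplaceWithTheToken',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          type: 'kinesis',
          kinesisStream: 'my-cnm-stream',
          startTimestamp: '2023-07-18T23:00:00.000Z',
          endTimestamp: '2023-07-20T01:00:00.000Z',
        }),
      });
      const { asyncOperationId } = await response.json();
      console.log(`Track this replay at /asyncOperations/${asyncOperationId}`);
    })().catch(console.error);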

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/features/reports/index.html b/docs/v12.0.0/features/reports/index.html index b77378af032..810f2a1351a 100644 --- a/docs/v12.0.0/features/reports/index.html +++ b/docs/v12.0.0/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -19,7 +19,7 @@ report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    - + \ No newline at end of file diff --git a/docs/v12.0.0/getting-started/index.html b/docs/v12.0.0/getting-started/index.html index 6d1965e0bc9..fd594edf1f9 100644 --- a/docs/v12.0.0/getting-started/index.html +++ b/docs/v12.0.0/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: v12.0.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution, and management systems designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects at NASA and beyond use Cumulus for their daily work.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/v12.0.0/glossary/index.html b/docs/v12.0.0/glossary/index.html index 07c92c8985e..b42ab4d6b41 100644 --- a/docs/v12.0.0/glossary/index.html +++ b/docs/v12.0.0/glossary/index.html @@ -5,13 +5,13 @@ Glossary | Cumulus Documentation - +
    Version: v12.0.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collections on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/index.html b/docs/v12.0.0/index.html index 44bf38b8c7a..b685ad292b4 100644 --- a/docs/v12.0.0/index.html +++ b/docs/v12.0.0/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v12.0.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/integrator-guide/about-int-guide/index.html b/docs/v12.0.0/integrator-guide/about-int-guide/index.html index a748ee9623f..7ba38aa46da 100644 --- a/docs/v12.0.0/integrator-guide/about-int-guide/index.html +++ b/docs/v12.0.0/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v12.0.0

    About Integrator Guide

    Purpose

    The Integrator Guide is to help supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/v12.0.0/integrator-guide/int-common-use-cases/index.html b/docs/v12.0.0/integrator-guide/int-common-use-cases/index.html index 5d12f289b2c..4ceef0f8c90 100644 --- a/docs/v12.0.0/integrator-guide/int-common-use-cases/index.html +++ b/docs/v12.0.0/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v12.0.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v12.0.0/integrator-guide/workflow-add-new-lambda/index.html index ec7cd7b1841..bff94642a8d 100644 --- a/docs/v12.0.0/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/v12.0.0/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v12.0.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps will help you as you write a new Lambda that integrates with a Cumulus workflow, and will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v12.0.0/integrator-guide/workflow-ts-failed-step/index.html index c05f7f86448..bb1d94fa037 100644 --- a/docs/v12.0.0/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/v12.0.0/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: v12.0.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    - + \ No newline at end of file diff --git a/docs/v12.0.0/interfaces/index.html b/docs/v12.0.0/interfaces/index.html index 7e282fe5244..fbfabd6511c 100644 --- a/docs/v12.0.0/interfaces/index.html +++ b/docs/v12.0.0/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: v12.0.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/about-operator-docs/index.html b/docs/v12.0.0/operator-docs/about-operator-docs/index.html index 923d92662a7..2ce28887b22 100644 --- a/docs/v12.0.0/operator-docs/about-operator-docs/index.html +++ b/docs/v12.0.0/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v12.0.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/bulk-operations/index.html b/docs/v12.0.0/operator-docs/bulk-operations/index.html index 883dcad3f10..e06f028b807 100644 --- a/docs/v12.0.0/operator-docs/bulk-operations/index.html +++ b/docs/v12.0.0/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v12.0.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
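
    The same flow can be scripted against the API directly. The sketch below assumes an example.com API URL, a bearer token, and illustrative index, workflow, and query values; it starts a bulk granule operation and then checks the resulting async operation's status:

    // Sketch: start a bulk granule operation and check its status.
    // Requires Node 18+ for the global fetch; all names and values are placeholders.
    (async () => {
      const headers = {
        Authorization: 'Bearer ReplaceWithTheToken',
        'Content-Type': 'application/json',
      };

      const startResponse = await fetch('https://example.com/granules/bulk', {
        method: 'POST',
        headers,
        body: JSON.stringify({
          index: 'my-stack-granule',
          workflowName: 'HelloWorldWorkflow',
          query: { query: { match: { 'collectionId.keyword': 'MOD11A1___006' } } },
        }),
      });
      const operation = await startResponse.json();
      // The response is expected to include the id of the AsyncOperation that was started.
      console.log(operation);

      const statusResponse = await fetch(`https://example.com/asyncOperations/${operation.id}`, { headers });
      console.log(await statusResponse.json());
    })().catch(console.error);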

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/cmr-operations/index.html b/docs/v12.0.0/operator-docs/cmr-operations/index.html index 175cb15f8ee..077fdf2aa18 100644 --- a/docs/v12.0.0/operator-docs/cmr-operations/index.html +++ b/docs/v12.0.0/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "action": "applyWorkflow",
    "workflow": "updateCmrAccessConstraints",
    "meta": {
    "accessConstraints": {
    "value": 5,
    "description": "sample access constraint"
    }
    }
    }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations are in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/create-rule-in-cumulus/index.html b/docs/v12.0.0/operator-docs/create-rule-in-cumulus/index.html index cf8cd83e7e1..0fb8032bce3 100644 --- a/docs/v12.0.0/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/v12.0.0/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v12.0.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/discovery-filtering/index.html b/docs/v12.0.0/operator-docs/discovery-filtering/index.html index 46374b4a1fc..9076077e1ba 100644 --- a/docs/v12.0.0/operator-docs/discovery-filtering/index.html +++ b/docs/v12.0.0/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/granule-workflows/index.html b/docs/v12.0.0/operator-docs/granule-workflows/index.html index 6dee74439c2..769d286a84e 100644 --- a/docs/v12.0.0/operator-docs/granule-workflows/index.html +++ b/docs/v12.0.0/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v12.0.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or scheduled rules, will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/kinesis-stream-for-ingest/index.html b/docs/v12.0.0/operator-docs/kinesis-stream-for-ingest/index.html index 56ce357b445..22d085ec854 100644 --- a/docs/v12.0.0/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/v12.0.0/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v12.0.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on amount of kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details displays. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that matches your schema, using Python or the command line (one approach is sketched below).
      • The streamName and Collection must match the kinesisArn and collection defined in the rule that you created in Step 2.
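
    One way to send such a message is sketched below in Node.js with the AWS SDK for JavaScript v3; the stream name and CNM field values are placeholders and must match the rule and collection you configured.

    // Sketch: put a CNM-style notification onto the Kinesis stream created above.
    const { KinesisClient, PutRecordCommand } = require('@aws-sdk/client-kinesis');

    const kinesis = new KinesisClient({ region: 'us-east-1' });

    const cnmMessage = {
      version: '1.0',
      provider: 'MY_PROVIDER',
      collection: 'MY_COLLECTION',
      identifier: 'test-message-0001',
      submissionTime: new Date().toISOString(),
      product: {
        name: 'GRANULE.A2017025',
        files: [], // file entries omitted for brevity
      },
    };

    kinesis.send(new PutRecordCommand({
      StreamName: 'my-cnm-ingest-stream',
      PartitionKey: cnmMessage.identifier,
      Data: Buffer.from(JSON.stringify(cnmMessage)),
    }))
      .then(() => console.log('CNM message sent'))
      .catch(console.error);
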
    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/locating-access-logs/index.html b/docs/v12.0.0/operator-docs/locating-access-logs/index.html index 7399c66be66..101d7164039 100644 --- a/docs/v12.0.0/operator-docs/locating-access-logs/index.html +++ b/docs/v12.0.0/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v12.0.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/

    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/naming-executions/index.html b/docs/v12.0.0/operator-docs/naming-executions/index.html index 3cddcb26951..1cf6c9cc6d8 100644 --- a/docs/v12.0.0/operator-docs/naming-executions/index.html +++ b/docs/v12.0.0/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },
    }
    - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/ops-common-use-cases/index.html b/docs/v12.0.0/operator-docs/ops-common-use-cases/index.html index d538590817c..955ad8f88ea 100644 --- a/docs/v12.0.0/operator-docs/ops-common-use-cases/index.html +++ b/docs/v12.0.0/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v12.0.0/operator-docs/trigger-workflow/index.html b/docs/v12.0.0/operator-docs/trigger-workflow/index.html index 24c6a713e20..460fa7744c0 100644 --- a/docs/v12.0.0/operator-docs/trigger-workflow/index.html +++ b/docs/v12.0.0/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v12.0.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.
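
For reference, a minimal sketch of such a onetime rule body is shown below. The name and workflow values are placeholders; the workflow must match a workflow deployed in your stack (for example, a HelloWorld workflow):

    {
      "name": "helloworld_onetime_rule",
      "workflow": "HelloWorldWorkflow",
      "rule": {
        "type": "onetime"
      },
      "state": "ENABLED"
    }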

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

In the rule's additional metadata, you can specify a provider_path that tells the workflow where on the provider to discover data.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider
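
If you prefer to define the provider as JSON (for example, via the Cumulus API), a minimal sketch matching the screenshot above might look like the following; the id value is a placeholder:

    {
      "id": "s3_provider",
      "protocol": "s3",
      "host": "discovery-bucket"
    }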

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types correspond to the bucket types defined in the deployment's buckets configuration shown above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

Create a rule to trigger the workflow to discover your granule data and ingest your granules.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
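
A sketch of a corresponding rule body is shown below. The rule name and workflow name are placeholders, the provider value is whatever id you gave your provider (s3_provider in the earlier sketch), and the collection matches the one created above; note the meta.provider_path value of test-data:

    {
      "name": "test_collection_discover_rule",
      "workflow": "DiscoverGranules",
      "provider": "s3_provider",
      "collection": {
        "name": "test-collection",
        "version": "001"
      },
      "rule": {
        "type": "onetime"
      },
      "state": "ENABLED",
      "meta": {
        "provider_path": "test-data"
      }
    }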

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/tasks/index.html b/docs/v12.0.0/tasks/index.html index 9dbaad8a702..c392375e662 100644 --- a/docs/v12.0.0/tasks/index.html +++ b/docs/v12.0.0/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v12.0.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    NOTE: For a detailed description of each task, visit the task's README.md. Information on the input or output of a task is specified in the task's schemas directory.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


@cumulus/update-granules-cmr-metadata-file-links

Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/v12.0.0/team/index.html b/docs/v12.0.0/team/index.html index d33e0a47011..38260208bc8 100644 --- a/docs/v12.0.0/team/index.html +++ b/docs/v12.0.0/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v12.0.0/troubleshooting/index.html b/docs/v12.0.0/troubleshooting/index.html index 72283501816..c1214c429de 100644 --- a/docs/v12.0.0/troubleshooting/index.html +++ b/docs/v12.0.0/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v12.0.0

    How to Troubleshoot and Fix Issues

While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help you troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

Workflows are state machines composed of tasks and services, and each component logs to CloudWatch. The CloudWatch logs for all steps in an execution are displayed in the Cumulus dashboard, or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called, and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the collection name and version in the notification match those configured in your rule, as rules are filtered by the notification's collection name and version before executions are scheduled.

More information on Kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.
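
For example, assuming the default Lambda log group naming and your deployment prefix in place of <prefix>, you can tail these logs with the AWS CLI:

    # Tail the last hour of API logs and follow new events
    aws logs tail "/aws/lambda/<prefix>-ApiEndpoints" --since 1h --follow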

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

The above error was thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach: the number of lambdas we distribute is in the low teens, and they are likely to be easier and faster to reconfigure one-by-one than editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.
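
For example, the task's package.json would be expected to contain a dependency entry similar to the following (the version shown is only a placeholder), and that dependency must be installed before the Lambda is packaged into its .zip file:

    {
      "dependencies": {
        "@cumulus/cumulus-message-adapter-js": "^2.0.0"
      }
    }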

    - + \ No newline at end of file diff --git a/docs/v12.0.0/troubleshooting/reindex-elasticsearch/index.html b/docs/v12.0.0/troubleshooting/reindex-elasticsearch/index.html index 6c653d16cd0..569e806712b 100644 --- a/docs/v12.0.0/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/v12.0.0/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This approach is recommended.

If you decide to reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data sent to the old index while reindexing will show up in the new index. If you prefer this order, it is recommended to turn off any ingest operations while reindexing. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.
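
As a sketch of what these operations can look like with curl against the Cumulus API (the endpoint paths follow the operations named above, while the request body fields, token handling, and index names are illustrative assumptions; consult the Cumulus API documentation for the exact parameters):

    curl -X POST "$CUMULUS_API/elasticsearch/change-index" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4"}'

    curl -X POST "$CUMULUS_API/elasticsearch/reindex" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"sourceIndex": "cumulus-2020-11-3", "destIndex": "cumulus-2021-3-4"}'

    curl "$CUMULUS_API/elasticsearch/reindex-status" -H "Authorization: Bearer $TOKEN"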

    Validate reindex

To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the example below, the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful, as the two indices show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
    "aggs": {
    "count_by_type": {
    "terms": {
    "field": "_type"
    }
    }
    },
    "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

If a reindex operation did not fully complete, it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
    "conflicts": "proceed",
    "source": {
    "index": "cumulus-2020-11-3"
    },
    "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
    }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/troubleshooting/rerunning-workflow-executions/index.html b/docs/v12.0.0/troubleshooting/rerunning-workflow-executions/index.html index fd19ba65c9b..65e56aeb22b 100644 --- a/docs/v12.0.0/troubleshooting/rerunning-workflow-executions/index.html +++ b/docs/v12.0.0/troubleshooting/rerunning-workflow-executions/index.html @@ -5,13 +5,13 @@ Re-running workflow executions | Cumulus Documentation - +
    Version: v12.0.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

    - + \ No newline at end of file diff --git a/docs/v12.0.0/troubleshooting/troubleshooting-deployment/index.html b/docs/v12.0.0/troubleshooting/troubleshooting-deployment/index.html index 42ac6ba6fda..464f510a316 100644 --- a/docs/v12.0.0/troubleshooting/troubleshooting-deployment/index.html +++ b/docs/v12.0.0/troubleshooting/troubleshooting-deployment/index.html @@ -5,7 +5,7 @@ Troubleshooting Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
    domain_name = "es"
    instance_count = 2
    instance_type = "t2.small.elasticsearch"
    version = "5.3"
    volume_size = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

• Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'. This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

• If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    - + \ No newline at end of file diff --git a/docs/v12.0.0/upgrade-notes/cumulus_distribution_migration/index.html b/docs/v12.0.0/upgrade-notes/cumulus_distribution_migration/index.html index ecd35cc26b3..f1f212dfec5 100644 --- a/docs/v12.0.0/upgrade-notes/cumulus_distribution_migration/index.html +++ b/docs/v12.0.0/upgrade-notes/cumulus_distribution_migration/index.html @@ -5,14 +5,14 @@ Migrate from TEA deployment to Cumulus Distribution | Cumulus Documentation - +
    Version: v12.0.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/upgrade-notes/migrate_tea_standalone/index.html b/docs/v12.0.0/upgrade-notes/migrate_tea_standalone/index.html index 383940aff06..27767c3cb3d 100644 --- a/docs/v12.0.0/upgrade-notes/migrate_tea_standalone/index.html +++ b/docs/v12.0.0/upgrade-notes/migrate_tea_standalone/index.html @@ -5,13 +5,13 @@ Migrate TEA deployment to standalone module | Cumulus Documentation - +
    Version: v12.0.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait on a new release of Cumulus that incorporated that version.

In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to TEA deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
    --table-name DYNAMO_LOCK_TABLE \
    --item '{
    "LockID": {"S": "BUCKET/KEY-md5"},
    "Digest": {"S": "some-digest-value"}
    }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/upgrade-notes/update-cma-2.0.2/index.html b/docs/v12.0.0/upgrade-notes/update-cma-2.0.2/index.html index fbc35e0f2fa..a6abd28c0d1 100644 --- a/docs/v12.0.0/upgrade-notes/update-cma-2.0.2/index.html +++ b/docs/v12.0.0/upgrade-notes/update-cma-2.0.2/index.html @@ -5,13 +5,13 @@ Upgrade to CMA 2.0.2 | Cumulus Documentation - +
    Version: v12.0.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, but to instead use an incrementing integer field.

Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

The following should be done as part of a Cumulus stack update to utilize version 2.0.2 (or greater) of the Cumulus Message Adapter:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

Once the above have been done, redeploy Cumulus to apply the configuration, and the updates should be live.
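
As a sketch, and assuming your cumulus module is defined in cumulus-tf/main.tf, wiring in the layer can be a single line inside that module block; the ARN below is a placeholder for the layer version available in your deployment account:

    cumulus_message_adapter_lambda_layer_version_arn = "arn:aws:lambda:us-east-1:123456789012:layer:Cumulus_Message_Adapter:1"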

    - + \ No newline at end of file diff --git a/docs/v12.0.0/upgrade-notes/update-task-file-schemas/index.html b/docs/v12.0.0/upgrade-notes/update-task-file-schemas/index.html index a123986e3a1..f60cd8eaa0b 100644 --- a/docs/v12.0.0/upgrade-notes/update-task-file-schemas/index.html +++ b/docs/v12.0.0/upgrade-notes/update-task-file-schemas/index.html @@ -5,13 +5,13 @@ Updates to task granule file schemas | Cumulus Documentation - +
    Version: v12.0.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

For any workflows using the post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for its granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
- + \ No newline at end of file diff --git a/docs/v12.0.0/upgrade-notes/upgrade-rds/index.html b/docs/v12.0.0/upgrade-notes/upgrade-rds/index.html index 2b6899c3f3b..15513f19770 100644 --- a/docs/v12.0.0/upgrade-notes/upgrade-rds/index.html +++ b/docs/v12.0.0/upgrade-notes/upgrade-rds/index.html @@ -5,7 +5,7 @@ Upgrade to RDS release | Cumulus Documentation - + @@ -21,7 +21,7 @@
| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    - + \ No newline at end of file diff --git a/docs/v12.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/v12.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html index 25bab4c4a7f..ec9bada226b 100644 --- a/docs/v12.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html +++ b/docs/v12.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html @@ -5,13 +5,13 @@ Upgrade to TF version 0.13.6 | Cumulus Documentation - +
    Version: v12.0.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using the Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
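
For a typical Cumulus deployment layout, this can be as simple as running the command in each deployment directory, for example:

    cd data-persistence-tf && terraform 0.13upgrade && cd ..
    cd cumulus-tf && terraform 0.13upgrade && cd ..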

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflow_tasks/discover_granules/index.html b/docs/v12.0.0/workflow_tasks/discover_granules/index.html index b0ff7697733..0d6a945de5a 100644 --- a/docs/v12.0.0/workflow_tasks/discover_granules/index.html +++ b/docs/v12.0.0/workflow_tasks/discover_granules/index.html @@ -5,7 +5,7 @@ Discover Granules | Cumulus Documentation - + @@ -21,7 +21,7 @@ included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
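
As a hedged sketch, a workflow's discover-granules step might set this in the task_config portion of its step definition as shown below; the other keys are typical discover-granules configuration values included only for context, so check the task's configuration schema for the authoritative list:

    "task_config": {
      "provider": "{$.meta.provider}",
      "provider_path": "{$.meta.provider_path}",
      "collection": "{$.meta.collection}",
      "buckets": "{$.meta.buckets}",
      "duplicateGranuleHandling": "skip",
      "concurrency": 3
    }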

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflow_tasks/files_to_granules/index.html b/docs/v12.0.0/workflow_tasks/files_to_granules/index.html index 8e28a0bc34c..d07e98f28c2 100644 --- a/docs/v12.0.0/workflow_tasks/files_to_granules/index.html +++ b/docs/v12.0.0/workflow_tasks/files_to_granules/index.html @@ -5,13 +5,13 @@ Files To Granules | Cumulus Documentation - +
    Version: v12.0.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

This task utilizes the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to sort the list of incoming files into a list of granule objects.

Please note: files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflow_tasks/lzards_backup/index.html b/docs/v12.0.0/workflow_tasks/lzards_backup/index.html index 4662934016c..6c37dd28b5f 100644 --- a/docs/v12.0.0/workflow_tasks/lzards_backup/index.html +++ b/docs/v12.0.0/workflow_tasks/lzards_backup/index.html @@ -5,13 +5,13 @@ LZARDS Backup | Cumulus Documentation - +
    Version: v12.0.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflow_tasks/move_granules/index.html b/docs/v12.0.0/workflow_tasks/move_granules/index.html index 20b1c06cd55..a7a9ba59dc3 100644 --- a/docs/v12.0.0/workflow_tasks/move_granules/index.html +++ b/docs/v12.0.0/workflow_tasks/move_granules/index.html @@ -5,13 +5,13 @@ Move Granules | Cumulus Documentation - +
    Version: v12.0.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file

    • Task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

See the SIPS workflow cookbook for an example of this task in a workflow.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflow_tasks/parse_pdr/index.html b/docs/v12.0.0/workflow_tasks/parse_pdr/index.html index d9c70175f1b..5d0d90d7cb2 100644 --- a/docs/v12.0.0/workflow_tasks/parse_pdr/index.html +++ b/docs/v12.0.0/workflow_tasks/parse_pdr/index.html @@ -5,13 +5,13 @@ Parse PDR | Cumulus Documentation - +
    Version: v12.0.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

    • Output a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

    The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

    Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task expects values to be set in the workflow_config CMA parameters for the workflow. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.
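
    Putting the keys described above together, the following is a hedged sketch of how a ParsePDR step's CMA parameters might be configured, using the task_config parameter key described on the Workflow Inputs & Outputs page. The lowercase key names and URL template values are assumptions for illustration; additional keys may be required, so confirm against the task's config.json schema.

    "ParsePdr": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            "provider": "{$.meta.provider}",
            "bucket": "{$.meta.buckets.internal.name}",
            "collection": "{$.meta.collection}"
          }
        }
      }
    }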

    Task Outputs

    This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflow_tasks/queue_granules/index.html b/docs/v12.0.0/workflow_tasks/queue_granules/index.html index b2dbef7cadb..ab62d61fffb 100644 --- a/docs/v12.0.0/workflow_tasks/queue_granules/index.html +++ b/docs/v12.0.0/workflow_tasks/queue_granules/index.html @@ -5,14 +5,14 @@ Queue Granules | Cumulus Documentation - +
    Version: v12.0.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

    The task utilizes a defined collection in concert with a defined provider, either set on each granule or passed in via config, to queue up ingest executions for each granule or for batches of granules.

    The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

    Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task expects values to be set in the task_config CMA parameters for the workflow. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
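
    Pulling the keys above into one place, a sketch of a possible task_config follows. The URL template paths, queue reference, and workflow name are illustrative assumptions rather than values defined by this page; confirm them against the task's config.json schema and your deployment.

    "task_config": {
      "provider": "{$.meta.provider}",
      "internalBucket": "{$.meta.buckets.internal.name}",
      "granuleIngestWorkflow": "IngestGranule",
      "queueUrl": "{$.meta.queues.startSF}",
      "preferredQueueBatchSize": 1,
      "concurrency": 3,
      "executionNamePrefix": "my-prefix",
      "childWorkflowMeta": { "note": "merged into the child execution's meta" }
    }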

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.
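
    As a sketch of the shape implied above (only the running key is taken from this description; the ARNs are placeholders), the output payload looks something like:

    {
      "running": [
        "arn:aws:states:us-east-1:111122223333:execution:MyIngestStateMachine:execution-1",
        "arn:aws:states:us-east-1:111122223333:execution:MyIngestStateMachine:execution-2"
      ]
    }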

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/cumulus-task-message-flow/index.html b/docs/v12.0.0/workflows/cumulus-task-message-flow/index.html index b99f3119c65..4865eabccda 100644 --- a/docs/v12.0.0/workflows/cumulus-task-message-flow/index.html +++ b/docs/v12.0.0/workflows/cumulus-task-message-flow/index.html @@ -5,14 +5,14 @@ Cumulus Tasks: Message Flow | Cumulus Documentation - +
    Version: v12.0.0

    Cumulus Tasks: Message Flow

    Cumulus Tasks compose Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message Adapter. It contains the location on S3 of a message payload and a target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions; it should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines which task is currently being executed. Note that this is different from which Lambda or activity is being executed, because the same Lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
    "input": {},
    "config": {}
    }

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
    "task_config": {
    "bar": "baz",
    "cumulus_message": {
    "input": "{$.payload.input}",
    "outputs": [
    {
    "source": "{$.input.anykey}",
    "destination": "{$.payload.out}"
    }
    ]
    }
    },
    "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "input": {
    "anykey": "anyvalue"
    }
    }
    }

    Create Next Message Step 2: Store remote event

    If the ReplaceConfiguration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/developing-a-cumulus-workflow/index.html b/docs/v12.0.0/workflows/developing-a-cumulus-workflow/index.html index 245303d381e..3c979bd30d6 100644 --- a/docs/v12.0.0/workflows/developing-a-cumulus-workflow/index.html +++ b/docs/v12.0.0/workflows/developing-a-cumulus-workflow/index.html @@ -5,13 +5,13 @@ Creating a Cumulus Workflow | Cumulus Documentation - +
    Version: v12.0.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

    To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, referencing tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
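
    For instance, a minimal single-step definition could replace the empty {} heredoc body above with something like the sketch below. The task name and the module output used for the Resource ARN are assumptions for illustration; consult the cumulus module outputs for the task ARNs actually available to your deployment.

    {
      "Comment": "Minimal single-step Cumulus workflow",
      "StartAt": "HelloWorld",
      "States": {
        "HelloWorld": {
          "Type": "Task",
          "Resource": "${module.cumulus.hello_world_task.task_arn}",
          "Parameters": {
            "cma": {
              "event.$": "$",
              "task_config": {}
            }
          },
          "End": true
        }
      }
    }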

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/developing-workflow-tasks/index.html b/docs/v12.0.0/workflows/developing-workflow-tasks/index.html index c7b9e11e7d9..a29cc25e776 100644 --- a/docs/v12.0.0/workflows/developing-workflow-tasks/index.html +++ b/docs/v12.0.0/workflows/developing-workflow-tasks/index.html @@ -5,13 +5,13 @@ Developing Workflow Tasks | Cumulus Documentation - +
    Version: v12.0.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks' source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

    ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node.js 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/docker/index.html b/docs/v12.0.0/workflows/docker/index.html index 63ef52e65e6..ded3c10cc48 100644 --- a/docs/v12.0.0/workflows/docker/index.html +++ b/docs/v12.0.0/workflows/docker/index.html @@ -5,7 +5,7 @@ Dockerizing Data Processing | Cumulus Documentation - + @@ -14,7 +14,7 @@ 2) validate the output (in this case just check for existence) 3) use 'ncatted' to update the resulting file to be CF-compliant 4) write out metadata generated for this file

    Process Testing

    It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

    This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

    If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise, an error will be reported.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/index.html b/docs/v12.0.0/workflows/index.html index 8e49a1000c6..a5fa4d00dfc 100644 --- a/docs/v12.0.0/workflows/index.html +++ b/docs/v12.0.0/workflows/index.html @@ -5,13 +5,13 @@ Workflows | Cumulus Documentation - +
    Version: v12.0.0

    Workflows

    Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

    The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
    • Not make assumptions about the specifics of, for example, what goes into a granule.

    Scalable

    The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
    • We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

    AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits which are applicable to Cumulus.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
      • Step Functions are designed to make it easy to build new applications and reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
      • Both meta and task_config which are used for configuring at the collection and task levels do not dictate the fields and structure of the configuration. Additional task specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

    Tradeoffs, and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

    This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

    If we implemented our own orchestration, we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade-off. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that are implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation has a tight coupling with their standard. If they make backwards-incompatible changes in the future, we will have to deal with existing projects written against that standard.

    We avoid having to develop our own standard and the code to process it. The design can support new features in AWS Step Functions without needing changes to the Ingest library code. It is unlikely they will make a backwards-incompatible change at this point. One mitigation, if that were to happen, would be to write data transformations to the new format.

    Collection Configuration Flexibility vs Complexity

    The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

    The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI that provide a usable interface for customizing workflows can be built, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

    The design aims to satisfy the needs of ingest by integrating different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

    AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
    • CloudWatch Metrics can be used for monitoring many of the events within Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

    This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/input_output/index.html b/docs/v12.0.0/workflows/input_output/index.html index 0eff1b57230..01fd7ed346b 100644 --- a/docs/v12.0.0/workflows/input_output/index.html +++ b/docs/v12.0.0/workflows/input_output/index.html @@ -5,14 +5,14 @@ Workflow Inputs & Outputs | Cumulus Documentation - +
    Version: v12.0.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

    In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.
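
    As a sketch of the second integration option listed above, assuming you expose the deployed CMA layer ARN to your own Lambda module as a variable, the Terraform might look like the following; the variable name, runtime, and paths are illustrative only.

    resource "aws_lambda_function" "cma_enabled_task" {
      function_name = "${var.prefix}-MyCmaTask"
      filename      = "/path/to/zip/lambda.zip"
      handler       = "index.handler"
      role          = module.cumulus.lambda_processing_role_arn
      runtime       = "nodejs12.x"

      # Attach the CMA layer deployed to your account
      layers = [var.cumulus_message_adapter_lambda_layer_version_arn]

      environment {
        variables = {
          # Tell the CMA client library where the layer contents are mounted
          CUMULUS_MESSAGE_ADAPTER_DIR = "/opt"
        }
      }
    }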

    Manual Addition

    You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

    Input to the task application code is a json object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

    Output from the task application code is returned in and placed in the payload key by default, but the config key can also be used to return just a portion of the task output.

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

    {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
    }
    }

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

    Because of the potential size of a Cumulus message, mainly the payload field, a task can be set via configuration to store a portion of its output on S3, leaving in its place an empty JSON object {} and a Remote Message (replace) key that defines how to retrieve it. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

    Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize)

    {
    "DiscoverGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "MaxSize": 1,
    "Path": "$.payload",
    "TargetPath": "$.payload"
    }
    }
    }
    }
    }

    will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path; however, as the target is a JSON path, there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

    DiscoverGranules:
    Parameters:
    cma:
    event.$: '$'
    ReplaceConfig:
    FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

    {
    "DiscoverGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "MaxSize": 0,
    "Path": "$",
    "TargetPath": "$"
    }
    }
    }
    }
    }

    Cumulus Message example

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    task_config

    This configuration key contains the input/output configuration values for the definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

    {
    "FunctionName": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "other_cma_configuration": "<config object>",
    "task_config": "<task config>"
    }
    }
    }
    }

    Example:

    {
    "StepFunction": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "sfnEnd": true,
    "stack": "{$.meta.stack}",
    "bucket": "{$.meta.buckets.internal.name}",
    "stateMachine": "{$.cumulus_meta.state_machine}",
    "executionName": "{$.cumulus_meta.execution_name}",
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    }
    }
    }

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

    {
    "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
    "input": {
    "Other Parameter": {},
    "cma": {
    "ConfigKey": {
    "config values": "some config values"
    },
    "event": {
    "cumulus_meta": {},
    "payload": {},
    "meta": {},
    "exception": {}
    }
    }
    }
    }

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

    This results in the incoming message (presumably a Cumulus message) with any cma configuration parameters merged in being passed to the CMA. All other parameterized values defined outside of the cma key are ignored

    2. Resolve Remote Messages

    If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

    For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.

    3. Resolve URL templates in the task configuration

    In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

    {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }
    }
    }
    }

    and an incoming message that contains:

    {
    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    }
    }

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    }
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

    By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given that a task specifies cma.task_config.cumulus_message.input:

        ExampleTask:
    Parameters:
    cma:
    event.$: '$'
    task_config:
    cumulus_message:
    input: '{$.payload.foo}'

    The task configuration in the message would be:

        {
    "task_config": {
    "cumulus_message": {
    "input": "{$.payload.foo}"
    }
    },
    "payload": {
    "foo": {
    "anykey": "anyvalue"
    }
    }
    }

    The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

        {
    "input" : {
    "anykey": "anyvalue"
    },
    "config": {...}
    }

    5. Resolve task output

    By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also assign values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value at the "destination" URL in the outgoing message. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

    {
    "ExampleTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    }
    }
    }
    }
    }

    The corresponding Cumulus Message would be:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Given the response from the task is:

        {
    "output": {
    "anykey": "boo"
    }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

    Please Note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

    For example, if the output message (post output configuration) from a Cumulus task looks like:

        {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    the resultant output would look like:

    {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
    }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
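
    For example, a minimal schemas/config.json for a hypothetical task might look like the sketch below; the property names are illustrative only.

    {
      "title": "MyTaskConfig",
      "description": "Illustrative configuration schema for a custom task",
      "type": "object",
      "required": ["bucket"],
      "properties": {
        "bucket": {
          "type": "string",
          "description": "Bucket the task writes its results to"
        },
        "collection": {
          "type": "object",
          "description": "Cumulus collection object"
        }
      }
    }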

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/lambda/index.html b/docs/v12.0.0/workflows/lambda/index.html index eccd07102a8..f9e609ed934 100644 --- a/docs/v12.0.0/workflows/lambda/index.html +++ b/docs/v12.0.0/workflows/lambda/index.html @@ -5,13 +5,13 @@ Develop Lambda Functions | Cumulus Documentation - +
    Version: v12.0.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

    AWS provides a great getting started guide for building Lambdas in the developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

    Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

    For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source), either as a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.
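
    A hedged sketch of the corresponding Terraform, assuming the deploy folder above has been zipped and using a hypothetical handler class, might look like:

    resource "aws_lambda_function" "my_java_function" {
      function_name    = "${var.prefix}-JavaFunction"
      # Zip of the deploy folder described above (class files plus a lib folder of dependency jars)
      filename         = "/path/to/zip/deploy.zip"
      source_code_hash = filebase64sha256("/path/to/zip/deploy.zip")
      # Hypothetical handler; use your own package, class, and method
      handler          = "test_task.task.Task::handleRequest"
      role             = module.cumulus.lambda_processing_role_arn
      runtime          = "java11"

      vpc_config {
        subnet_ids         = var.subnet_ids
        security_group_ids = var.security_group_ids
      }
    }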

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

    For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

    If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/protocol/index.html b/docs/v12.0.0/workflows/protocol/index.html index 119b0cff728..85fa35721de 100644 --- a/docs/v12.0.0/workflows/protocol/index.html +++ b/docs/v12.0.0/workflows/protocol/index.html @@ -5,13 +5,13 @@ Workflow Protocol | Cumulus Documentation - +
    Version: v12.0.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/workflow-configuration-how-to/index.html b/docs/v12.0.0/workflows/workflow-configuration-how-to/index.html index a819d192271..a62f5327578 100644 --- a/docs/v12.0.0/workflows/workflow-configuration-how-to/index.html +++ b/docs/v12.0.0/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -24,7 +24,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

    {
    "bucket": "sample-protected-bucket",
    "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
    }

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    - + \ No newline at end of file diff --git a/docs/v12.0.0/workflows/workflow-triggers/index.html b/docs/v12.0.0/workflows/workflow-triggers/index.html index 3edaba310e4..244a5526820 100644 --- a/docs/v12.0.0/workflows/workflow-triggers/index.html +++ b/docs/v12.0.0/workflows/workflow-triggers/index.html @@ -5,13 +5,13 @@ Workflow Triggers | Cumulus Documentation - +
    Version: v12.0.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event, which manages triggering the lambdas that start the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.
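
To make the relationship between rules and triggers concrete, a minimal scheduled rule might look like the sketch below. The names (NightlyDiscovery, DiscoverGranulesWorkflow, MODIS_PROVIDER) and the cron expression are illustrative placeholders, not values defined elsewhere in this documentation; see the rule configuration documentation for the full set of fields.

{
  "name": "NightlyDiscovery",
  "workflow": "DiscoverGranulesWorkflow",
  "provider": "MODIS_PROVIDER",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "cron(0 6 * * ? *)"
  },
  "state": "ENABLED"
}

With a rule like this in place, the CloudWatch event fires on the cron schedule and calls the schedule SF lambda, which queues a message for the Start SF lambda as described above.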

    Diagram showing how workflows are scheduled via rules

    - + \ No newline at end of file diff --git a/docs/v13.0.0/adding-a-task/index.html b/docs/v13.0.0/adding-a-task/index.html index 6c38b1407bf..c4e7a4b9b11 100644 --- a/docs/v13.0.0/adding-a-task/index.html +++ b/docs/v13.0.0/adding-a-task/index.html @@ -5,13 +5,13 @@ Contributing a Task | Cumulus Documentation - +
    Version: v13.0.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/api/index.html b/docs/v13.0.0/api/index.html index 42fba788d0c..0c72b7fcebb 100644 --- a/docs/v13.0.0/api/index.html +++ b/docs/v13.0.0/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v13.0.0/architecture/index.html b/docs/v13.0.0/architecture/index.html index 933e0b2cd16..fdee6fe7668 100644 --- a/docs/v13.0.0/architecture/index.html +++ b/docs/v13.0.0/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
    Version: v13.0.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it can be more easily digested by breaking it down into several major components:

    Data Distribution

End Users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application. This provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a set of PostgreSQL-compatible databases, and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is also replicated in DynamoDB; this replication will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployments; they contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/cloudwatch-retention/index.html b/docs/v13.0.0/configuration/cloudwatch-retention/index.html index f388c7be9be..11f620181cc 100644 --- a/docs/v13.0.0/configuration/cloudwatch-retention/index.html +++ b/docs/v13.0.0/configuration/cloudwatch-retention/index.html @@ -5,13 +5,13 @@ Cloudwatch Retention | Cumulus Documentation - +
    Version: v13.0.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.
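
To confirm the change took effect, you can (for example) query the log group's retention setting with describe-log-groups:

aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/KinesisInboundLogger" --query "logGroups[].retentionInDays"

Once the policy above has been applied, the returned value should include 30.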

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/collection-storage-best-practices/index.html b/docs/v13.0.0/configuration/collection-storage-best-practices/index.html index f4cfb3dd43a..9868d0436bb 100644 --- a/docs/v13.0.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v13.0.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v13.0.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (e.g. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
    "Datapoints": [
    {
    "Timestamp": "2019-07-31T00:00:00Z",
    "Average": 150996467959.0,
    "Unit": "Bytes"
    }
    ],
    "Label": "BucketSizeBytes"
    }

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.
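
As a sketch of what tagging might look like from the CLI (the bucket name and the tag key/value here are placeholders you would replace with your own):

aws s3api put-bucket-tagging --bucket sample-protected-bucket --tagging 'TagSet=[{Key=Collection,Value=MOD09GQ}]'

Note that a user-defined tag still has to be activated as a cost allocation tag in the Billing console before it shows up in Cost Explorer.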

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
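
For reference, a buckets map in the deployment's .tfvars commonly looks something like the sketch below (the bucket names are placeholders); the keys on the left (protected, public, etc.) are what the bucket field in the collection file configuration refers to:

buckets = {
  protected = {
    name = "sample-protected-bucket"
    type = "protected"
  },
  public = {
    name = "sample-public-bucket"
    type = "public"
  }
}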

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "MOD09GQ-006-private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "MOD09GQ-006-public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/data-management-types/index.html b/docs/v13.0.0/configuration/data-management-types/index.html index 85202b0ffcb..3989b7746b1 100644 --- a/docs/v13.0.0/configuration/data-management-types/index.html +++ b/docs/v13.0.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v13.0.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
• Reconciliation Reports: Reports compare data sets to check whether they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, settings that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration rather than the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
host | string | Yes | S3 Bucket to pull data from

http

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be http for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 80
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate

https

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be https for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 443
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate

ftp

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
port | integer | No | Port to connect to the provider on. Defaults to 21

sftp

Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
host | string | Yes | The sftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the sftp server.
password | string | No | Password to use to connect to the sftp server.
port | integer | No | Port to connect to the provider on. Defaults to 22
privateKey | string | No | filename assumed to be in s3://bucketInternal/stackName/crypto
cmKeyId | string | No | AWS KMS Customer Master Key arn or alias
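
As a concrete, hypothetical example, an S3 provider record created via the API or dashboard might look like the sketch below. The id and host values are placeholders; the host must be the bucket your data is staged in.

{
  "id": "MODIS_S3_PROVIDER",
  "protocol": "s3",
  "host": "sample-staging-bucket",
  "globalConnectionLimit": 10
}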

    Collections

Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)

Key | Value | Required | Description
name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
version | "006" | Yes | A version tag for the collection
granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

    files-object

Key | Value | Required | Description
regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
bucket | "internal" | Yes | Name of the bucket where the file will be stored
url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, run on a schedule, or be configured to be triggered by either events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
Key | Value | Required | Description
name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
payload | <JSON Object or Array> | No | The payload to be passed to the workflow
meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

    collection-object

Key | Value | Required | Description
name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

    meta-object

Key | Value | Required | Description
retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3.
visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

Key | Value | Required | Description
type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
value | <String> Object | Depends | Discussion of valid values is below

    rule-value

The rule value entry depends on the rule type:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times by default, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
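
Putting the pieces above together, a minimal sqs rule that overrides the visibility timeout and retry count might look like the sketch below (the rule, workflow, and queue names are placeholders):

{
  "name": "MOD09GQ_sqs_rule",
  "workflow": "IngestGranuleWorkflow",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "sqs",
    "value": "https://sqs.us-east-1.amazonaws.com/1234567890/queue-name"
  },
  "meta": {
    "retries": 1,
    "visibilityTimeout": 1800
  },
  "state": "ENABLED"
}

Remember that the queue referenced in rule.value must already exist, must be accessible to your account, and must have a dead-letter queue configured.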

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/lifecycle-policies/index.html b/docs/v13.0.0/configuration/lifecycle-policies/index.html index 39cbe0c1fe9..d99482c5208 100644 --- a/docs/v13.0.0/configuration/lifecycle-policies/index.html +++ b/docs/v13.0.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v13.0.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
    • You will need a S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json

    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/monitoring-readme/index.html b/docs/v13.0.0/configuration/monitoring-readme/index.html index d07bddcd7e1..dc87f0844e8 100644 --- a/docs/v13.0.0/configuration/monitoring-readme/index.html +++ b/docs/v13.0.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v13.0.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const decompressedData = await gunzip(payload);
    const logData = JSON.parse(decompressedData.toString('ascii'));
    return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
    return sns.publish({
    TopicArn: process.env.EmailReportingTopicArn,
    Message: logEvent.message
    }).promise();
    }
    return Promise.resolve();
    }));
    };

After creating the SNS topic, we can deploy this code as a lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
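
As a sketch, creating one such subscription filter from the CLI could look like the following (the log group name, filter name, and function ARN are placeholders, and the Lambda must also grant logs.amazonaws.com permission to invoke it, e.g. via aws lambda add-permission):

aws logs put-subscription-filter \
  --log-group-name "/aws/lambda/KinesisInboundLogger" \
  --filter-name "error-email-forwarder" \
  --filter-pattern "" \
  --destination-arn "arn:aws:lambda:us-east-1:123456789012:function:log-email-forwarder"

An empty filter pattern forwards every log event to the function, which then performs the error/fatal filtering itself as in the example above.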

    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/server_access_logging/index.html b/docs/v13.0.0/configuration/server_access_logging/index.html index 9ccb9edcb8d..9d08f8e1b7d 100644 --- a/docs/v13.0.0/configuration/server_access_logging/index.html +++ b/docs/v13.0.0/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v13.0.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
    - + \ No newline at end of file diff --git a/docs/v13.0.0/configuration/task-configuration/index.html b/docs/v13.0.0/configuration/task-configuration/index.html index 3ee0e9b23f8..9fe87261414 100644 --- a/docs/v13.0.0/configuration/task-configuration/index.html +++ b/docs/v13.0.0/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
    Version: v13.0.0

    Configuration of Tasks

    The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

<lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

More information about the CMR limit and CMR page_size can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }
    - + \ No newline at end of file diff --git a/docs/v13.0.0/data-cookbooks/about-cookbooks/index.html b/docs/v13.0.0/data-cookbooks/about-cookbooks/index.html index 4d71ac812c7..5f48b0f93bc 100644 --- a/docs/v13.0.0/data-cookbooks/about-cookbooks/index.html +++ b/docs/v13.0.0/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: v13.0.0

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should serve to help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/v13.0.0/data-cookbooks/browse-generation/index.html b/docs/v13.0.0/data-cookbooks/browse-generation/index.html index 9322446a069..74f568c7095 100644 --- a/docs/v13.0.0/data-cookbooks/browse-generation/index.html +++ b/docs/v13.0.0/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the above CMR entry exists.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The provided example script used in the example goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

• Create a browse image file based on the input data, and stage it in an S3 bucket at a location accessible to this task as well as the FilesToGranules and MoveGranules tasks.
• Add the browse file to the input granule files, making sure to set the granule file's type to browse.
• Update meta.input_granules with the updated granules list, and provide the files to be integrated by FilesToGranules as output from the task.

    Generating/updating CMR metadata

If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates one and adds it to the FilesToGranules file list via the payload, but it can also be present in the InputGranules from the DiscoverGranules task if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, the files output is mapped to the payload and the granules output is mapped to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }
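For reference, a cumulus_message outputs mapping consistent with the description above might look like the following sketch. This is illustrative rather than copied from the example source (the {$.files} source in particular is assumed from the stated return keys); check the DiscoverGranulesBrowseExample workflow configuration for the exact mapping used.

"cumulus_message": {
  "outputs": [
    {
      "source": "{$.granules}",
      "destination": "{$.meta.input_granules}"
    },
    {
      "source": "{$.files}",
      "destination": "{$.payload}"
    }
  ]
}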

The expected values of these outputs in the example above may be useful when constructing your own processing task:

    payload

The payload includes a full list of files to be 'moved' into the Cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file, if it exists, with the updated granule locations and update the CMR file etags.

In the provided example, the payload passed to the FilesToGranules task should look like the following:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files FilesToGranules will act upon to add/merge with the input_granules object.

The paths shown are generated by sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task roles have access and the filenames match the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output in the example would look like the following:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    Version: v13.0.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: as long as CheckStatus returns meta.isPdrFinished: false, CheckAgainChoice follows CheckStatus, PdrStatusReport follows CheckAgainChoice, and WaitForSomeTime follows PdrStatusReport before returning to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    Version: v13.0.0

    CNM Workflow

This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as the CNM notification and response data streams.

If you do not have two streams set up, please take a moment to review the Kinesis documentation and set up two basic single-shard streams for this example:

Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue; you should be able to quickly set up streams similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

This can be done via the Cumulus Dashboard (if installed) or via the API. Using the dashboard is strongly recommended if possible.
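For example, a minimal S3 provider entry might look like the sketch below, where ${SOURCE_DATA_BUCKET} is a placeholder for the bucket holding your source data; the exact fields available are defined by the provider schema, and the collection you pair with it must likewise match your data (see the collection schema).

{
  "id": "TestProvider",
  "protocol": "s3",
  "host": "${SOURCE_DATA_BUCKET}"
}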


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
"CNMExampleWorkflow": {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    }
JSON
}

Again, please make sure to modify the response-endpoint value to match the stream name (not the ARN) of your Kinesis response stream.

    Lambda Configuration

To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the task (Lambda) definitions described below to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

• The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
• The collection and provider should match the collection and provider you set up in the Prerequisites section.

Once you've clicked 'Submit', a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output and add a "SUCCESS" record to the response-endpoint Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.
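For example, reusing the record template from the Create Record JSON section with the product.name key removed (and the ${} placeholders substituted as before) should exercise this path:

{
  "product": {
    "files": [
      {
        "checksumType": "md5",
        "name": "${TEST_DATA_FILE_NAME}",
        "checksum": "bogus_checksum_value",
        "uri": "${TEST_DATA_URI}",
        "type": "data",
        "size": 12345678
      }
    ],
    "dataVersion": "006"
  },
  "identifier ": "testIdentifier123456",
  "collection": "${COLLECTION}",
  "provider": "TestProvider",
  "version": "001",
  "submissionTime": "2017-09-30T03:42:29.791198"
}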


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

You should be able to validate that the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations documentation is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow by using the put-record command from the Add Record to Kinesis Data Stream section above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note the Data field is base64-encoded and not human readable; it would need to be decoded/parsed to be interpretable. There are many options to build a Kinesis consumer, such as the KCL.
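For example, base64-decoding the Data field of the first record above yields a CNM response similar to:

{
  "identifier ": "testIdentifier123456",
  "version": "006",
  "collection": "MOD09GQ",
  "provider": "TestProvider",
  "productSize": 1908635.0,
  "response": {
    "status": "SUCCESS"
  },
  "processCompleteTime": "2018-07-27T04:11:28.919"
}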

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus' control.

The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.

Note: these are only failures that occurred when processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.

Version: v13.0.0

Error Handling in Workflows

Transient errors such as Lambda.ServiceException can occur; see this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    Version: v13.0.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

Screenshot of AWS Step Function execution graph for the HelloWorld workflow

Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    Version: v13.0.0

    Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to the workflow.

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and having the status of those granules/PDRs be visible on the Cumulus dashboard. The queue does have certain expectations of its messages: it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue requires knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL by adding it to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
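As a rough illustration, a message for a SUCCEEDED execution might be shaped like the sketch below, with the Cumulus message JSON-stringified in detail.output. The ARNs are hypothetical placeholders and the stringified Cumulus message is shown with empty objects for brevity; a real message must contain a Cumulus message that conforms to the schemas.

{
  "source": "aws.states",
  "detail-type": "Step Functions Execution Status Change",
  "detail": {
    "status": "SUCCEEDED",
    "executionArn": "arn:aws:states:us-east-1:111111111111:execution:<prefix>-IngestWorkflow:example-execution",
    "stateMachineArn": "arn:aws:states:us-east-1:111111111111:stateMachine:<prefix>-IngestWorkflow",
    "output": "{\"cumulus_meta\":{},\"meta\":{},\"payload\":{}}"
  }
}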

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
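For instance, the JSON stringified into Records[0].Sns.Message for a granule might be shaped roughly like the following sketch (shown parsed, with the record abbreviated; the event value and record contents here are illustrative, and the real record must conform to the granule data model schema):

{
  "event": "Update",
  "record": {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "status": "completed"
  }
}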

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v13.0.0

    Queue PostToCmr

    In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow setup.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v13.0.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources, offering the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

If it has been discovered that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (by either adding a new .tf file or updating an existing one):

    • A aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page), configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

    "Resource": "${aws_sfn_activity.queue_granules.id}")`

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Version: v13.0.0

Science Investigator-led Processing Systems (SIPS)

For this example, we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

Note: A list of configured workflows exists under the "Workflows" tab in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective statuses in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

Additionally, this workflow requires a processing step that you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v13.0.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.
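If you manage rules through the Cumulus archive API rather than the dashboard, the same definition can be submitted with an HTTP request. The following is only a sketch: it assumes your archive API URL, a valid JWT stored in $TOKEN, and that the rule definition above has been saved locally as rule.json (with the inline // comment removed, since that is not valid JSON):

# Create the rule via the Cumulus archive API (sketch; URL, token, and file name are assumptions).
curl -X POST "https://<archive_api_uri>/rules" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data @rule.json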

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started
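As a concrete illustration of this check-and-increment pattern, the conditional update below only succeeds while the current count is under the limit. It is a sketch only: the table name, key, and attribute names are hypothetical, not the actual Cumulus implementation.

# Atomically increment the semaphore only if it is still below the configured maximum (5 here).
# Table/key/attribute names are hypothetical.
aws dynamodb update-item \
  --table-name <prefix>-semaphores \
  --key '{"key": {"S": "<backgroundJobQueue_SQS_URL>"}}' \
  --update-expression "ADD semvalue :one" \
  --condition-expression "attribute_not_exists(semvalue) OR semvalue < :max" \
  --expression-attribute-values '{":one": {"N": "1"}, ":max": {"N": "5"}}'
# A ConditionalCheckFailedException here means the queue is already at its execution limit.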

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Tracking Ancillary Files

The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type   | UMM-G RelatedUrl.Type                                            | ECHO10 Location
-----------|------------------------------------------------------------------|----------------------
ancillary  | 'VIEW RELATED INFORMATION'                                       | OnlineResource
data       | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)  | OnlineAccessURL
browse     | 'GET RELATED VISUALIZATION'                                      | AssociatedBrowseImage
linkage    | 'EXTENDED METADATA'                                              | OnlineResource
metadata   | 'EXTENDED METADATA'                                              | OnlineResource
qa         | 'EXTENDED METADATA'                                              | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v13.0.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
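Once logging is enabled and requests have been made against the distribution API, you can confirm delivery by tailing the log group. This is a sketch: it requires AWS CLI v2, and the REST API ID and stage are placeholders:

# Tail recent API Gateway execution logs for the distribution API.
aws logs tail "API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>" --since 1h --follow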

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'
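Optionally, you can verify that the account setting took effect; the output should include the cloudwatchRoleArn you just set:

# Confirm the CloudWatch role is now associated with the API Gateway account settings.
aws apigateway get-account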

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

Choosing and configuring your RDS database

When using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more, as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Version: v13.0.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value with the Cloudwatch log group name.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
Component-based Cumulus Deployment

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v13.0.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

Using the AWS CLI s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.
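If the bucket you just created is a protected bucket, you can also enable the S3 server access logging mentioned above from the command line. This is a sketch: the bucket names are placeholders, and the target bucket must already be configured to accept S3 log delivery.

# Enable server access logging on a protected bucket, delivering logs to an internal bucket (placeholder names).
aws s3api put-bucket-logging \
  --bucket foobar-protected \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "foobar-internal",
      "TargetPrefix": "foobar-protected/"
    }
  }'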

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v13.0.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    localhost <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.
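For example (the bucket name and key below are placeholders):

# Upload any small file to one of your protected buckets.
echo "cumulus distribution test" > sample.txt
aws s3 cp sample.txt s3://<prefix>-protected/test/sample.txt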

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

To determine your ec2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.
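To review the bucket map your deployment is actually using, you can copy the generated file to stdout. This is a sketch that assumes the default generated map described above:

# Print the generated bucket map for inspection.
aws s3 cp "s3://<system_bucket>/distribution_bucket_map.json" -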

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

How to Deploy Cumulus

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init [1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
3. Under Manage -> Redirect URIs, add the Archive API URL returned from the stack deployment
  • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
4. Also add the Distribution URL
  • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [2].
5. You may delete the placeholder URL you used to create the application.

If you've lost track of the needed redirect URIs, they can be located in the API Gateway console. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [3].

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.
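Reindexing can be triggered through the Cumulus operator API. The request below is only a sketch: it assumes the Elasticsearch reindex endpoint of your archive API and a valid JWT stored in $TOKEN.

# Kick off an Elasticsearch reindex via the Cumulus archive API (sketch; endpoint and token are assumptions).
curl -X POST "https://<archive_api_uri>/elasticsearch/reindex" \
  -H "Authorization: Bearer $TOKEN"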

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (i.e. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster consider:

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URIs to your application. On Earthdata home page, select "My Applications". Scroll down to "Application Administration" and use the edit icon for your application. Then Manage -> Redirect URIs.

3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at"

PostgreSQL Database Deployment

Cumulus provides a terraform module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

To deploy this module, make sure you have completed the following steps from the Cumulus deployment instructions, in similar fashion for this module:

    --

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike Cumulus deployment, this module should be deployed once to create the database cluster and only thereafter to make changes to that configuration/upgrade/etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
• permissions_boundary_arn -- the permissions boundary to use when creating the roles that the provisioning lambda will need for access. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
• prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.
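One way to review the changeset for destructive actions before applying is to write the plan to a file and scan it:

# Save the plan, then look for resources Terraform intends to destroy or replace.
terraform plan -out=rds.tfplan
terraform show rds.tfplan | grep -E "destroy|must be replaced"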

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn). These identify the AWS Secrets Manager secrets that grant access to the database as the administrative user and, optionally, provide the user-level database credentials that Cumulus requires.

    The content of each of these secrets is in the form:

    {
      "database": "postgres",
      "dbClusterIdentifier": "clusterName",
      "engine": "postgres",
      "host": "xxx",
      "password": "defaultPassword",
      "port": 5432,
      "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- the database connection port; this should always be 5432
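    To inspect these values for your deployment, one option is to fetch the secret with the AWS CLI. This is a minimal sketch, assuming the AWS CLI and jq are installed locally; the ARN below is a placeholder for your admin_db_login_secret_arn output:

    # Placeholder ARN -- substitute the admin_db_login_secret_arn value from the terraform outputs
    SECRET_ARN="arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx"

    # Print the decoded secret JSON (database, host, username, password, etc.)
    aws secretsmanager get-secret-value \
      --secret-id "$SECRET_ARN" \
      --query SecretString \
      --output text | jq .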

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.
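    For example, once you have network access to the cluster (it runs inside your VPC, so this typically means connecting from a bastion host or another resource in the same VPC), a hedged sketch of connecting with psql using the secret values above might look like:

    # Sketch only: assumes psql is installed and the RDS endpoint is reachable from where you run this.
    # Host, username, and database come from the outputs/secret shown above; you will be prompted for the password.
    psql --host "<rds_endpoint from the outputs>" \
         --port 5432 \
         --username "<username from the secret>" \
         --dbname postgres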

    - + \ No newline at end of file diff --git a/docs/v13.0.0/deployment/share-s3-access-logs/index.html b/docs/v13.0.0/deployment/share-s3-access-logs/index.html index 53dc9384b01..dd291470f21 100644 --- a/docs/v13.0.0/deployment/share-s3-access-logs/index.html +++ b/docs/v13.0.0/deployment/share-s3-access-logs/index.html @@ -5,14 +5,14 @@ Share S3 Access Logs | Cumulus Documentation - +
    Version: v13.0.0

    Share S3 Access Logs

    Cumulus makes it possible to share S3 access logs with other systems by replicating them to another bucket using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    To deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The terraform source package can be found on the Cumulus github release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.
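    For example, a hedged way to fetch that artifact locally before referencing it as the module source (the version tag below is a placeholder for the Cumulus release you are deploying):

    # Placeholder release tag -- substitute the Cumulus release you are deploying
    curl -L -o s3-replicator.zip \
      "https://github.com/nasa/cumulus/releases/download/vX.Y.Z/terraform-aws-cumulus-s3-replicator.zip"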

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/deployment/terraform-best-practices/index.html b/docs/v13.0.0/deployment/terraform-best-practices/index.html index 611d1118ab3..8f14c9f016f 100644 --- a/docs/v13.0.0/deployment/terraform-best-practices/index.html +++ b/docs/v13.0.0/deployment/terraform-best-practices/index.html @@ -5,7 +5,7 @@ Terraform Best Practices | Cumulus Documentation - + @@ -88,7 +88,7 @@ AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link Restoring a previous version: link

    - + \ No newline at end of file diff --git a/docs/v13.0.0/deployment/thin_egress_app/index.html b/docs/v13.0.0/deployment/thin_egress_app/index.html index 827cc07c498..330badfc40a 100644 --- a/docs/v13.0.0/deployment/thin_egress_app/index.html +++ b/docs/v13.0.0/deployment/thin_egress_app/index.html @@ -5,7 +5,7 @@ Using the Thin Egress App for Cumulus distribution | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v13.0.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository's cumulus-tf/main.tf contains a thin_egress_app module configured for distribution.

    The TEA module provides its own instructions for adding it to your deployment; the following sections describe how to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
      "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.
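    To inspect the generated mapping for your deployment, you can stream it from S3 with the AWS CLI (the bucket name below is a placeholder for your system bucket):

    # Print the generated bucket map to stdout
    aws s3 cp "s3://<your-system-bucket>/distribution_bucket_map.json" -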

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
      my-protected: my-protected
      my-public: my-public

    PUBLIC_BUCKETS:
      - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/deployment/upgrade-readme/index.html b/docs/v13.0.0/deployment/upgrade-readme/index.html index 6cf137bfa3d..a9076d49be4 100644 --- a/docs/v13.0.0/deployment/upgrade-readme/index.html +++ b/docs/v13.0.0/deployment/upgrade-readme/index.html @@ -5,7 +5,7 @@ Upgrading Cumulus | Cumulus Documentation - + @@ -15,7 +15,7 @@ deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/development/forked-pr/index.html b/docs/v13.0.0/development/forked-pr/index.html index cde4c7d23e9..a53d79e73a8 100644 --- a/docs/v13.0.0/development/forked-pr/index.html +++ b/docs/v13.0.0/development/forked-pr/index.html @@ -5,13 +5,13 @@ Issuing PR From Forked Repos | Cumulus Documentation - +
    Version: v13.0.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be merged directly to master. Cumulus reviewers must follow these steps to complete the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author, or resolve failures as appropriate.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/development/integration-tests/index.html b/docs/v13.0.0/development/integration-tests/index.html index 22982597a6b..5b9d39c0b56 100644 --- a/docs/v13.0.0/development/integration-tests/index.html +++ b/docs/v13.0.0/development/integration-tests/index.html @@ -5,7 +5,7 @@ Integration Tests | Cumulus Documentation - + @@ -19,7 +19,7 @@ in the commit message.

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/development/quality-and-coverage/index.html b/docs/v13.0.0/development/quality-and-coverage/index.html index b2a2ee0e24e..987a7b78c83 100644 --- a/docs/v13.0.0/development/quality-and-coverage/index.html +++ b/docs/v13.0.0/development/quality-and-coverage/index.html @@ -5,7 +5,7 @@ Code Coverage and Quality | Cumulus Documentation - + @@ -23,7 +23,7 @@ here.

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/development/release/index.html b/docs/v13.0.0/development/release/index.html index 1fcdb60ef93..8998e2cd64f 100644 --- a/docs/v13.0.0/development/release/index.html +++ b/docs/v13.0.0/development/release/index.html @@ -5,7 +5,7 @@ Versioning and Releases | Cumulus Documentation - + @@ -15,7 +15,7 @@ It's useful to use the search feature of your code editor or grep to see if there any references to the old package versions. In bash shell you can run

    find . -name package.json -exec grep -nH "@cumulus/.*MAJOR\.MINOR\.PATCH.*" {} \;

    Verify that each of those is updated to the new MAJOR.MINOR.PATCH version you are trying to release.

    A similar search for alpha and beta versions should be run on the release version and any problems should be fixed.

    find . -name package.json -exec grep -nHE "MAJOR\.MINOR\.PATCH.*(alpha|beta)" {} \;

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

    If there is a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the to-be-released Cumulus Core version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

      Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of &quot;true&quot;

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch (release-1.2.x) now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

    git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:

    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

    • Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important: Deselect Enable Branch; if you do not do this, it will immediately fire off a build.

    • Do immediately: on the Branch Details page, enable Change trigger and set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

    • Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: you should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on github

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
    • An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.
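    One hedged way to do that check from the command line, assuming the GitHub CLI (gh) is installed, is to list the assets attached to the release:

    # Placeholder tag -- substitute the release you just published
    gh release view vMAJOR.MINOR.PATCH --repo nasa/cumulus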

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

    git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:

    git tag -d v9.1.0
    git push -d origin v9.1.0
    - + \ No newline at end of file diff --git a/docs/v13.0.0/docs-how-to/index.html b/docs/v13.0.0/docs-how-to/index.html index 93a4c3b8350..702059f396e 100644 --- a/docs/v13.0.0/docs-how-to/index.html +++ b/docs/v13.0.0/docs-how-to/index.html @@ -5,13 +5,13 @@ Cumulus Documentation: How To's | Cumulus Documentation - +
    Version: v13.0.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

    Search on our documentation site is handled by DocSearch. DocSearch has provided us with an apiKey and an indexName that we include in our website/siteConfig.js file. The rest (indexing and the actual searching) we leave to DocSearch. Our builds expect environment variables for both of these values to exist: DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/external-contributions/index.html b/docs/v13.0.0/external-contributions/index.html index 8f637d08433..b11bbfd5471 100644 --- a/docs/v13.0.0/external-contributions/index.html +++ b/docs/v13.0.0/external-contributions/index.html @@ -5,13 +5,13 @@ External Contributions | Cumulus Documentation - +
    Version: v13.0.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/faqs/index.html b/docs/v13.0.0/faqs/index.html index 5055055f109..a648c5830c8 100644 --- a/docs/v13.0.0/faqs/index.html +++ b/docs/v13.0.0/faqs/index.html @@ -5,13 +5,13 @@ Frequently Asked Questions | Cumulus Documentation - +
    Version: v13.0.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to setup Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Users outside NASA should file a GitHub issue; users inside NASA should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who work within Cumulus and AWS to handle deployments and manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components that you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus maintained modules or steps on how to create a module go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/ancillary_metadata/index.html b/docs/v13.0.0/features/ancillary_metadata/index.html index 64c49c32eb3..bb5124644bc 100644 --- a/docs/v13.0.0/features/ancillary_metadata/index.html +++ b/docs/v13.0.0/features/ancillary_metadata/index.html @@ -5,7 +5,7 @@ Ancillary Metadata Export | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v13.0.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/backup_and_restore/index.html b/docs/v13.0.0/features/backup_and_restore/index.html index 74284508321..9e4798615b7 100644 --- a/docs/v13.0.0/features/backup_and_restore/index.html +++ b/docs/v13.0.0/features/backup_and_restore/index.html @@ -5,7 +5,7 @@ Cumulus Backup and Restore | Cumulus Documentation - + @@ -52,7 +52,7 @@ writing to the old cluster.

  • Set the snapshot_identifier variable to the snapshot you wish to create, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments - You should not need to reconfigure either, as the secret ARN and the security group should not change, however double-check the configured values are as expected

  • - + \ No newline at end of file diff --git a/docs/v13.0.0/features/dead_letter_archive/index.html b/docs/v13.0.0/features/dead_letter_archive/index.html index 6894a58af53..487fad4c9d0 100644 --- a/docs/v13.0.0/features/dead_letter_archive/index.html +++ b/docs/v13.0.0/features/dead_letter_archive/index.html @@ -5,13 +5,13 @@ Cumulus Dead Letter Archive | Cumulus Documentation - +
    Version: v13.0.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
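    To see what has accumulated under that path, you can list the archive prefix directly. A minimal sketch, assuming the AWS CLI is installed, with placeholder bucket and stack names:

    # List archived messages under the dead letter archive path
    aws s3 ls --recursive "s3://<system-bucket>/<stackName>/dead-letter-archive/sqs/"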

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.

    This endpoint may prove particularly useful when recovering from extended or unexpected database outage, where messages failed to process due to external outage and there is no essential malformation of each Cumulus message.
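    A hedged example of triggering the recovery from the command line, assuming you already have a Cumulus API URL and a valid authorization token (both are placeholders here):

    # Start the recovery async operation; the response includes an id you can poll via /asyncOperations
    curl -X POST "$CUMULUS_API_URL/deadLetterArchive/recoverCumulusMessages" \
      -H "Authorization: Bearer $TOKEN"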

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/dead_letter_queues/index.html b/docs/v13.0.0/features/dead_letter_queues/index.html index 1efe4e0fcb4..b49c1665bd2 100644 --- a/docs/v13.0.0/features/dead_letter_queues/index.html +++ b/docs/v13.0.0/features/dead_letter_queues/index.html @@ -5,13 +5,13 @@ Dead Letter Queues | Cumulus Documentation - +
    Version: v13.0.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are setup with the following configuration:

    receive_wait_time_seconds  = 20
    message_retention_seconds  = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are setup with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

    For aid in manually troubleshooting, you can utilize the SQS Management Console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes
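    If you prefer the CLI to the console, a hedged equivalent is to receive a few messages along with their attributes (the queue URL below is a placeholder; note that received messages become invisible for the duration of the queue's visibility timeout):

    aws sqs receive-message \
      --queue-url "https://sqs.<region>.amazonaws.com/<account-id>/<prefix>-<queue-name>" \
      --max-number-of-messages 5 \
      --attribute-names All \
      --message-attribute-names All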

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/distribution-metrics/index.html b/docs/v13.0.0/features/distribution-metrics/index.html index c87ffc4e9dc..69a20f00e4f 100644 --- a/docs/v13.0.0/features/distribution-metrics/index.html +++ b/docs/v13.0.0/features/distribution-metrics/index.html @@ -5,13 +5,13 @@ Cumulus Distribution Metrics | Cumulus Documentation - +
    Version: v13.0.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/execution_payload_retention/index.html b/docs/v13.0.0/features/execution_payload_retention/index.html index 1e7add86723..93c933bc2a7 100644 --- a/docs/v13.0.0/features/execution_payload_retention/index.html +++ b/docs/v13.0.0/features/execution_payload_retention/index.html @@ -5,13 +5,13 @@ Execution Payload Retention | Cumulus Documentation - +
    Version: v13.0.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove payload records for both 'completed' and 'non-completed' executions in the database that are older than the configured thresholds.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/logging-esdis-metrics/index.html b/docs/v13.0.0/features/logging-esdis-metrics/index.html index 370f5b8017b..519d59004bb 100644 --- a/docs/v13.0.0/features/logging-esdis-metrics/index.html +++ b/docs/v13.0.0/features/logging-esdis-metrics/index.html @@ -5,13 +5,13 @@ Writing logs for ESDIS Metrics | Cumulus Documentation - +
    Version: v13.0.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
      "level": "info",
      "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
      "granules": "[\"granule-1\",\"granule-2\"]",
      "message": "hello world",
      "sender": "greetingFunction",
      "stackName": "myCumulus",
      "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/replay-archived-sqs-messages/index.html b/docs/v13.0.0/features/replay-archived-sqs-messages/index.html index a8a3826a389..3f2dd4664ea 100644 --- a/docs/v13.0.0/features/replay-archived-sqs-messages/index.html +++ b/docs/v13.0.0/features/replay-archived-sqs-messages/index.html @@ -5,14 +5,14 @@ How to replay SQS messages archived in S3 | Cumulus Documentation - +
    Version: v13.0.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    Field     | Type   | Description
    queueName | string | Any valid SQS queue name (not ARN)

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.
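    For example, with placeholder values for the API URL, token, queue name, and returned id:

    # Start the replay
    curl -X POST "$CUMULUS_API_URL/replays/sqs" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"queueName": "<your-queue-name>"}'

    # Track it using the asyncOperationId returned by the call above
    curl "$CUMULUS_API_URL/asyncOperations/<asyncOperationId>" \
      -H "Authorization: Bearer $TOKEN"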

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/replay-kinesis-messages/index.html b/docs/v13.0.0/features/replay-kinesis-messages/index.html index bd98c2b069f..8d3391a93b3 100644 --- a/docs/v13.0.0/features/replay-kinesis-messages/index.html +++ b/docs/v13.0.0/features/replay-kinesis-messages/index.html @@ -5,7 +5,7 @@ How to replay Kinesis messages after an outage | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v13.0.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp although it is used differently and less vulnerable to discrepancies since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    Field                          | Type   | Required         | Description
    type                           | string | required         | Currently only accepts kinesis.
    kinesisStream                  | string | for type kinesis | Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp | *      | optional         | Any input valid for a JS Date constructor. For reasons to use this field, see the AWS documentation on StreamCreationTimestamp.
    endTimestamp                   | *      | optional         | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp                 | *      | optional         | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.
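    For example, a hedged request replaying messages within a time slice (the API URL, token, stream name, and timestamps below are placeholders):

    curl -X POST "$CUMULUS_API_URL/replays" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "type": "kinesis",
        "kinesisStream": "<your-stream-name>",
        "startTimestamp": "2018-10-18T00:00:00.000Z",
        "endTimestamp": "2018-10-20T00:00:00.000Z"
      }'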

    - + \ No newline at end of file diff --git a/docs/v13.0.0/features/reports/index.html b/docs/v13.0.0/features/reports/index.html index 1e5e403eda1..871563de8f1 100644 --- a/docs/v13.0.0/features/reports/index.html +++ b/docs/v13.0.0/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -19,7 +19,7 @@ report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    - + \ No newline at end of file diff --git a/docs/v13.0.0/getting-started/index.html b/docs/v13.0.0/getting-started/index.html index e2ae41f1daa..1d168cb2265 100644 --- a/docs/v13.0.0/getting-started/index.html +++ b/docs/v13.0.0/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: v13.0.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across many projects, not limited to NASA, use Cumulus in their daily work.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/v13.0.0/glossary/index.html b/docs/v13.0.0/glossary/index.html index 4233236f31f..e5144076969 100644 --- a/docs/v13.0.0/glossary/index.html +++ b/docs/v13.0.0/glossary/index.html @@ -5,13 +5,13 @@ Glossary | Cumulus Documentation - +
    Version: v13.0.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    An AWS service that provides logging and metrics collection for the various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc. so that those failed messages can be processed later.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM-hosted node.js packages. Cumulus packages can be found on NPM's site here.

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/index.html b/docs/v13.0.0/index.html index b402215e4e0..07790394adb 100644 --- a/docs/v13.0.0/index.html +++ b/docs/v13.0.0/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v13.0.0

    Introduction

    The Cumulus project addresses the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/integrator-guide/about-int-guide/index.html b/docs/v13.0.0/integrator-guide/about-int-guide/index.html index 3845c2c704b..37010316a5c 100644 --- a/docs/v13.0.0/integrator-guide/about-int-guide/index.html +++ b/docs/v13.0.0/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v13.0.0

    About Integrator Guide

    Purpose

    The Integrator Guide supplements the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/v13.0.0/integrator-guide/int-common-use-cases/index.html b/docs/v13.0.0/integrator-guide/int-common-use-cases/index.html index 3d05702e6e1..a70ba9e5e03 100644 --- a/docs/v13.0.0/integrator-guide/int-common-use-cases/index.html +++ b/docs/v13.0.0/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v13.0.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v13.0.0/integrator-guide/workflow-add-new-lambda/index.html index 21c7c3dd63c..fc242b7b6b1 100644 --- a/docs/v13.0.0/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/v13.0.0/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v13.0.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps will help you write a new Lambda that integrates with a Cumulus workflow, and will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v13.0.0/integrator-guide/workflow-ts-failed-step/index.html index aa488a08663..4e1eb3a2d51 100644 --- a/docs/v13.0.0/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/v13.0.0/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: v13.0.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in CloudWatch
    • Go to CloudWatch
    • Locate lambda
    • Search CloudWatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    - + \ No newline at end of file diff --git a/docs/v13.0.0/interfaces/index.html b/docs/v13.0.0/interfaces/index.html index 8edab9b4fd7..646abccbfc1 100644 --- a/docs/v13.0.0/interfaces/index.html +++ b/docs/v13.0.0/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: v13.0.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.
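
    As a rough, hypothetical illustration of the general shape of such a message (the queue name and every value below are placeholders; the Cumulus message format linked above is the authoritative schema), a message could be pushed onto the queue with the AWS CLI:

    aws sqs send-message \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/<prefix>-startSF \
      --message-body '{
        "cumulus_meta": { "execution_name": "example-execution-name", "state_machine": "<state machine ARN>" },
        "meta": {},
        "payload": {}
      }'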

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.
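
    If you want to consume these notifications yourself, you can subscribe an endpoint such as an SQS queue to one of the reporting topics with the AWS CLI. The ARNs below are placeholders, and the topic naming shown is an assumption; check the SNS console of your deployment for the actual reporting topic names.

    aws sns subscribe \
      --topic-arn arn:aws:sns:us-east-1:123456789012:<prefix>-report-granules \
      --protocol sqs \
      --notification-endpoint arn:aws:sqs:us-east-1:123456789012:<your-queue>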

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/about-operator-docs/index.html b/docs/v13.0.0/operator-docs/about-operator-docs/index.html index 56faaca7e07..2f7d8520264 100644 --- a/docs/v13.0.0/operator-docs/about-operator-docs/index.html +++ b/docs/v13.0.0/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v13.0.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/bulk-operations/index.html b/docs/v13.0.0/operator-docs/bulk-operations/index.html index e962df79542..a92737d7927 100644 --- a/docs/v13.0.0/operator-docs/bulk-operations/index.html +++ b/docs/v13.0.0/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v13.0.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.
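
    For example, a bulk granule request can be submitted directly to the API. The sketch below is illustrative only: the host, index name, workflow name, and query are placeholders, and the payload shape (index, workflowName, and an Elasticsearch query nested under query) mirrors the bulk granules request built up in the dashboard steps below.

    curl --request POST https://example.com/granules/bulk \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "index": "<your-granules-index>",
        "workflowName": "HelloWorldWorkflow",
        "query": {
          "query": {
            "match": { "collectionId": "MOD09GQ___006" }
          }
        }
      }'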

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the "Discover" page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
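
    For example, an individual operation's record (and therefore its status) can be retrieved as follows, where the host and operation ID are placeholders:

    curl --request GET https://example.com/asyncOperations/<async-operation-id> \
      --header 'Authorization: Bearer ReplaceWithTheToken'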

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/cmr-operations/index.html b/docs/v13.0.0/operator-docs/cmr-operations/index.html index 301e57da2ef..8d85b574f84 100644 --- a/docs/v13.0.0/operator-docs/cmr-operations/index.html +++ b/docs/v13.0.0/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with 'updateCmrAccessConstraints' selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "updateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/create-rule-in-cumulus/index.html b/docs/v13.0.0/operator-docs/create-rule-in-cumulus/index.html index 274d1b4cdd7..33e80d583a2 100644 --- a/docs/v13.0.0/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/v13.0.0/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v13.0.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/discovery-filtering/index.html b/docs/v13.0.0/operator-docs/discovery-filtering/index.html index 75cae2815c3..0ef203c2d53 100644 --- a/docs/v13.0.0/operator-docs/discovery-filtering/index.html +++ b/docs/v13.0.0/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/granule-workflows/index.html b/docs/v13.0.0/operator-docs/granule-workflows/index.html index faa942591b2..58576b8627b 100644 --- a/docs/v13.0.0/operator-docs/granule-workflows/index.html +++ b/docs/v13.0.0/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v13.0.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or running a scheduled rule, will re-discover and reingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or running scheduled rules, will re-discover and reingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/kinesis-stream-for-ingest/index.html b/docs/v13.0.0/operator-docs/kinesis-stream-for-ingest/index.html index c40bc6b43ca..c91983c26fb 100644 --- a/docs/v13.0.0/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/v13.0.0/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v13.0.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that conforms to your schema, using Python or the command line (see the sketch below).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you created in Step 2.
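
      A minimal command-line sketch for publishing a prepared CNM message to the stream (the stream name, partition key, and file name are placeholders, and the message contents must match the rule as noted above):

      # cnm-message.json holds your CNM message; its collection must match the rule created in Step 2
      # --cli-binary-format is needed with AWS CLI v2 so the JSON file is sent as-is; omit it with CLI v1
      aws kinesis put-record \
        --stream-name <streamName> \
        --partition-key example-partition-key \
        --cli-binary-format raw-in-base64-out \
        --data file://cnm-message.json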
    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/locating-access-logs/index.html b/docs/v13.0.0/operator-docs/locating-access-logs/index.html index 1e1b88aaa67..dbf83860df6 100644 --- a/docs/v13.0.0/operator-docs/locating-access-logs/index.html +++ b/docs/v13.0.0/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v13.0.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
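
    For example, the raw logs for a deployment can be listed with the AWS CLI, where the bucket name and stack prefix are placeholders:

    aws s3 ls s3://<internal-bucket-name>/<stack>/ems-distribution/s3-server-access-logs/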

    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/naming-executions/index.html b/docs/v13.0.0/operator-docs/naming-executions/index.html index bf1085dfc22..251d43af569 100644 --- a/docs/v13.0.0/operator-docs/naming-executions/index.html +++ b/docs/v13.0.0/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
      ...other rule keys here...
      "meta": {
        "executionNamePrefix": "my-prefix"
      }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },
    }
    - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/ops-common-use-cases/index.html b/docs/v13.0.0/operator-docs/ops-common-use-cases/index.html index a309438d37f..b3b9a4c1611 100644 --- a/docs/v13.0.0/operator-docs/ops-common-use-cases/index.html +++ b/docs/v13.0.0/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v13.0.0/operator-docs/trigger-workflow/index.html b/docs/v13.0.0/operator-docs/trigger-workflow/index.html index a9e1644346e..8988789c94a 100644 --- a/docs/v13.0.0/operator-docs/trigger-workflow/index.html +++ b/docs/v13.0.0/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v13.0.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata, you can specify a provider_path from which to retrieve the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
      private = {
        name = "discovery-bucket"
        type = "private"
      }
      protected = {
        name = "archive-protected"
        type = "protected"
      }
      public = {
        name = "archive-public"
        type = "public"
      }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types correspond to the buckets mapped in the deployment configuration shown above.

    {
      "name": "test-collection",
      "version": "001",
      "granuleId": "^GRANULE\\.A[\\d]{7}$",
      "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
      "reportToEms": false,
      "sampleFileName": "GRANULE.A2017025.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
          "sampleFileName": "GRANULE.A2017025.hdf"
        },
        {
          "bucket": "public",
          "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
          "sampleFileName": "GRANULE.A2017025.jpg"
        }
      ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
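
    If you prefer the API to the dashboard form shown above, a hedged sketch of the equivalent rule request follows. The host, token, rule name, and provider ID are placeholders, and the workflow name assumes the example deployment's discovery workflow is deployed as DiscoverGranules; the collection and provider_path values come from the setup above.

    curl --request POST https://example.com/rules \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "discover_test_data_rule",
        "workflow": "DiscoverGranules",
        "provider": "<your-provider-id>",
        "collection": { "name": "test-collection", "version": "001" },
        "rule": { "type": "onetime" },
        "meta": { "provider_path": "test-data" },
        "state": "ENABLED"
      }'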

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/tasks/index.html b/docs/v13.0.0/tasks/index.html index 07ea4027231..b84e5585b63 100644 --- a/docs/v13.0.0/tasks/index.html +++ b/docs/v13.0.0/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v13.0.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    NOTE: For a detailed description of each task, visit the task's README.md. Information on the input or output of a task is specified in the task's schemas directory.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/v13.0.0/team/index.html b/docs/v13.0.0/team/index.html index 039f582dfcb..47d8ffc2345 100644 --- a/docs/v13.0.0/team/index.html +++ b/docs/v13.0.0/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - +
    Version: v13.0.0

    Cumulus Team

    Cumulus Core Team

    Cumulus Emeritus Team

    - + \ No newline at end of file diff --git a/docs/v13.0.0/troubleshooting/index.html b/docs/v13.0.0/troubleshooting/index.html index 7d393bcc56d..2f0bfffdf48 100644 --- a/docs/v13.0.0/troubleshooting/index.html +++ b/docs/v13.0.0/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v13.0.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines composed of tasks and services, and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard, or you can find them by going to CloudWatch and navigating to the logs for that particular task.
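
    If you prefer the command line, AWS CLI v2 can tail a task's log group directly. The log group name below is a placeholder that follows the standard /aws/lambda/<function-name> convention:

    aws logs tail /aws/lambda/<prefix>-DiscoverGranules --since 1h --follow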

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.
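
    A quick way to peek at a dead letter queue from the command line (the queue name is a placeholder; check the SQS console for the actual dead letter queue names in your deployment):

    aws sqs get-queue-url --queue-name <your-dead-letter-queue-name>
    aws sqs receive-message --queue-url <queue-url-from-previous-command> --max-number-of-messages 5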

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend the non-scaling approach: the number of lambdas we distribute is in the low teens, and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.
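
    One quick check, assuming a zip-packaged Lambda, is to list the archive contents and confirm the dependency is actually present:

    unzip -l lambda.zip | grep cumulus-message-adapter-js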

    - + \ No newline at end of file diff --git a/docs/v13.0.0/troubleshooting/reindex-elasticsearch/index.html b/docs/v13.0.0/troubleshooting/reindex-elasticsearch/index.html index 3558ccebeab..3d67033a513 100644 --- a/docs/v13.0.0/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/v13.0.0/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is you do not have to turn off any ingest operations. This way is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
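
    For example, assuming the endpoint is served under the Cumulus API's Elasticsearch routes (the host below is a placeholder), the status can be polled with:

    curl --request GET https://example.com/elasticsearch/reindex-status \
      --header 'Authorization: Bearer ReplaceWithTheToken'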

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
      "aggs": {
        "count_by_type": {
          "terms": {
            "field": "_type"
          }
        }
      },
      "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
      "conflicts": "proceed",
      "source": {
        "index": "cumulus-2020-11-3"
      },
      "dest": {
        "index": "cumulus-2021-3-4",
        "op_type": "create"
      }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/troubleshooting/rerunning-workflow-executions/index.html b/docs/v13.0.0/troubleshooting/rerunning-workflow-executions/index.html index cb4634fd904..50c11eac21b 100644 --- a/docs/v13.0.0/troubleshooting/rerunning-workflow-executions/index.html +++ b/docs/v13.0.0/troubleshooting/rerunning-workflow-executions/index.html @@ -5,13 +5,13 @@ Re-running workflow executions | Cumulus Documentation - +
    Version: v13.0.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the "New execution" button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

    - + \ No newline at end of file diff --git a/docs/v13.0.0/troubleshooting/troubleshooting-deployment/index.html b/docs/v13.0.0/troubleshooting/troubleshooting-deployment/index.html index 0986acc7ab0..acc4205c899 100644 --- a/docs/v13.0.0/troubleshooting/troubleshooting-deployment/index.html +++ b/docs/v13.0.0/troubleshooting/troubleshooting-deployment/index.html @@ -5,7 +5,7 @@ Troubleshooting Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'", this probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID, EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, or you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URL to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    - + \ No newline at end of file diff --git a/docs/v13.0.0/upgrade-notes/cumulus_distribution_migration/index.html b/docs/v13.0.0/upgrade-notes/cumulus_distribution_migration/index.html index d0c6417f706..ef9e67ab2b1 100644 --- a/docs/v13.0.0/upgrade-notes/cumulus_distribution_migration/index.html +++ b/docs/v13.0.0/upgrade-notes/cumulus_distribution_migration/index.html @@ -5,14 +5,14 @@ Migrate from TEA deployment to Cumulus Distribution | Cumulus Documentation - +
    Version: v13.0.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    - + \ No newline at end of file diff --git a/docs/v13.0.0/upgrade-notes/migrate_tea_standalone/index.html b/docs/v13.0.0/upgrade-notes/migrate_tea_standalone/index.html index 2141bd4c1bf..70c79c6ecba 100644 --- a/docs/v13.0.0/upgrade-notes/migrate_tea_standalone/index.html +++ b/docs/v13.0.0/upgrade-notes/migrate_tea_standalone/index.html @@ -5,13 +5,13 @@ Migrate TEA deployment to standalone module | Cumulus Documentation - +
    Version: v13.0.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait on a new release of Cumulus that incorporated that release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any Cloudfront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
    --table-name DYNAMO_LOCK_TABLE \
    --item '{
    "LockID": {"S": "BUCKET/KEY-md5"},
    "Digest": {"S": "some-digest-value"}
    }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

diff --git a/docs/v13.0.0/upgrade-notes/update-cma-2.0.2/index.html b/docs/v13.0.0/upgrade-notes/update-cma-2.0.2/index.html
    Version: v13.0.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, and to instead use an incrementing integer field.

Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.

diff --git a/docs/v13.0.0/upgrade-notes/update-task-file-schemas/index.html b/docs/v13.0.0/upgrade-notes/update-task-file-schemas/index.html
    Version: v13.0.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.
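
For illustration only, a file object that conforms to the updated schema generally carries bucket, key, fileName, size, and checksum information rather than the deprecated properties above. The values below are hypothetical and the linked schema remains the authoritative definition:

    {
      "bucket": "my-protected-bucket",
      "key": "MOD09GQ___006/2017/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "size": 1908635,
      "checksumType": "md5",
      "checksum": "checksum-value"
    }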

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3; thus, much of the information required in the updated files schema, like bucket, key, or checksum, is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
diff --git a/docs/v13.0.0/upgrade-notes/upgrade-rds/index.html b/docs/v13.0.0/upgrade-notes/upgrade-rds/index.html

Upgrade to RDS release | Cumulus Documentation

| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

diff --git a/docs/v13.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/v13.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html
    Version: v13.0.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

diff --git a/docs/v13.0.0/workflow_tasks/discover_granules/index.html b/docs/v13.0.0/workflow_tasks/discover_granules/index.html

Discover Granules | Cumulus Documentation

included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
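
As a hypothetical, partial task_config sketch (other required configuration is omitted; the task schema is the authoritative reference), concurrency can be set alongside duplicate handling like so:

    "task_config": {
      "duplicateGranuleHandling": "skip",
      "concurrency": 3,
      ...more configuration...
    }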

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

diff --git a/docs/v13.0.0/workflow_tasks/files_to_granules/index.html b/docs/v13.0.0/workflow_tasks/files_to_granules/index.html
    Version: v13.0.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming config.inputGranules and the task input list of s3 URIs along with the rest of the configuration objects to take the list of incoming files and sort them into a list of granule objects.

Please note: Files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.
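
As a simplified, hypothetical sketch of how the pieces relate (real inputs and outputs contain additional fields; the schemas are authoritative), a staged S3 URI from the task input is sorted onto its matching granule from config.inputGranules, with size, bucket, key, and fileName added:

    Input: ["s3://my-staging-bucket/file-staging/my-stack/MOD09GQ___006/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"]

    config.inputGranules: [{ "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104", "files": [] }]

    Output granule:
    {
      "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104",
      "files": [
        {
          "bucket": "my-staging-bucket",
          "key": "file-staging/my-stack/MOD09GQ___006/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "size": 1908635
        }
      ]
    }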

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

diff --git a/docs/v13.0.0/workflow_tasks/lzards_backup/index.html b/docs/v13.0.0/workflow_tasks/lzards_backup/index.html
    Version: v13.0.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.
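
Illustratively, the composite output therefore has a shape along the following lines (keys beyond the two described above are omitted here, and the output schema is authoritative):

    {
      "granules": [ ...the input granules array, passed through... ],
      "backupResults": [ ...one entry per backup request, describing its result... ]
    }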

diff --git a/docs/v13.0.0/workflow_tasks/move_granules/index.html b/docs/v13.0.0/workflow_tasks/move_granules/index.html
    Version: v13.0.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file

• The task then outputs an updated list of granule objects (sketched below).
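
For illustration only (a hypothetical, heavily trimmed output granule; the output schema is authoritative), a granule with a CMR metadata file might come out looking like:

    {
      "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104",
      "files": [
        {
          "bucket": "my-protected-bucket",
          "key": "MOD09GQ___006/2017/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
          "type": "data"
        },
        {
          "bucket": "my-public-bucket",
          "key": "MOD09GQ___006/MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
          "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
          "type": "metadata"
        }
      ]
    }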

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

diff --git a/docs/v13.0.0/workflow_tasks/parse_pdr/index.html b/docs/v13.0.0/workflow_tasks/parse_pdr/index.html
    Version: v13.0.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.
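
As a rough, hypothetical illustration (values are placeholders and the output schema is authoritative), that payload has a shape along these lines:

    {
      "pdr": { "name": "MY_SAMPLE.PDR", "path": "/pdrs" },
      "filesCount": 4,
      "totalSize": 123456,
      "granules": [ ...the generated granule objects... ]
    }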

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

diff --git a/docs/v13.0.0/workflow_tasks/queue_granules/index.html b/docs/v13.0.0/workflow_tasks/queue_granules/index.html
    Version: v13.0.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task utilizes a defined collection in concert with a defined provider, either set on each granule or passed in via config, to queue up ingest executions for each granule or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
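
Bringing the selected keys above together, a hypothetical task_config for this task might look like the following (the surrounding step definition and other required values are omitted; the task schema is authoritative, and the templated paths and names shown are placeholders):

    "task_config": {
      "provider": "{$.meta.provider}",
      "internalBucket": "{$.meta.buckets.internal.name}",
      "granuleIngestWorkflow": "IngestGranule",
      "queueUrl": "{$.meta.queues.startSF}",
      "preferredQueueBatchSize": 10,
      "concurrency": 3,
      "executionNamePrefix": "myPrefix",
      "childWorkflowMeta": { "note": "this object is merged into the scheduled execution's meta" }
    }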

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.
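
Illustratively (a hypothetical shape; the output schema is authoritative), the payload contains the scheduled execution ARNs under running:

    {
      "running": [
        "arn:aws:states:us-east-1:111122223333:execution:myPrefix-IngestGranule:example-execution-name"
      ],
      ...other payload fields...
    }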

diff --git a/docs/v13.0.0/workflows/cumulus-task-message-flow/index.html b/docs/v13.0.0/workflows/cumulus-task-message-flow/index.html
    Version: v13.0.0

    Cumulus Tasks: Message Flow

Cumulus Tasks make up Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines what current task is being executed. Note this is different from what lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
    "input": {},
    "config": {}
    }
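
As an illustration, running message preparation on the simple Cumulus Message shown earlier would hand the task application code approximately the following, with the templates from task_config resolved against meta and the payload supplied as input (this resolved form is an interpretation of that example rather than text taken from the schemas):

    {
      "input": {
        "anykey": "anyvalue"
      },
      "config": {
        "inlinestr": "prefixbarsuffix",
        "array": ["bar"],
        "object": {
          "foo": "bar"
        }
      }
    }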

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
    "task_config": {
    "bar": "baz",
    "cumulus_message": {
    "input": "{$.payload.input}",
    "outputs": [
    {
    "source": "{$.input.anykey}",
    "destination": "{$.payload.out}"
    }
    ]
    }
    },
    "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "input": {
    "anykey": "anyvalue"
    }
    }
    }

    Create Next Message Step 2: Store remote event

    If the ReplaceConfiguration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

diff --git a/docs/v13.0.0/workflows/developing-a-cumulus-workflow/index.html b/docs/v13.0.0/workflows/developing-a-cumulus-workflow/index.html
    Version: v13.0.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
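
For example, a minimal hypothetical state_machine_definition for a single-step workflow could look like the following (the task ARN reference assumes your configuration exposes such an output, as the example deployment does for its hello world task):

    {
      "Comment": "A minimal single-step workflow",
      "StartAt": "HelloWorld",
      "States": {
        "HelloWorld": {
          "Type": "Task",
          "Resource": "${module.cumulus.hello_world_task.task_arn}",
          "Parameters": {
            "cma": {
              "event.$": "$",
              "task_config": {}
            }
          },
          "End": true
        }
      }
    }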

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

diff --git a/docs/v13.0.0/workflows/developing-workflow-tasks/index.html b/docs/v13.0.0/workflows/developing-workflow-tasks/index.html
    Version: v13.0.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a docker image. Cumulus provides a docker image (source) for node 12.x+ lambdas on dockerhub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

diff --git a/docs/v13.0.0/workflows/docker/index.html b/docs/v13.0.0/workflows/docker/index.html

Dockerizing Data Processing | Cumulus Documentation

2) validate the output (in this case just check for existence)
3) use 'ncatted' to update the resulting file to be CF-compliant
4) write out metadata generated for this file

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3, and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise, an error will be reported.

diff --git a/docs/v13.0.0/workflows/index.html b/docs/v13.0.0/workflows/index.html
    Version: v13.0.0

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
    • Not make assumptions of specifics of what goes into a granule for example.

    Scalable

The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.
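
To make the scheduler's role concrete, here is a minimal sketch (not the actual Cumulus scheduler) of how a process might combine a collection configuration with deployment context and kick off a Step Function execution with boto3. The state machine ARN, collection config shape, and event layout are illustrative assumptions.

import json
import uuid

import boto3

sfn = boto3.client("stepfunctions")

def start_ingest(state_machine_arn, collection_config, execution_context):
    """Build an initial event and start one workflow execution (illustrative)."""
    event = {
        "cumulus_meta": execution_context,          # e.g. stack name, internal buckets
        "meta": {"collection": collection_config},  # collection-specific configuration
        "payload": {},
    }
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"{collection_config['name']}-{uuid.uuid4()}",
        input=json.dumps(event),
    )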

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
  • Both meta and task_config, which are used for configuration at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these features. In exchange for this trade off, we save significant amounts of development effort and gain all the features of Step Functions. One workaround is to provide a library of common task capabilities. This would optionally be available to tasks that are implemented in Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation is tightly coupled to the AWS standard. If AWS makes backwards incompatible changes in the future, we will have to deal with existing projects written against the older version.

We avoid having to develop our own standard and the code to process it. The design can support new features in AWS Step Functions without requiring changes to the Ingest library code. It is unlikely AWS will make a backwards incompatible change at this point. If that were to happen, one mitigation would be to write data transformations to the new format.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions that address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools such as a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the needs of ingest by integrating different workflows for providers. It's flexible in how tasks can be arranged to meet the needs of a collection. Providers have developed and incorporated open source tools over the years, and all of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a fixed number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used to monitor many of the events within Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v13.0.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.
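
The adapt → invoke → adapt pattern described above can be sketched as follows. This is an illustrative, self-contained stand-in, not the real CMA client library API; the helper functions here are deliberately simplified.

# Illustrative sketch of the adapter pattern: adapt the incoming message,
# invoke the task's business logic, then adapt the response back.
def load_nested_event(message):
    """Reduce a full Cumulus message to the simplified event a task sees."""
    return {
        "input": message.get("payload", {}),
        "config": message.get("task_config", {}),
    }

def create_next_event(task_response, message):
    """Fold a task's return value back into the Cumulus message."""
    next_message = dict(message)
    next_message["payload"] = task_response
    return next_message

def run_with_adapter(handler, full_message, context=None):
    task_event = load_nested_event(full_message)
    task_response = handler(task_event, context)
    return create_next_event(task_response, full_message)

# Example task whose business logic never sees non-task message fields.
def my_task(event, context):
    return {"echoed": event["input"]}

print(run_with_adapter(my_task, {"payload": {"granule": "g1"}, "task_config": {}}))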

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a Python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

Input to the task application code is a JSON object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is placed in the payload key by default, but the task's cumulus_message output configuration (described below) can also be used to return just a portion of the task output.
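
As a minimal sketch of the shape a task handler works with (the business logic and field names below are placeholders, not a Cumulus task):

# The handler receives only "input" (previous payload) and "config"
# (resolved task configuration); its return value becomes the next payload.
def task(event, context=None):
    files = event["input"].get("files", [])    # payload from the previous step
    bucket = event["config"].get("bucket")     # resolved task configuration
    processed = [{"bucket": bucket, "name": name} for name in files]
    return {"files": processed}

print(task({"input": {"files": ["a.hdf"]}, "config": {"bucket": "my-bucket"}}))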

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

Using the CMA package with a Lambda by any of the above mentioned methods (Lambda Layer, manual addition) requires configuration of its various features via a specific Step Function Parameters format (see the sample workflows in the example cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

The "event.$": "$" parameter is required, as it passes the entire incoming message to the CMA client library for parsing and allows the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, leaving in its place an empty JSON object {} and a message key (the Remote Message) that defines how to retrieve it. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize):

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.
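
The size check and replacement can be sketched schematically as below. This is an assumption-laden illustration, not the CMA implementation: the S3 key naming is invented, and Path is treated as a single top-level message key rather than a full JSONPath for brevity.

import json

def store_remote_if_needed(message, s3_client, bucket, max_size=0, path="payload",
                           target_path="$.payload"):
    """If the targeted portion exceeds max_size, store it in S3 and leave a replace pointer."""
    portion = message.get(path, {})
    body = json.dumps(portion)
    if len(body.encode("utf-8")) <= max_size:
        return message                               # small enough; leave the message as-is
    key = f"events/{message['cumulus_meta']['execution_name']}"   # illustrative key scheme
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    message[path] = {}                               # empty JSON object left in place
    message["replace"] = {"Bucket": bucket, "Key": key, "TargetPath": target_path}
    return message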

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values used to define inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
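
A small sketch of this merge step, under the assumption that the inbound message has the shape shown above:

def reformat_parameterized_message(step_input):
    """Promote input.cma.event to the full message and merge the other cma keys alongside it."""
    cma = step_input["input"]["cma"]
    message = dict(cma["event"])                 # the event becomes the full input
    for key, value in cma.items():
        if key != "event":
            message[key] = value                 # merge non-event cma keys (e.g. ReplaceConfig)
    return message                               # Parameters outside of cma are ignored

example = {
    "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
    "input": {
        "Other Parameter": {},
        "cma": {
            "ReplaceConfig": {"FullMessage": True},
            "event": {"cumulus_meta": {}, "payload": {}, "meta": {}},
        },
    },
}
print(reformat_parameterized_message(example))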

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

"meta": {
  "foo": {}
},
"replace": {
  "TargetPath": "$.meta.foo",
  "Bucket": "some_bucket",
  "Key": "events/some-event-id"
}

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
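
A sketch of this retrieval step is below, with the simplifying assumption that only "$" and dotted "$.a.b" TargetPath forms need to be handled (full JSONPath support is omitted); it is not the CMA's own code.

import json

def resolve_remote_message(message, s3_client):
    """Fetch the stored portion from S3, write it back at TargetPath, and drop the replace key."""
    replace = message.pop("replace", None)
    if not replace:
        return message
    obj = s3_client.get_object(Bucket=replace["Bucket"], Key=replace["Key"])
    stored = json.loads(obj["Body"].read())
    target = replace.get("TargetPath", "$")
    if target == "$":
        return stored                                 # whole-message replacement
    keys = target.removeprefix("$.").split(".")       # e.g. "$.meta.foo" -> ["meta", "foo"]
    node = message
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = stored                           # write the retrieved value back
    return message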

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use a URL template as a value to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

The corresponding Cumulus Message would contain:

"meta": {
  "foo": "bar",
  "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
  }
},
"task_config": {
  "provider": "{$.meta.provider}",
  "inlinestr": "prefix{meta.foo}suffix",
  "array": "{[$.meta.foo]}",
  "object": "{$.meta}"
}

The message sent to the task would be:

"config" : {
  "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
  },
  "inlinestr": "prefixbarsuffix",
  "array": ["bar"],
  "object": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
},
"input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.
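
A toy sketch of this substitution behavior follows: dotted paths in curly brackets are replaced with values looked up in the event, and unresolvable templates are left verbatim. The real CMA template syntax (e.g. "{$.meta.foo}", "{[$.meta.foo]}") is richer than this illustration.

import re

def resolve_templates(value, event):
    """Resolve {dotted.path} templates against the event, leaving unresolved ones verbatim."""
    def lookup(match):
        node = event
        for key in match.group(1).lstrip("$.").split("."):
            if not isinstance(node, dict) or key not in node:
                return match.group(0)            # cannot resolve: leave the template as-is
            node = node[key]
        return str(node)
    return re.sub(r"\{([^{}]+)\}", lookup, value)

event = {"meta": {"foo": "bar"}}
print(resolve_templates("prefix{meta.foo}suffix", event))   # prefixbarsuffix
print(resolve_templates("prefix{meta.baz}suffix", event))   # left verbatim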

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given that a task specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

{
  "input" : {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also write values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value at the "destination" URL in the resulting Cumulus message. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Given the response from the task is:

{
  "output": {
    "anykey": "boo"
  }
}

    The Cumulus Message Adapter would output the following Cumulus Message:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}
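
The source → destination dispatch illustrated by this example can be sketched as follows, under the same simplifications as the earlier sketches (only "$" and dotted "$.a.b" paths are handled, and this is not the CMA implementation):

def get_path(obj, path):
    """Read a value at "$" or a dotted "$.a.b" path."""
    if path == "$":
        return obj
    for key in path.removeprefix("$.").split("."):
        obj = obj[key]
    return obj

def set_path(obj, path, value):
    """Write a value at a dotted "$.a.b" path, creating intermediate objects."""
    keys = path.removeprefix("$.").split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value

def apply_outputs(task_return, message, outputs):
    for rule in outputs:
        source = rule["source"].strip("{}")          # "{$.output.anykey}" -> "$.output.anykey"
        destination = rule["destination"].strip("{}")
        set_path(message, destination, get_path(task_return, source))
    return message

outputs = [
    {"source": "{$}", "destination": "{$.payload}"},
    {"source": "{$.output.anykey}", "destination": "{$.meta.baz}"},
]
message = {"meta": {"foo": "bar"}}
print(apply_outputs({"output": {"anykey": "boo"}}, message, outputs))
# payload becomes the whole return value; meta.baz becomes "boo"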

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please Note: the non user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

    For example, if the output message (post output configuration) from a cumulus message looks like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "ReplaceConfig": {
    "FullMessage": true
  },
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    the resultant output would look like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
  }
}

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
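
A sketch of this per-task schema validation using the Python jsonschema package is shown below. The schemas/ layout mirrors the convention described above, but this is illustrative code, not the CMA's own implementation.

import json
import os

from jsonschema import validate, ValidationError

def validate_message(task_dir, kind, message):
    """Validate a message of the given kind ('input', 'output', or 'config') against its schema."""
    schema_path = os.path.join(task_dir, "schemas", f"{kind}.json")
    if not os.path.exists(schema_path):
        return True                                   # no schema found: skip validation
    with open(schema_path) as schema_file:
        schema = json.load(schema_file)
    try:
        validate(instance=message, schema=schema)
        return True
    except ValidationError as err:
        raise ValueError(f"{kind} message failed schema validation: {err.message}")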

    Version: v13.0.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source) as either a new .tf file or an addition to an existing .tf file:

resource "aws_lambda_function" "myfunction" {
  function_name    = "${var.prefix}-function"
  filename         = "/path/to/zip/lambda.zip"
  source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
  handler          = "index.handler"
  role             = module.cumulus.lambda_processing_role_arn
  runtime          = "nodejs10.x"

  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }
}

    Please note: This example contains the minimum set of required configuration.

Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable Cloudwatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

resource "aws_cloudwatch_log_group" "myfunction_log_group" {
  name              = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
  retention_in_days = 30
  tags              = { Deployment = var.prefix }
}
    Version: v13.0.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
Version: v13.0.0

Workflow Configuration How To's

To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

There are a number of options to pull dates from the CMR file metadata. With this metadata:

<Granule>
  <Temporal>
    <RangeDateTime>
      <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
      <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
  </Temporal>
</Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.
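
For illustration only, the behavior of these extract* helpers can be mimicked with Python's standard library; the real functions live in the Cumulus task code (dateFormat delegates to moment().format() in JavaScript), so this sketch is not their implementation.

from datetime import datetime

def parse(value):
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

begin = "2003-02-19T00:00:00Z"
print(parse(begin).strftime("%Y"))   # extractYear-like  -> 2003
print(parse(begin).strftime("%m"))   # extractMonth-like -> 02
print(parse(begin).strftime("%d"))   # extractDate-like  -> 19
print(parse(begin).hour)             # extractHour-like  -> 0 (no leading zero)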

    Version: v13.0.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

    When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event which will manage the trigger of the lambdas that trigger the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

For an SNS rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule, and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    Version: v13.4.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    Version: v13.4.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

    While the diagram can feel complex, it can easily be digested in several major components:

    Data Distribution

End Users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a set of PostgreSQL compatible databases, and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is also replicated in DynamoDB; this replication will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for users configuring their own deployment; they contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    Version: v13.4.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
2. retention-in-days: the number of days you'd like to retain the logs in the specified log group. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.
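
If you prefer to script this, the same operation can be done with boto3; the log group name and retention value below are the same example values used above.

import boto3

logs = boto3.client("logs")
# Set a 30-day retention policy on the example log group.
logs.put_retention_policy(
    logGroupName="/aws/lambda/KinesisInboundLogger",
    retentionInDays=30,
)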

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    Version: v13.4.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

    In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

{
  "Datapoints": [
    {
      "Timestamp": "2019-07-31T00:00:00Z",
      "Average": 150996467959.0,
      "Unit": "Bytes"
    }
  ],
  "Label": "BucketSizeBytes"
}

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.
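
As an alternative sketch, the same total can be computed with boto3 instead of the CLI; like the command above, this lists every object under the prefix and can be slow on large buckets. The bucket and prefix names are the example values used above.

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Sum object sizes under the example prefix, page by page.
total_bytes = 0
for page in paginator.paginate(Bucket="sample-bucket", Prefix="example-folder/"):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

print(f"Total: {total_bytes / (1024 ** 3):.2f} GiB")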

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
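
To make the regex-to-bucket matching concrete, here is a small sketch that mirrors the configuration examples below: the first files entry whose regex matches the filename determines the bucket. This is illustrative only, not the Cumulus implementation itself.

import re

def bucket_for_file(filename, files_config):
    """Return the bucket of the first files entry whose regex matches the filename."""
    for entry in files_config:
        if re.search(entry["regex"], filename):
            return entry["bucket"]
    return None

files_config = [
    {"bucket": "protected", "regex": r"^.*\.hdf$"},
    {"bucket": "private", "regex": r"^.*\.hdf\.met$"},
]
print(bucket_for_file("MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", files_config))  # protected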

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "protected-2",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
  ]
}
    Version: v13.4.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

• Providers tell Cumulus where to get new data - e.g. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, things that are more specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
host | string | Yes | S3 Bucket to pull data from
    http
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be http for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 80
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
https

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be https for this provider type
• host (string, required): The host to pull data from (e.g. nasa.gov)
• username (string, optional): Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, required only if username is specified): Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional): Port to connect to the provider on. Defaults to 443
• allowedRedirects (string[], optional): Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
• certificateUri (string, optional): SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
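As an illustration only, an https provider using basic authentication might be defined along these lines (the host, credentials, redirect entry, and certificate location are placeholders; Cumulus encrypts the password via KMS once it is stored):

{
  "id": "MY_HTTPS_PROVIDER",
  "protocol": "https",
  "host": "data.example.gov",
  "port": 443,
  "username": "my-username",
  "password": "my-password",
  "allowedRedirects": ["data.example.gov:7000"],
  "certificateUri": "s3://my-internal-bucket/certs/provider-cert.pem"
}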
ftp

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be ftp for this provider type
• host (string, required): The ftp host to pull data from (e.g. nasa.gov)
• username (string, optional): Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
• password (string, optional): Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
• port (integer, optional): Port to connect to the provider on. Defaults to 21
sftp

• id (string, required): Unique identifier for the provider
• globalConnectionLimit (integer, optional): Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required): The protocol for this provider. Must be sftp for this provider type
• host (string, required): The sftp host to pull data from (e.g. nasa.gov)
• username (string, optional): Username to use to connect to the sftp server.
• password (string, optional): Password to use to connect to the sftp server.
• port (integer, optional): Port to connect to the provider on. Defaults to 22
• privateKey (string, optional): Filename of the private key, assumed to be in s3://bucketInternal/stackName/crypto
• cmKeyId (string, optional): AWS KMS Customer Master Key ARN or alias
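Likewise, a sketch of an sftp provider using a private key (host, username, key filename, and KMS alias are placeholders; the key file is expected under the internal bucket's crypto prefix as noted above):

{
  "id": "MY_SFTP_PROVIDER",
  "protocol": "sftp",
  "host": "sftp.example.gov",
  "port": 22,
  "username": "my-username",
  "privateKey": "my-provider-key",
  "cmKeyId": "alias/my-kms-key"
}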

    Collections

Breakdown of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)

• name (required, e.g. "MOD09GQ"): The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
• version (required, e.g. "006"): A version tag for the collection
• granuleId (required, e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$"): The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
• granuleIdExtraction (required, e.g. "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)"): The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
• sampleFileName (required, e.g. "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"): An example filename belonging to this collection
• files (required, <JSON Object> of files defined below): Describes the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
• dataType (optional, e.g. "MOD09GQ"): Can be specified, but defaults to the collection name if not
• duplicateHandling (optional, e.g. "replace"): ("replace"|"version"|"skip") determines the granule duplicate handling scheme
• ignoreFilesConfigForDiscovery (optional, defaults to false): By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
• process (optional, e.g. "modis"): Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
• meta (optional, <JSON Object> of MetaData for the collection): MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
• url_path (optional, e.g. "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"): The default S3 prefix (folder) under which granule files will be stored; can be overridden per file via the files url_path

    files-object

• regex (required, e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"): Regular expression used to identify the file
• sampleFileName (required, e.g. "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"): Filename used to validate the provided regex
• type (optional, e.g. "data"): Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps; non-CNM values will be treated as 'data' type. Currently only utilized in the DiscoverGranules task
• bucket (required, e.g. "internal"): Name of the bucket where the file will be stored
• url_path (optional, e.g. "${collectionShortName}/{substring(file.fileName, 0, 3)}"): Folder used to save the granule in the bucket. Defaults to the collection url_path
• checksumFor (optional, e.g. "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"): If this is a checksum file, set checksumFor to the regex of the target file.
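Assembling the collection and files attributes described above, a trimmed-down version of the referenced example collection might look roughly like the following sketch (only two file entries are shown, the second being a hypothetical .md5 checksum file, and the bucket names are illustrative):

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "granuleIdExtraction": "(MOD09GQ\\..*)(\\.hdf|\\.cmr|_ndvi\\.jpg)",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "duplicateHandling": "replace",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}",
  "files": [
    {
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "type": "data",
      "bucket": "protected"
    },
    {
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.md5$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.md5",
      "bucket": "private",
      "checksumFor": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"
    }
  ]
}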

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by either events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
• name (required, e.g. "L2_HR_PIXC_kinesisRule"): Name of the rule. This is the name under which the rule will be listed on the dashboard
• workflow (required, e.g. "CNMExampleWorkflow"): Name of the workflow to be run. A list of available workflows can be found on the Workflows page
• provider (optional, e.g. "PODAAC_SWOT"): Configured provider's ID. This can be found on the Providers dashboard page
• collection (required, <JSON Object> collection object shown below): Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
• payload (optional, <JSON Object or Array>): The payload to be passed to the workflow
• meta (optional, <JSON Object> of MetaData for the rule): MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
• rule (required, <JSON Object> rule type and associated values, discussed below): Object defining the type and subsequent attributes of the rule
• state (optional, e.g. "ENABLED"): ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
• queueUrl (optional, e.g. https://sqs.us-east-1.amazonaws.com/1234567890/queue-name): URL for SQS queue that will be used to schedule workflows for this rule
• tags (optional, e.g. ["kinesis", "podaac"]): An array of strings that can be used to simplify search

    collection-object

• name (required, e.g. "L2_HR_PIXC"): Name of a collection defined/configured in the Collections dashboard page
• version (required, e.g. "000"): Version number of a collection defined/configured in the Collections dashboard page

    meta-object

• retries (optional, e.g. 3): Number of retries on errors, for sqs-type rule only. Defaults to 3.
• visibilityTimeout (optional, e.g. 900): VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

• type (required, e.g. "kinesis"): ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
• value (string, required depending on the rule type): Discussion of valid values is below

    rule-value

The rule.value entry depends on the rule type:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is re-run 3 times by default, or the configured number of retries.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
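A sketch of an sqs-type rule combining the rule value and meta attributes described above (the queue URL and rule name are placeholders; the queue must already exist, be accessible to your account, and have a dead-letter queue configured):

{
  "name": "my_sqs_rule",
  "workflow": "CNMExampleWorkflow",
  "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
  },
  "rule": {
    "type": "sqs",
    "value": "https://sqs.us-east-1.amazonaws.com/1234567890/my-ingest-queue"
  },
  "meta": {
    "retries": 1,
    "visibilityTimeout": 1800
  },
  "state": "ENABLED"
}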

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    Version: v13.4.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
    • You will need a S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

    From any directory you chose, open an editor and add the following to a file named exampleRule.json

{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

$ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.
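If you need something more targeted than the blanket rule above, the same configuration format accepts a prefix filter and additional transitions. The following is only a sketch with placeholder values (a "browse/" prefix and a later Glacier transition), not something required for this walk-through:

{
  "Rules": [
    {
      "ID": "ArchiveBrowseExample",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "browse/"
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 365,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}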

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    Version: v13.4.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on GitHub and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const decompressedData = await gunzip(payload);
    const logData = JSON.parse(decompressedData.toString('ascii'));
    return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
    return sns.publish({
    TopicArn: process.env.EmailReportingTopicArn,
    Message: logEvent.message
    }).promise();
    }
    return Promise.resolve();
    }));
    };
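For context, the handler above only publishes a notification when a log event's message parses as JSON with a level of error or fatal. A @cumulus/logger message that would trigger an email might look roughly like the following (the exact field set varies; only the level and message fields matter to the filter above):

{
  "level": "error",
  "message": "Ingest failed for granule MOD09GQ.A2016358.h13v04.006.2016360104606",
  "sender": "@cumulus/sync-granule",
  "timestamp": "2023-07-20T18:31:38.000Z"
}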

After creating the SNS topic, we can deploy this code as a lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.

    Version: v13.4.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
    Version: v13.4.0

    Configuration of Tasks

    The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

<lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

More information about cmr_limit and cmr_page_size can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • fake_processing_task_timeout
    • files_to_granules_task_timeout
    • hello_world_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sf_sqs_report_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }

    lambda_memory_sizes

    A configurable map of memory sizes (in MBs) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_memory_size: <memory_size>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_memory_size
    • discover_granules_task_memory_size
    • discover_pdrs_task_memory_size
    • fake_processing_task_memory_size
    • hyrax_metadata_updates_task_memory_size
    • lzards_backup_task_memory_size
    • move_granules_task_memory_size
    • parse_pdr_task_memory_size
    • pdr_status_check_task_memory_size
    • post_to_cmr_task_memory_size
    • queue_granules_task_memory_size
    • queue_pdrs_task_memory_size
    • queue_workflow_task_memory_size
    • sf_sqs_report_task_memory_size
    • sync_granule_task_memory_size
    • update_cmr_acess_constraints_task_memory_size
    • update_granules_cmr_metadata_file_links_task_memory_size

    Example

    lambda_memory_sizes = {
    queue_granules_task_memory_size = 1036
    }
    Version: v13.4.0

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. They should also serve to help unify an institution or user group around a common set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

Version: v13.4.0

Ingest Browse Generation

… provider keys with the previously entered values). Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the CMR entry shown above exists.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.
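Put together, the event.config object handed to your processing lambda would look roughly like the following sketch (the bucket name is illustrative and the collection object is abbreviated; in practice it contains the full collection record, including the files array):

"config": {
  "bucket": "cumulus-test-sandbox-internal",
  "collection": {
    "name": "MOD09GQ",
    "version": "006",
    "duplicateHandling": "replace",
    "files": []
  }
}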

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The provided example script used in the example goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
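A minimal sketch of the shape of the object such a processing task could return, assuming the "files"/"granules" output keys used by the provided example (only the newly generated browse file is shown here; a real task would return the complete updated file list and granules, as illustrated in the sections below):

{
  "files": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
  ],
  "granules": [
    {
      "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
      "dataType": "MOD09GQ",
      "version": "006",
      "files": [
        {
          "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
          "bucket": "cumulus-test-sandbox-internal",
          "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
          "type": "browse"
        }
      ]
    }
  ]
}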

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, those outputs are mapped to the payload and to meta.input_granules, respectively, which downstream tasks then consume via task_config entries such as:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file, if one exists, with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    Version: v13.4.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: given CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, which is followed by PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    Version: v13.4.0

    CNM Workflow

    This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

    In operational environments it's likely science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/cnm_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
JSON
}

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

• The rule's value attribute must match the Amazon Resource Name (ARN) for the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you setup in the Prerequisites section.

    Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the notification Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.
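
For reference, a minimal sketch of such a failure record is shown below. The stream name is a placeholder for the notification stream used by your rule, and with AWS CLI v2 you may need to add --cli-binary-format raw-in-base64-out so the --data value is treated as raw JSON rather than base64:

# Sketch only: this record deliberately omits product.name to exercise the failure path.
# Replace YOUR_CNM_NOTIFICATION_STREAM_NAME with the notification stream configured for your rule.
aws kinesis put-record \
--stream-name YOUR_CNM_NOTIFICATION_STREAM_NAME \
--partition-key 1 \
--data '{"identifier ": "testIdentifier123456", "collection": "MOD09GQ", "provider": "TestProvider", "version": "123456", "submissionTime": "2017-09-30T03:42:29.791198", "product": {"files": [], "dataVersion": "006"}}'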


    Verify results

    Check for successful execution on the dashboard

Following the execution of this workflow, you should see it complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.
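
One quick way to spot-check this from the command line is to list the staging prefix. The bucket, deployment name, and collection path below are placeholders following the layout described in the SyncGranules step above and must be replaced with your deployment's actual values:

# Sketch: list the staged files for the test granule.
aws s3 ls s3://DEPLOYMENT_PRIVATE_BUCKET/file-staging/DEPLOYMENT_NAME/COLLECTION/ --recursive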

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should result in output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow by using the put-record command from above.
• As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:

aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note the data encoding is not human readable and would need to be parsed/converted to be interpretable. There are many options for building a Kinesis consumer, such as the KCL.
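
For quick inspection, the following sketch (assuming the jq and base64 utilities are available locally, and using the shard iterator obtained above) decodes the Data fields from the get-records output without writing a full consumer:

# Decode each record's Data field from the get-records response.
# Note: the decode flag may be -d or -D depending on your platform's base64 utility.
aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
| jq -r '.Records[].Data' \
| while read -r data; do echo "$data" | base64 --decode; echo; done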

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus' control.

The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
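
If you prefer the command line, the following sketch is a rough equivalent of the console steps above (the queue name assumes the {{prefix}}-kinesisFailure naming convention; substitute your deployment prefix):

# Look up the dead letter queue URL and poll a batch of failed records.
QUEUE_URL=$(aws sqs get-queue-url --queue-name <prefix>-kinesisFailure --query QueueUrl --output text)
aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10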

Note: these are only failures that occurred while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

When a Kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, allowing for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
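
The same trigger can also be created from the command line. The function name and stream ARN below are assumptions based on a typical deployment and should be replaced with the actual values from your deployment:

# Sketch: add a Kinesis event source mapping for the outbound logger Lambda.
# <prefix> and the stream ARN are placeholders for your deployment's values.
aws lambda create-event-source-mapping \
--function-name <prefix>-KinesisOutboundEventLogger \
--event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/<your-response-endpoint-stream> \
--starting-position LATEST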

Error Handling in Workflows

...Service Exception. See this documentation on configuring your workflow to handle transient Lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    Version: v13.4.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

Screenshot of AWS Step Function execution graph for the HelloWorld workflow (executed workflow as seen in the AWS Console)

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    Version: v13.4.0

    Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to the workflow.

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Messages published to this queue are stored as granule/PDR records in the Cumulus database, and the status of those granules/PDRs becomes visible on the Cumulus dashboard. The queue does have certain expectations of the message format: it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the topic ARNs) by adding the following outputs to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus message is expected in detail.input. The Cumulus message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
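
As a rough sketch only (the envelope below is a minimal illustration; check the schemas referenced above for the full set of required fields), a SUCCEEDED report could be posted with the AWS CLI like this, where QUEUE_URL is the stepfunction_event_reporter_queue_url output from above:

# Sketch: post a status message to the reporting queue.
# detail.output must be a JSON *string* containing the full Cumulus message;
# the {...} placeholders are stand-ins, not valid values.
aws sqs send-message \
--queue-url "$QUEUE_URL" \
--message-body '{"source": "aws.states", "detail-type": "Step Functions Execution Status Change", "detail": {"status": "SUCCEEDED", "output": "{\"cumulus_meta\":{...},\"meta\":{...},\"payload\":{...}}"}}'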

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v13.4.0

    Queue PostToCmr

    In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow setup.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v13.4.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

    Lambda provides serverless architecture, providing the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources via the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

Suppose it has been discovered that this task can no longer run in AWS Lambda. You can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or by updating an existing one):

• An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

• An updated Discover Granules workflow definition to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}")

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Science Investigator-led Processing Systems (SIPS)

...we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    Note: A list of configured workflows exists under the "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v13.4.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Tracking Ancillary Files

...The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type  | UMM-G RelatedUrl.Type                                           | ECHO10 Location
ancillary | 'VIEW RELATED INFORMATION'                                      | OnlineResource
data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI) | OnlineAccessURL
browse    | 'GET RELATED VISUALIZATION'                                     | AssociatedBrowseImage
linkage   | 'EXTENDED METADATA'                                             | OnlineResource
metadata  | 'EXTENDED METADATA'                                             | OnlineResource
qa        | 'EXTENDED METADATA'                                             | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v13.4.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
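
Once logging is enabled, one way to verify delivery is to tail that log group (a sketch; requires AWS CLI v2, and the REST API ID and stage must match your deployed distribution API):

# Sketch: follow the distribution API's execution logs in CloudWatch.
aws logs tail "API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>" --follow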

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

    This is a one time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

Choosing and configuring your RDS database

...using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more, as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Version: v13.4.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value with the Cloudwatch log group name.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
    - + \ No newline at end of file diff --git a/docs/v13.4.0/deployment/components/index.html b/docs/v13.4.0/deployment/components/index.html index 964dea8c047..de1171006b6 100644 --- a/docs/v13.4.0/deployment/components/index.html +++ b/docs/v13.4.0/deployment/components/index.html @@ -5,7 +5,7 @@ Component-based Cumulus Deployment | Cumulus Documentation - + @@ -39,7 +39,7 @@ Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v13.4.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

Using the AWS command line tool's s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.

    Web interface

    See: AWS "Creating a Bucket" documentation

    Version: v13.4.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    localhost <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.
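For example, using curl (a sketch; substitute the values of your own Terraform variables for the placeholders):

curl -X POST "<csdap_host_url>/authclient/updateRedirectUri" \
  -u "<csdap_client_id>:<csdap_client_password>" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "redirect_uri=<cumulus_distribution_url>/login"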

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

1. Determine your EC2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

To determine your EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

How to Deploy Cumulus

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init.[1]
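For example, from the root module directory of your deployment (a minimal sketch, assuming the cumulus-tf directory of the template deploy repo):

cd cumulus-tf/
terraform init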

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
3. Under Manage -> redirect URIs, add the Archive API URL returned from the stack deployment
  • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
4. Also add the Distribution URL
  • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login.[2]
5. You may delete the placeholder URL you used to create the application.

If you've lost track of the needed redirect URIs, they can be located on the API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, go to Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

• Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration (example commands follow this list).
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket
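For example, both the bucket creation and website configuration can be done from the command line (a sketch; adjust the bucket name, region, and permissions for your deployment):

aws s3api create-bucket \
  --bucket <prefix>-dashboard \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2
aws s3 website s3://<prefix>-dashboard --index-document index.html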

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root.[3]

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

• Open the <prefix>-dashboard bucket and click 'Upload'. Add the contents of the 'dist' subdirectory to the upload, then select 'Next'. On the permissions window, allow the public to view, then select 'Upload'.

You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

When configuring your ECS cluster, consider the following (a sample variable sketch follows this list):

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
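A sketch of how these variables might appear in cumulus-tf/terraform.tfvars, written here as a shell heredoc (the values are illustrative only and should be tuned to your workloads and budget):

cat >> cumulus-tf/terraform.tfvars <<'EOF'
# Illustrative ECS sizing values -- tune for your own workloads
ecs_cluster_instance_type               = "t3.medium"
ecs_cluster_instance_docker_volume_size = 50
ecs_cluster_min_size                    = 1
ecs_cluster_desired_size                = 2
ecs_cluster_max_size                    = 4
EOF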

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", and use the edit icon for your application. Then go to Manage -> Redirect URIs.

3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, by reading the URL at the top after "Invoke this API at:".

PostgreSQL Database Deployment

Cumulus provides a Terraform module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

If you are already working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

Clone the cumulus-template-deploy repo and name it appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

To deploy this module, make sure that you have completed the corresponding steps from the Cumulus deployment instructions in similar fashion for this module.

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike Cumulus deployment, this module should be deployed once to create the database cluster and only thereafter to make changes to that configuration/upgrade/etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

In terraform.tf, configure the remote state settings by substituting the appropriate values for the following (a filled-in sketch follows this list):

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
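For example, the backend configuration in terraform.tf might end up looking like the following, written here as a shell heredoc run from within rds-cluster-tf/ (a sketch; the bucket, key, and table names are placeholders for the remote state resources you created, and this is equivalent to editing the copied terraform.tf by hand):

cat > terraform.tf <<'EOF'
terraform {
  backend "s3" {
    region         = "us-east-1"
    bucket         = "PREFIX-tf-state"
    key            = "PREFIX/rds-cluster-tf/terraform.tfstate"
    dynamodb_table = "PREFIX-tf-locks"
  }
}
EOF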

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
• permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need for access. In most use cases, this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
• prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.
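Putting these keys together, the relevant additions to rds-cluster-tf/terraform.tfvars might look like this (a sketch; every value shown is a placeholder):

cat >> terraform.tfvars <<'EOF'
# Placeholder values -- substitute your own ARN, password, and prefix
provision_user_database  = true
permissions_boundary_arn = "arn:aws:iam::<account-id>:policy/<permissions-boundary-name>"
rds_user_password        = "<user-database-password>"
prefix                   = "<prefix>"
EOF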

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

# module.rds_cluster.aws_security_group_rule.rds_security_group_allow_postgres will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

The content of each of these secrets is of the form (an example of retrieving a secret with the AWS CLI follows the field list below):

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432
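To inspect one of these secrets, you can retrieve its value with the AWS CLI, for example (substituting an ARN from your own Terraform output):

aws secretsmanager get-secret-value \
  --secret-id <user_credentials_secret_arn> \
  --query SecretString \
  --output text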

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    Version: v13.4.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }
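The s3_replicator_config values referenced above could be supplied in your terraform.tfvars along these lines (a sketch; the bucket names and prefixes are placeholders determined by your access-logging setup and, for ESDIS Metrics, by the metrics team):

cat >> cumulus-tf/terraform.tfvars <<'EOF'
s3_replicator_config = {
  source_bucket = "<prefix>-internal"
  source_prefix = "<s3-access-logs-prefix>"
  target_bucket = "<destination-bucket>"
  target_prefix = "<destination-prefix>"
}
EOF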

The Terraform source package can be found on the Cumulus GitHub release page under the Assets tab as terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Terraform Best Practices

To verify that no resources tagged for your deployment remain, run the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link Restoring a previous version: link

    Version: v13.4.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app for distribution.

The TEA module provides these instructions showing how to add it to your deployment; the following are instructions to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.
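A minimal sketch of that approach, written as a shell heredoc appending a Terraform locals block to cumulus-tf/main.tf (the local name tea_stage_name and the "DEV" value are assumptions; only stage_name and tea_api_gateway_stage come from the text above):

cat >> cumulus-tf/main.tf <<'EOF'
# Single source of truth for the API Gateway stage name; reference
# local.tea_stage_name as stage_name in the TEA module block and as
# tea_api_gateway_stage in the cumulus module block.
locals {
  tea_stage_name = "DEV"
}
EOF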

Upgrading Cumulus

After the upgrade, verify that your deployment functions correctly. Please refer to the recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: v13.4.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
• If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author, or resolve failures as appropriate.

Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

Versioning and Releases

It's useful to use the search feature of your code editor or grep to see if there are any references to the old package versions. In a bash shell you can run:

        find . -name package.json -exec grep -nH "@cumulus/.*[0-9]*\.[0-9]\.[0-9].*" {} \; | grep -v "@cumulus/.*MAJOR\.MINOR\.PATCH.*"

    e.g.:
    find . -name package.json -exec grep -nH "@cumulus/.*[0-9]*\.[0-9]\.[0-9].*" {} \; | grep -v "@cumulus/.*13\.1\.0.*"

    Verify that no results are returned where MAJOR, MINOR, or PATCH differ from the intended version, and no outdated -alpha or -beta versions are specified.

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

If there exists a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the Cumulus Core to-be-released-version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch (release-1.2.x) now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

        git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

• Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important Deselect Enable Branch - if you do not do this, it will immediately fire off a build.

• Do this immediately: on the Branch Details page, enable Change trigger and set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

• Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: You should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on github

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
• An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

To delete a published tag so that you can re-tag, run:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
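
Once the old tag is removed, re-create and push the tag from the release commit. A minimal sketch, assuming v9.1.0 is the version being re-tagged and the release commit is checked out (adjust to your release process):

  # re-create the annotated tag on the current commit
  git tag -a v9.1.0 -m "Cumulus v9.1.0"
  # push the regenerated tag to the remote
  git push origin v9.1.0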
diff --git a/docs/v13.4.0/docs-how-to/index.html b/docs/v13.4.0/docs-how-to/index.html
    Version: v13.4.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.
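
For search to work when serving the site locally, you can export the DocSearch values before running the serve command; a minimal sketch, using placeholder values rather than real credentials:

  export DOCSEARCH_API_KEY=<your-docsearch-api-key>
  export DOCSEARCH_INDEX_NAME=<your-docsearch-index-name>
  npm run docs-serve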

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

Note: To have the new page show up in a sidebar, the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in-depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both these values to exist - DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

diff --git a/docs/v13.4.0/external-contributions/index.html b/docs/v13.4.0/external-contributions/index.html
    Version: v13.4.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

diff --git a/docs/v13.4.0/faqs/index.html b/docs/v13.4.0/faqs/index.html
    Version: v13.4.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to setup Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

• Cumulus: Users outside NASA should file a GitHub issue; users inside NASA should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

Answer: Those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components that you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus maintained modules or steps on how to create a module go to Terraform modules.

Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

diff --git a/docs/v13.4.0/features/ancillary_metadata/index.html b/docs/v13.4.0/features/ancillary_metadata/index.html
    Version: v13.4.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

diff --git a/docs/v13.4.0/features/backup_and_restore/index.html b/docs/v13.4.0/features/backup_and_restore/index.html

    Cumulus Backup and Restore

    writing to the old cluster.

• Set the snapshot_identifier variable to the snapshot you wish to restore from, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

• Redeploy the data persistence and Cumulus deployments - you should not need to reconfigure either, as the secret ARN and the security group should not change; however, double-check that the configured values are as expected

diff --git a/docs/v13.4.0/features/dead_letter_archive/index.html b/docs/v13.4.0/features/dead_letter_archive/index.html
    Version: v13.4.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
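
For example, you could list what has accumulated in the archive with the AWS CLI; a sketch, where the bucket and stack names are placeholders:

  aws s3 ls s3://<system-bucket>/<stackName>/dead-letter-archive/sqs/ --recursive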

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords. Otherwise, all Cumulus messages that fail to be reprocessed will be moved to a new archive location under the path <stackName>/dead-letter-archive/failed-sqs/<YYYY-MM-DD>.
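
A sketch of triggering the recovery, assuming the same hypothetical API host and token used elsewhere in these docs:

  curl --request POST https://example.com/deadLetterArchive/recoverCumulusMessages \
    --header 'Authorization: Bearer ReplaceWithTheToken'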

This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.

diff --git a/docs/v13.4.0/features/dead_letter_queues/index.html b/docs/v13.4.0/features/dead_letter_queues/index.html
    Version: v13.4.0

    Dead Letter Queues

    startSF SQS queue

The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
    message_retention_seconds = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

Ideally, an automated process should be configured to poll a dead letter queue and process the messages off of it.

For aid in manually troubleshooting, you can utilize the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID which can be cross-referenced to the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes
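
The same messages can be pulled from the command line with the AWS CLI; a sketch, assuming you substitute the URL of the dead letter queue you are inspecting:

  aws sqs receive-message \
    --queue-url <dead-letter-queue-url> \
    --attribute-names All \
    --message-attribute-names All \
    --max-number-of-messages 10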

diff --git a/docs/v13.4.0/features/distribution-metrics/index.html b/docs/v13.4.0/features/distribution-metrics/index.html
    Version: v13.4.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination, as well as to transfer your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

diff --git a/docs/v13.4.0/features/execution_payload_retention/index.html b/docs/v13.4.0/features/execution_payload_retention/index.html
    Version: v13.4.0

    Execution Payload Retention

In addition to CloudWatch logs and AWS Step Function API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.
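
To confirm the schedule in a given deployment, you could, for example, inspect the rule with the AWS CLI (the stack name is a placeholder):

  aws events describe-rule --name <stack-name>-dailyExecutionPayloadCleanupRule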

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

diff --git a/docs/v13.4.0/features/logging-esdis-metrics/index.html b/docs/v13.4.0/features/logging-esdis-metrics/index.html
    Version: v13.4.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

{
      "level": "info",
      "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
      "granules": "[\"granule-1\",\"granule-2\"]",
      "message": "hello world",
      "sender": "greetingFunction",
      "stackName": "myCumulus",
      "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

diff --git a/docs/v13.4.0/features/replay-archived-sqs-messages/index.html b/docs/v13.4.0/features/replay-archived-sqs-messages/index.html
    Version: v13.4.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

Field | Type | Description
    queueName | string | Any valid SQS queue name (not ARN)
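
A sketch of such a request, assuming the same hypothetical API host and token used elsewhere in these docs:

  curl --request POST https://example.com/replays/sqs \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{"queueName": "<your-queue-name>"}'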

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

diff --git a/docs/v13.4.0/features/replay-kinesis-messages/index.html b/docs/v13.4.0/features/replay-kinesis-messages/index.html
    Version: v13.4.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

Field | Type | Required | Description
    type | string | required | Currently only accepts kinesis.
    kinesisStream | string | for type kinesis | Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp | * | optional | Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
    endTimestamp | * | optional | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp | * | optional | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
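
For example, a time-sliced Kinesis replay request might look like the following sketch; the host, token, stream name, and timestamps are all placeholders:

  curl --request POST https://example.com/replays \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "type": "kinesis",
      "kinesisStream": "<your-kinesis-stream-name>",
      "startTimestamp": "2023-07-01T00:00:00Z",
      "endTimestamp": "2023-07-02T00:00:00Z"
    }'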

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

diff --git a/docs/v13.4.0/features/reports/index.html b/docs/v13.4.0/features/reports/index.html

    Reconciliation Reports

    report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
diff --git a/docs/v13.4.0/getting-started/index.html b/docs/v13.4.0/getting-started/index.html
    Version: v13.4.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

Data integrators/developers and operators across projects (not limited to NASA) use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

diff --git a/docs/v13.4.0/glossary/index.html b/docs/v13.4.0/glossary/index.html
    Version: v13.4.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

AWS service that allows logging and metrics collection on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who setup deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

diff --git a/docs/v13.4.0/index.html b/docs/v13.4.0/index.html
    Version: v13.4.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

diff --git a/docs/v13.4.0/integrator-guide/about-int-guide/index.html b/docs/v13.4.0/integrator-guide/about-int-guide/index.html
    Version: v13.4.0

    About Integrator Guide

    Purpose

    The Integrator Guide is to help supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
diff --git a/docs/v13.4.0/integrator-guide/int-common-use-cases/index.html b/docs/v13.4.0/integrator-guide/int-common-use-cases/index.html

diff --git a/docs/v13.4.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v13.4.0/integrator-guide/workflow-add-new-lambda/index.html
    Version: v13.4.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

The following steps are to help you along as you write a new Lambda that integrates with a Cumulus workflow. This will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

diff --git a/docs/v13.4.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v13.4.0/integrator-guide/workflow-ts-failed-step/index.html
    Version: v13.4.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

diff --git a/docs/v13.4.0/interfaces/index.html b/docs/v13.4.0/interfaces/index.html
    Version: v13.4.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

diff --git a/docs/v13.4.0/operator-docs/about-operator-docs/index.html b/docs/v13.4.0/operator-docs/about-operator-docs/index.html
    Version: v13.4.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
diff --git a/docs/v13.4.0/operator-docs/bulk-operations/index.html b/docs/v13.4.0/operator-docs/bulk-operations/index.html
    Version: v13.4.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.
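
As a sketch only, a bulk granule request submitted directly to the API might look like the following; the query, index, and workflowName fields are explained in the Kibana walk-through below, and every value shown is a placeholder:

  curl --request POST https://example.com/granules/bulk \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "index": "<granule-index>",
      "workflowName": "<WorkflowName>",
      "query": { "query": { "match": { "granuleId": "<granule-id>" } } }
    }'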

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

Screenshot of Kibana user interface showing the "Discover" page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
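
For example, to retrieve the record (and status) for a single operation, you might make a request like this sketch, where the host, token, and ID are placeholders:

  curl --request GET https://example.com/asyncOperations/<async-operation-id> \
    --header 'Authorization: Bearer ReplaceWithTheToken'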

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

diff --git a/docs/v13.4.0/operator-docs/cmr-operations/index.html b/docs/v13.4.0/operator-docs/cmr-operations/index.html

    CMR Operations

    UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

Screenshot showing granule execute popup with 'updateCmrAccessConstraints' selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

$ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "updateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is available in the bulk operations operator doc.
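
    As a rough sketch, a bulk applyWorkflow request against the API might look like the following. The hostname, token, granule ID, and workflow name are placeholders; see the Cumulus API documentation for the authoritative payload format:

    $ curl --request POST https://example.com/granules/bulk \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
      "workflowName": "UpdateCmrAccessConstraints",
      "ids": ["MOD11A1.A2017137.h19v16.006.2017138085750"]
    }'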

    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/create-rule-in-cumulus/index.html b/docs/v13.4.0/operator-docs/create-rule-in-cumulus/index.html index 2b0fc367388..430c9510dc2 100644 --- a/docs/v13.4.0/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/v13.4.0/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v13.4.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/discovery-filtering/index.html b/docs/v13.4.0/operator-docs/discovery-filtering/index.html index d6abb4416f0..14453d9871b 100644 --- a/docs/v13.4.0/operator-docs/discovery-filtering/index.html +++ b/docs/v13.4.0/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently, support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/granule-workflows/index.html b/docs/v13.4.0/operator-docs/granule-workflows/index.html index 6cd4e0808b1..f8f98be5ff1 100644 --- a/docs/v13.4.0/operator-docs/granule-workflows/index.html +++ b/docs/v13.4.0/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v13.4.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or running a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use the search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or running scheduled rules, will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/kinesis-stream-for-ingest/index.html b/docs/v13.4.0/operator-docs/kinesis-stream-for-ingest/index.html index 40e6982e0c8..27acd8774fa 100644 --- a/docs/v13.4.0/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/v13.4.0/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v13.4.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on amount of kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that matches your schema, using Python or the command line.
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
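
      As a minimal sketch, a message can be sent with the AWS CLI. The stream name and the CNM body shown here are placeholders; your actual message must match the schema and collection configured in your rule:

      # The --cli-binary-format flag keeps the payload as raw text when using AWS CLI v2
      aws kinesis put-record \
        --stream-name <your-stream-name> \
        --partition-key 1 \
        --data '{"collection": "MY_COLLECTION", "product": {"files": []}}' \
        --cli-binary-format raw-in-base64-out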
    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/locating-access-logs/index.html b/docs/v13.4.0/operator-docs/locating-access-logs/index.html index 72c194ed81c..bc1c60796eb 100644 --- a/docs/v13.4.0/operator-docs/locating-access-logs/index.html +++ b/docs/v13.4.0/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v13.4.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting, you configured a TargetBucket and TargetPrefix. Inside the TargetBucket, at the TargetPrefix, is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> with a key prefix of <stack>/ems-distribution/s3-server-access-logs/.
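
    For example, assuming a stack named my-stack with internal bucket my-internal-bucket (placeholder names), the logs could be listed with:

    aws s3 ls s3://my-internal-bucket/my-stack/ems-distribution/s3-server-access-logs/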

    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/naming-executions/index.html b/docs/v13.4.0/operator-docs/naming-executions/index.html index 290801b29fb..a23492fce84 100644 --- a/docs/v13.4.0/operator-docs/naming-executions/index.html +++ b/docs/v13.4.0/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
      ...other rule keys here...
      "meta": {
        "executionNamePrefix": "my-prefix"
      }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
      "QueueGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "FullMessage": true
            },
            "task_config": {
              "queueUrl": "${start_sf_queue_url}",
              "provider": "{$.meta.provider}",
              "internalBucket": "{$.meta.buckets.internal.name}",
              "stackName": "{$.meta.stack}",
              "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
              "executionNamePrefix": "{$.meta.executionNamePrefix}"
            }
          }
        },
        "Type": "Task",
        "Resource": "${queue_granules_task_arn}",
        "Retry": [
          {
            "ErrorEquals": [
              "Lambda.ServiceException",
              "Lambda.AWSLambdaException",
              "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 6,
            "BackoffRate": 2
          }
        ],
        "Catch": [
          {
            "ErrorEquals": [
              "States.ALL"
            ],
            "ResultPath": "$.exception",
            "Next": "WorkflowFailed"
          }
        ],
        "End": true
      }
    }
    - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/ops-common-use-cases/index.html b/docs/v13.4.0/operator-docs/ops-common-use-cases/index.html index 31a546d6398..d5c7746f28e 100644 --- a/docs/v13.4.0/operator-docs/ops-common-use-cases/index.html +++ b/docs/v13.4.0/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v13.4.0/operator-docs/trigger-workflow/index.html b/docs/v13.4.0/operator-docs/trigger-workflow/index.html index 1ab46c1689d..b2498982b53 100644 --- a/docs/v13.4.0/operator-docs/trigger-workflow/index.html +++ b/docs/v13.4.0/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v13.4.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
      private = {
        name = "discovery-bucket"
        type = "private"
      },
      protected = {
        name = "archive-protected"
        type = "protected"
      }
      public = {
        name = "archive-public"
        type = "public"
      }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider
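
    A minimal sketch of the resulting provider record is shown below. The id is a placeholder; the dashboard form produces the equivalent values:

    {
      "id": "s3_provider",
      "protocol": "s3",
      "host": "discovery-bucket"
    }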

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types map to the bucket names configured in the deployment's buckets variable shown above:

    {
      "name": "test-collection",
      "version": "001",
      "granuleId": "^GRANULE\\.A[\\d]{7}$",
      "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
      "reportToEms": false,
      "sampleFileName": "GRANULE.A2017025.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
          "sampleFileName": "GRANULE.A2017025.hdf"
        },
        {
          "bucket": "public",
          "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
          "sampleFileName": "GRANULE.A2017025.jpg"
        }
      ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.
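
    A minimal sketch of the relevant rule fields (rule name and other required fields omitted; the value mirrors the example setup above):

    {
      "meta": {
        "provider_path": "test-data"
      }
    }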

    Screenshot of adding a Discover Granules rule

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/v13.4.0/tasks/index.html b/docs/v13.4.0/tasks/index.html index 1e4ca3cfc27..b6aa4891c2e 100644 --- a/docs/v13.4.0/tasks/index.html +++ b/docs/v13.4.0/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v13.4.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    NOTE: For a detailed description of each task, visit the task's README.md. Information on the input or output of a task is specified in the task's schemas directory.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/v13.4.0/team/index.html b/docs/v13.4.0/team/index.html index 5c153dd3ba1..e53750eeae3 100644 --- a/docs/v13.4.0/team/index.html +++ b/docs/v13.4.0/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - +
    Version: v13.4.0

    Cumulus Team

    Cumulus Core Team

    Cumulus Emeritus Team

    - + \ No newline at end of file diff --git a/docs/v13.4.0/troubleshooting/index.html b/docs/v13.4.0/troubleshooting/index.html index 08a2452b417..bd5cbbb10b9 100644 --- a/docs/v13.4.0/troubleshooting/index.html +++ b/docs/v13.4.0/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v13.4.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines composed of tasks and services, and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard, or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of lambdas we distribute is in the low teens, which is likely to be easier and faster to reconfigure one-by-one compared to editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
      function_name = "${var.prefix}-DiscoverGranules"
      filename      = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
      handler       = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    - + \ No newline at end of file diff --git a/docs/v13.4.0/troubleshooting/reindex-elasticsearch/index.html b/docs/v13.4.0/troubleshooting/reindex-elasticsearch/index.html index 9fe9aa1ff66..745a2099ac7 100644 --- a/docs/v13.4.0/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/v13.4.0/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This approach is recommended.

    If you decide to reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data sent to the old index while reindexing will show up in the new index. If you prefer this approach, it is recommended that you turn off any ingest operations. This order keeps your dashboard data from experiencing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.
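
    A rough sketch of what the API call might look like (the hostname, token, and index names are placeholders; see the Cumulus API documentation for the authoritative request format):

    $ curl --request POST https://example.com/elasticsearch/change-index \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{ "currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4" }'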

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
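
    For example, the status could be polled with a request along these lines (hostname and token are placeholders; see the Cumulus API documentation for the authoritative request format):

    $ curl --request GET https://example.com/elasticsearch/reindex-status \
    --header 'Authorization: Bearer ReplaceWithTheToken'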

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
      "aggs": {
        "count_by_type": {
          "terms": {
            "field": "_type"
          }
        }
      },
      "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete, it can be resumed using the following command, run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
      "conflicts": "proceed",
      "source": {
        "index": "cumulus-2020-11-3"
      },
      "dest": {
        "index": "cumulus-2021-3-4",
        "op_type": "create"
      }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    - + \ No newline at end of file diff --git a/docs/v13.4.0/troubleshooting/rerunning-workflow-executions/index.html b/docs/v13.4.0/troubleshooting/rerunning-workflow-executions/index.html index d6add5a7076..20518505a85 100644 --- a/docs/v13.4.0/troubleshooting/rerunning-workflow-executions/index.html +++ b/docs/v13.4.0/troubleshooting/rerunning-workflow-executions/index.html @@ -5,13 +5,13 @@ Re-running workflow executions | Cumulus Documentation - +
    Version: v13.4.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the "New execution" button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID, as seen in the screenshot below (an example input sketch follows these steps)

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button
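
    A minimal sketch of the edited input, using a placeholder execution ID (only cumulus_meta.execution_name changes; everything else from the default input is kept as-is):

    {
      "cumulus_meta": {
        "execution_name": "<new-execution-id>",
        ...other cumulus_meta fields unchanged...
      },
      ...rest of the default input unchanged...
    }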

    - + \ No newline at end of file diff --git a/docs/v13.4.0/troubleshooting/troubleshooting-deployment/index.html b/docs/v13.4.0/troubleshooting/troubleshooting-deployment/index.html index 580a2414a64..8d0f787d6f6 100644 --- a/docs/v13.4.0/troubleshooting/troubleshooting-deployment/index.html +++ b/docs/v13.4.0/troubleshooting/troubleshooting-deployment/index.html @@ -5,7 +5,7 @@ Troubleshooting Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'. This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page with an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URLs to the Earthdata application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    - + \ No newline at end of file diff --git a/docs/v13.4.0/upgrade-notes/cumulus_distribution_migration/index.html b/docs/v13.4.0/upgrade-notes/cumulus_distribution_migration/index.html index 4d76041b24c..d92d0fa7e02 100644 --- a/docs/v13.4.0/upgrade-notes/cumulus_distribution_migration/index.html +++ b/docs/v13.4.0/upgrade-notes/cumulus_distribution_migration/index.html @@ -5,14 +5,14 @@ Migrate from TEA deployment to Cumulus Distribution | Cumulus Documentation - +
    Version: v13.4.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    - + \ No newline at end of file diff --git a/docs/v13.4.0/upgrade-notes/migrate_tea_standalone/index.html b/docs/v13.4.0/upgrade-notes/migrate_tea_standalone/index.html index 77684193646..7c3ea18ae5d 100644 --- a/docs/v13.4.0/upgrade-notes/migrate_tea_standalone/index.html +++ b/docs/v13.4.0/upgrade-notes/migrate_tea_standalone/index.html @@ -5,13 +5,13 @@ Migrate TEA deployment to standalone module | Cumulus Documentation - +
    Version: v13.4.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to TEA deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
       --table-name DYNAMO_LOCK_TABLE \
       --item '{
         "LockID": {"S": "BUCKET/KEY-md5"},
         "Digest": {"S": "some-digest-value"}
       }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    - + \ No newline at end of file diff --git a/docs/v13.4.0/upgrade-notes/update-cma-2.0.2/index.html b/docs/v13.4.0/upgrade-notes/update-cma-2.0.2/index.html index ffd8dc1e485..20cffcdff50 100644 --- a/docs/v13.4.0/upgrade-notes/update-cma-2.0.2/index.html +++ b/docs/v13.4.0/upgrade-notes/update-cma-2.0.2/index.html @@ -5,13 +5,13 @@ Upgrade to CMA 2.0.2 | Cumulus Documentation - +
    Version: v13.4.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

    The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, but instead use an incrementing integer field.

    Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

    The update is not tied to a particular version of Core, however the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

    This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.

    - + \ No newline at end of file diff --git a/docs/v13.4.0/upgrade-notes/update-task-file-schemas/index.html b/docs/v13.4.0/upgrade-notes/update-task-file-schemas/index.html index f3a92d954cb..0575a180a90 100644 --- a/docs/v13.4.0/upgrade-notes/update-task-file-schemas/index.html +++ b/docs/v13.4.0/upgrade-notes/update-task-file-schemas/index.html @@ -5,13 +5,13 @@ Updates to task granule file schemas | Cumulus Documentation - +
    Version: v13.4.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
    - + \ No newline at end of file diff --git a/docs/v13.4.0/upgrade-notes/upgrade-rds/index.html b/docs/v13.4.0/upgrade-notes/upgrade-rds/index.html index cd76ca53005..879ea8d874c 100644 --- a/docs/v13.4.0/upgrade-notes/upgrade-rds/index.html +++ b/docs/v13.4.0/upgrade-notes/upgrade-rds/index.html @@ -5,7 +5,7 @@ Upgrade to RDS release | Cumulus Documentation - + @@ -21,7 +21,7 @@ | cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 | | dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 | | dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    - + \ No newline at end of file diff --git a/docs/v13.4.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/v13.4.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html index 62b9aebb296..cc125186009 100644 --- a/docs/v13.4.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html +++ b/docs/v13.4.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html @@ -5,13 +5,13 @@ Upgrade to TF version 0.13.6 | Cumulus Documentation - +
    Version: v13.4.0

    Upgrade to TF version 0.13.6

    Background

    Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
    • Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

Discover Granules

included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
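
A hedged sketch of where these values live, assuming a workflow step named DiscoverGranules and illustrative provider/collection templates (a sketch only, not a complete configuration):

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}",
          "duplicateGranuleHandling": "skip",
          "concurrency": 3
        }
      }
    }
  }
}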

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v13.4.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

This task utilizes the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to take the list of incoming files and sort them into a list of granule objects.

Please note: files passed in that do not have metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.
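
As a rough, hypothetical sketch (bucket names, file names, and sizes are invented, and the granules key in the output reflects the usual shape but should be checked against the schema), an incoming staged S3 URI is matched to a granule from config.inputGranules and emerges with the keys listed above populated. A sketched task input (array of staged S3 URIs):

[
  "s3://my-staging-bucket/file-staging/MOD09GQ.A2017025.h21v00.006.hdf"
]

and the corresponding granule in the sketched output payload:

{
  "granules": [
    {
      "granuleId": "MOD09GQ.A2017025.h21v00.006",
      "files": [
        {
          "bucket": "my-staging-bucket",
          "key": "file-staging/MOD09GQ.A2017025.h21v00.006.hdf",
          "fileName": "MOD09GQ.A2017025.h21v00.006.hdf",
          "size": 1908635
        }
      ]
    }
  ]
}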

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v13.4.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

variable "lzards_launchpad_passphrase" {
  type    = string
  default = ""
}

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.
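
Purely as an illustration of that shape (the granules key and the fields inside each backupResults entry are placeholders, not the authoritative schema; consult the linked schema for the real field names):

{
  "granules": [ "{...}" ],
  "backupResults": [
    {
      "status": "{...}",
      "granuleId": "{...}",
      "filename": "{...}"
    }
  ]
}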

    Version: v13.4.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file

    • Task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.
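
As a rough before/after sketch of a single granule file (bucket names, key paths, and the url_path result are invented for illustration), the move updates the file's location fields from the staging location:

{
  "bucket": "my-internal-bucket",
  "key": "file-staging/my-prefix/MOD09GQ___006/MOD09GQ.A2017025.h21v00.006.hdf",
  "fileName": "MOD09GQ.A2017025.h21v00.006.hdf"
}

to the final archive location determined by the collection configuration:

{
  "bucket": "my-protected-bucket",
  "key": "MOD09GQ___006/2017/MOD09GQ.A2017025.h21v00.006.hdf",
  "fileName": "MOD09GQ.A2017025.h21v00.006.hdf"
}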

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v13.4.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Output a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine the file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

  HDF: 'data',
  HDF-EOS: 'data',
  SCIENCE: 'data',
  BROWSE: 'browse',
  METADATA: 'metadata',
  BROWSE_METADATA: 'metadata',
  QA_METADATA: 'metadata',
  PRODHIST: 'qa',
  QA: 'metadata',
  TGZ: 'data',
  LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.
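
A skeletal sketch of that output payload (all values, and the fields inside the pdr and granule entries, are illustrative; only filesCount, totalSize, pdr, and granules are named by the discussion above):

{
  "filesCount": 2,
  "totalSize": 1908635,
  "pdr": {
    "name": "MY_PDR_2017025.PDR",
    "path": "/pdrs"
  },
  "granules": [
    {
      "granuleId": "MOD09GQ.A2017025.h21v00.006",
      "dataType": "MOD09GQ",
      "version": "006",
      "files": [ "{...}" ]
    }
  ]
}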

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v13.4.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task utilizes a defined collection in concert with a defined provider, either on each granule or passed in via config, to queue up ingest executions for each granule or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
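
Pulling the keys above together, here is a hedged sketch of a QueueGranules step's task_config (the step name and every value are illustrative, and optional keys can be omitted):

{
  "QueueGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "granuleIngestWorkflow": "IngestGranule",
          "queueUrl": "{$.meta.queues.backgroundProcessing}",
          "preferredQueueBatchSize": 10,
          "concurrency": 3,
          "executionNamePrefix": "my-prefix",
          "childWorkflowMeta": {
            "note": "merged into the child execution's meta"
          }
        }
      }
    }
  }
}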

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    Version: v13.4.0

    Cumulus Tasks: Message Flow

Cumulus Tasks make up Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

This step determines which task is currently being executed. Note that this is different from which lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

{
  "input": {},
  "config": {}
}

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

{
  "task_config": {
    "bar": "baz",
    "cumulus_message": {
      "input": "{$.payload.input}",
      "outputs": [
        {
          "source": "{$.input.anykey}",
          "destination": "{$.payload.out}"
        }
      ]
    }
  },
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "input": {
      "anykey": "anyvalue"
    }
  }
}

    Create Next Message Step 2: Store remote event

    If the ReplaceConfiguration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    Version: v13.4.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

module "my_workflow" {
  source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

  prefix        = "my-prefix"
  name          = "MyWorkflowName"
  system_bucket = "my-internal-bucket"

  workflow_config = module.cumulus.workflow_config

  tags = { Deployment = var.prefix }

  state_machine_definition = <<JSON
{}
JSON
}

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
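
For instance, a minimal state_machine_definition might look like the following sketch: a single Task state wired through the CMA. It assumes the cumulus module exposes the HelloWorld task ARN as module.cumulus.hello_world_task.task_arn; check the module outputs for the exact name before relying on it.

{
  "Comment": "Minimal single-step workflow",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "${module.cumulus.hello_world_task.task_arn}",
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {}
        }
      },
      "End": true
    }
  }
}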

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v13.4.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a Docker image. Cumulus provides a Docker image (source for Node 12.x+ lambdas) on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

Dockerizing Data Processing

2) validate the output (in this case just check for existence)
3) use 'ncatted' to update the resulting file to be CF-compliant
4) write out metadata generated for this file

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3, and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

├── cpl
│   ├── input
│   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
│   │   └── HS3_CPL_OP_12203a_20120906.hdf5
│   └── output
│       ├── HS3_CPL_ATB_12203a_20120906.nc
│       ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
│       ├── HS3_CPL_OP_12203a_20120906.nc
│       └── HS3_CPL_OP_12203a_20120906.nc.meta.xml
    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise an error will be reported.

    Version: v13.4.0

    Workflows

Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

The high-level view of the architecture and many of the individual steps are the same, but the details of ingesting each type of collection differ. Different collection types and different providers have different needs. Not only are the individual boxes of a workflow different; the branching, error handling, and multiplicity of the arrows connecting the boxes also differ. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

AWS and other cloud vendors provide an ideal solution for parts of these problems, but there needs to be a higher-level solution to allow the composition of AWS components into a full-featured system. The Ingest Workflow Architecture is designed to meet the needs of Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about the specifics of, for example, what goes into a granule.

    Scalable

The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits which are applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
  • Both meta and task_config, which are used for configuring at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration, we'd be able to add all of these. For this tradeoff, we save significant amounts of development effort and gain all the features of Step Functions. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest specific configuration. This means our representation has a tight coupling with their standard. If they make backwards incompatible changes in the future we will have to deal with existing projects written against that.

We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without requiring changes to the Ingest library code. It is unlikely they will make a backwards incompatible change at this point. One mitigation for this is writing data transformations to a new format if that were to happen.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions that address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the need to integrate different ingest workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used for monitoring many of the events within Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v13.4.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory in your lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

    Input to the task application code is a json object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is returned and placed in the payload key by default, but the config key can also be used to return just a portion of the task output.
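
As a small sketch of that default behavior (values are illustrative): if the previous step's message carried "payload": {"anykey": "anyvalue"}, the task application code is invoked with the following, and whatever object the task returns becomes the next message's payload:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {...}
}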

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

Using the CMA package with a Lambda by any of the above-mentioned methods (Lambda Layers, manual) requires configuring its various features via a specific Step Function Parameters format (see the sample workflows in the examples cumulus-tf source):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be set via configuration to store a portion of its output on S3, with a Remote Message key that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize)

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

Note that you can optionally use a different TargetPath than Path; however, as the target is a JSON path, there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node; otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values for the definition of inputs/outputs via URL paths. Important: these values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message) with any cma configuration parameters merged in being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given that a task specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input. Instead of sending the whole payload as the task input, the task input would be:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

    By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also augment values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task specifies cumulus_message.output in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Given the response from the task is:

{
  "output": {
    "anykey": "boo"
  }
}

    The Cumulus Message Adapter would output the following Cumulus Message:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}
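The dispatch can be sketched in a few lines of Python (an illustration of the behavior above for the simple path forms used in this example, not the CMA implementation; apply_outputs is an invented name):

def apply_outputs(message, task_return_value, outputs):
    for mapping in outputs:
        source = mapping["source"].strip("{}")
        destination = mapping["destination"].strip("{}")
        # read the value at "source" from the task's return value
        value = task_return_value
        if source != "$":
            for part in source.lstrip("$.").split("."):
                value = value[part]
        # write it at "destination" in the Cumulus message
        target = message
        dest_parts = destination.lstrip("$.").split(".")
        for part in dest_parts[:-1]:
            target = target.setdefault(part, {})
        target[dest_parts[-1]] = value
    return message

message = {"meta": {"foo": "bar"}, "payload": {"anykey": "anyvalue"}}
outputs = [
    {"source": "{$}", "destination": "{$.payload}"},
    {"source": "{$.output.anykey}", "destination": "{$.meta.baz}"},
]
print(apply_outputs(message, {"output": {"anykey": "boo"}}, outputs))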

    6. Apply Remote Message Configuration

If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided and, if required, write a portion of the Cumulus Message to S3, adding a replace key to the message for future steps to utilize.

Please Note: the non user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output message (post output configuration) from a task looks like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "ReplaceConfig": {
    "FullMessage": true
  },
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    the resultant output would look like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
  }
}
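A rough sketch of what the remote-message write could look like, assuming FullMessage: true and an internal bucket supplied by the caller (not the CMA implementation; the key naming and the store_remote_message name are illustrative assumptions):

import json
import uuid

import boto3

def store_remote_message(message, internal_bucket, s3_client=None):
    s3 = s3_client or boto3.client("s3")
    key = f"events/{uuid.uuid4()}"                     # illustrative key naming
    s3.put_object(Bucket=internal_bucket, Key=key, Body=json.dumps(message))
    # cumulus_meta is always retained on the stub message that replaces the full message
    return {
        "cumulus_meta": message.get("cumulus_meta", {}),
        "replace": {"TargetPath": "$", "Bucket": internal_bucket, "Key": key},
    }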

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
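As an illustration, a task runner could apply this kind of validation with the jsonschema package (a sketch under the default schema locations described above, not the CMA implementation; validate_message is an invented name):

import json
from pathlib import Path

from jsonschema import ValidationError, validate  # assumes the jsonschema package

def validate_message(task_root, kind, message):
    # kind is one of "input", "output", or "config"
    schema_path = Path(task_root) / "schemas" / f"{kind}.json"
    if not schema_path.exists():
        return True                                  # no schema found: skip validation
    schema = json.loads(schema_path.read_text())
    try:
        validate(instance=message, schema=schema)
        return True
    except ValidationError as error:
        raise ValueError(f"{kind} message failed schema validation: {error.message}")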

    - + \ No newline at end of file diff --git a/docs/v13.4.0/workflows/lambda/index.html b/docs/v13.4.0/workflows/lambda/index.html index d650dc0a207..fc1c0f495a6 100644 --- a/docs/v13.4.0/workflows/lambda/index.html +++ b/docs/v13.4.0/workflows/lambda/index.html @@ -5,13 +5,13 @@ Develop Lambda Functions | Cumulus Documentation - +
    Version: v13.4.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions: Node.js, Java, and Python (each covered in the sections below).

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create the function code and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the Cumulus source), either as a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    - + \ No newline at end of file diff --git a/docs/v13.4.0/workflows/protocol/index.html b/docs/v13.4.0/workflows/protocol/index.html index 5af8f01e1c6..ac43a849afa 100644 --- a/docs/v13.4.0/workflows/protocol/index.html +++ b/docs/v13.4.0/workflows/protocol/index.html @@ -5,13 +5,13 @@ Workflow Protocol | Cumulus Documentation - +
    Version: v13.4.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/v13.4.0/workflows/workflow-configuration-how-to/index.html b/docs/v13.4.0/workflows/workflow-configuration-how-to/index.html index c5f7b9cf802..f99a7469cd7 100644 --- a/docs/v13.4.0/workflows/workflow-configuration-how-to/index.html +++ b/docs/v13.4.0/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -24,7 +24,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

This example would resolve to "MOD09GQ/MOD".

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.
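For reference, the values shown above can be reproduced with plain Python datetime parsing (an illustration only; the real extract* helpers run inside the move-granules step, and parse_cmr_datetime is an invented name):

from datetime import datetime

def parse_cmr_datetime(value):
    # parses the BeginningDateTime format used in the sample metadata above
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

begin = parse_cmr_datetime("2003-02-19T00:00:00Z")
print(begin.year, begin.month, begin.day, begin.hour)  # 2003 2 19 0, matching the examples above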

    - + \ No newline at end of file diff --git a/docs/v13.4.0/workflows/workflow-triggers/index.html b/docs/v13.4.0/workflows/workflow-triggers/index.html index 39b2722412f..ed2f58febd3 100644 --- a/docs/v13.4.0/workflows/workflow-triggers/index.html +++ b/docs/v13.4.0/workflows/workflow-triggers/index.html @@ -5,13 +5,13 @@ Workflow Triggers | Cumulus Documentation - +
    Version: v13.4.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one-time rule is created, the schedule SF lambda is triggered. Rules that are not one-time are associated with a CloudWatch event, which manages triggering the lambdas that start the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    - + \ No newline at end of file diff --git a/docs/v14.1.0/adding-a-task/index.html b/docs/v14.1.0/adding-a-task/index.html index 94c73ac998c..f12560e4068 100644 --- a/docs/v14.1.0/adding-a-task/index.html +++ b/docs/v14.1.0/adding-a-task/index.html @@ -5,13 +5,13 @@ Contributing a Task | Cumulus Documentation - +
    Version: v14.1.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    - + \ No newline at end of file diff --git a/docs/v14.1.0/api/index.html b/docs/v14.1.0/api/index.html index 8286182844a..0c191396bc9 100644 --- a/docs/v14.1.0/api/index.html +++ b/docs/v14.1.0/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v14.1.0/architecture/index.html b/docs/v14.1.0/architecture/index.html index af5c54b2b64..978f22f69ec 100644 --- a/docs/v14.1.0/architecture/index.html +++ b/docs/v14.1.0/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
    Version: v14.1.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it can be broken down into several major components:

    Data Distribution

    End Users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application, this provides authenticated data egress, temporary S3 links and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a set of PostgreSQL compatible databases, and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is replicated in DynamoDB; this replication will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency, Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus currently also deploys a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployments, and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/cloudwatch-retention/index.html b/docs/v14.1.0/configuration/cloudwatch-retention/index.html index 25ec866f243..164aaa3cd70 100644 --- a/docs/v14.1.0/configuration/cloudwatch-retention/index.html +++ b/docs/v14.1.0/configuration/cloudwatch-retention/index.html @@ -5,13 +5,13 @@ Cloudwatch Retention | Cumulus Documentation - +
    Version: v14.1.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

    1. log-group-name: the name of the log group who's retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.
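If you prefer to script this, the equivalent call via boto3 would look like the following sketch (the log group name is the example used above):

import boto3

logs = boto3.client("logs")
# sets a 30-day retention policy on the example log group
logs.put_retention_policy(
    logGroupName="/aws/lambda/KinesisInboundLogger",
    retentionInDays=30,
)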

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
    3. Find the Log Group who's retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/collection-storage-best-practices/index.html b/docs/v14.1.0/configuration/collection-storage-best-practices/index.html index a48c6a54586..aa198954e25 100644 --- a/docs/v14.1.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v14.1.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v14.1.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (e.g. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
    "Datapoints": [
    {
    "Timestamp": "2019-07-31T00:00:00Z",
    "Average": 150996467959.0,
    "Unit": "Bytes"
    }
    ],
    "Label": "BucketSizeBytes"
    }
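The same query can be issued from a script using boto3 (a sketch; the bucket name, region, and dates match the CLI example above):

from datetime import datetime

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
# average bucket size in bytes for test-bucket on 7/31/2019
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "test-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime(2019, 7, 31),
    EndTime=datetime(2019, 8, 1),
    Period=86400,
    Statistics=["Average"],
)
print(response["Datapoints"])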

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "MOD09GQ-006-private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "MOD09GQ-006-public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/data-management-types/index.html b/docs/v14.1.0/configuration/data-management-types/index.html index 74ca003e6b7..9678f953d7d 100644 --- a/docs/v14.1.0/configuration/data-management-types/index.html +++ b/docs/v14.1.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v14.1.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

• Providers tell Cumulus where to get new data - e.g. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, settings that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
    • There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
host | string | Yes | S3 Bucket to pull data from
http
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be http for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 80
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
https
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be https for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 443
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
ftp
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
port | integer | No | Port to connect to the provider on. Defaults to 21
sftp
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
host | string | Yes | The sftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the sftp server.
password | string | No | Password to use to connect to the sftp server.
port | integer | No | Port to connect to the provider on. Defaults to 22
privateKey | string | No | filename assumed to be in s3://bucketInternal/stackName/crypto
cmKeyId | string | No | AWS KMS Customer Master Key arn or alias

    Collections

Breakdown of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
Key | Value | Required | Description
name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
version | "006" | Yes | A version tag for the collection
granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

    files-object

Key | Value | Required | Description
regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
bucket | "internal" | Yes | Name of the bucket where the file will be stored
url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by either events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
Key | Value | Required | Description
name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
payload | <JSON Object or Array> | No | The payload to be passed to the workflow
meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

    collection-object

Key | Value | Required | Description
name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

    meta-object

Key | Value | Required | Description
retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3.
visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

Key | Value | Required | Description
type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
value | <String> Object | Depends | Discussion of valid values is below

    rule-value

The rule value entry depends on the type of rule:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/lifecycle-policies/index.html b/docs/v14.1.0/configuration/lifecycle-policies/index.html index 6205dbdf877..1240a083970 100644 --- a/docs/v14.1.0/configuration/lifecycle-policies/index.html +++ b/docs/v14.1.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v14.1.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
• You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json

{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 90,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "ID": "90DayS3IAExample"
    }
  ]
}

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.
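If you would rather apply the rule programmatically, the same configuration can be set with boto3 (a sketch; it reuses the exampleRule.json file created above, and the bucket name is a placeholder):

import json

import boto3

s3 = boto3.client("s3")
# load the rule document created earlier and apply it to the bucket
with open("exampleRule.json") as f:
    lifecycle = json.load(f)
s3.put_bucket_lifecycle_configuration(
    Bucket="yourBucketNameHere",
    LifecycleConfiguration=lifecycle,
)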

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/monitoring-readme/index.html b/docs/v14.1.0/configuration/monitoring-readme/index.html index bccab7dd108..c06574e61dc 100644 --- a/docs/v14.1.0/configuration/monitoring-readme/index.html +++ b/docs/v14.1.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v14.1.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const decompressedData = await gunzip(payload);
    const logData = JSON.parse(decompressedData.toString('ascii'));
    return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
    return sns.publish({
    TopicArn: process.env.EmailReportingTopicArn,
    Message: logEvent.message
    }).promise();
    }
    return Promise.resolve();
    }));
    };

After creating the SNS topic, we can deploy this code as a lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.

    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/server_access_logging/index.html b/docs/v14.1.0/configuration/server_access_logging/index.html index 55020b8ff04..1cb4879441e 100644 --- a/docs/v14.1.0/configuration/server_access_logging/index.html +++ b/docs/v14.1.0/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v14.1.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
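For scripted deployments, the same logging policy can be applied with boto3 (a sketch; the bucket names are the placeholders used above):

import boto3

s3 = boto3.client("s3")
# apply the logging policy to one protected or public bucket
s3.put_bucket_logging(
    Bucket="<protected/public-bucket-name>",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "<stack-internal-bucket>",
            "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/",
        }
    },
)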
    - + \ No newline at end of file diff --git a/docs/v14.1.0/configuration/task-configuration/index.html b/docs/v14.1.0/configuration/task-configuration/index.html index 40f92fc9356..277034d7625 100644 --- a/docs/v14.1.0/configuration/task-configuration/index.html +++ b/docs/v14.1.0/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
    Version: v14.1.0

    Configuration of Tasks

The cumulus module exposes configuration values for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    cmr_search_client_config

Configuration parameters for the CMR search client used by cumulus archive module tasks, in the form:

<lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
<lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
type = map(string)

More information about the CMR limit and page_size parameters can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

cmr_search_client_config = {
  create_reconciliation_report_cmr_limit = 2500
  create_reconciliation_report_cmr_page_size = 250
}

    elasticsearch_client_config

Configuration parameters for the Elasticsearch client used by cumulus archive module tasks, in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

elasticsearch_client_config = {
  create_reconciliation_report_es_scroll_duration = "15m"
  create_reconciliation_report_es_scroll_size = 2000
}

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_timeout
    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • fake_processing_task_timeout
    • files_to_granules_task_timeout
    • hello_world_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sf_sqs_report_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

lambda_timeouts = {
  discover_granules_task_timeout = 300
}

    lambda_memory_sizes

    A configurable map of memory sizes (in MBs) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_memory_size: <memory_size>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_memory_size
    • discover_granules_task_memory_size
    • discover_pdrs_task_memory_size
    • fake_processing_task_memory_size
    • hyrax_metadata_updates_task_memory_size
    • lzards_backup_task_memory_size
    • move_granules_task_memory_size
    • parse_pdr_task_memory_size
    • pdr_status_check_task_memory_size
    • post_to_cmr_task_memory_size
    • queue_granules_task_memory_size
    • queue_pdrs_task_memory_size
    • queue_workflow_task_memory_size
    • sf_sqs_report_task_memory_size
    • sync_granule_task_memory_size
    • update_cmr_acess_constraints_task_memory_size
    • update_granules_cmr_metadata_file_links_task_memory_size

    Example

lambda_memory_sizes = {
  queue_granules_task_memory_size = 1036
}
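
All of the maps above are passed as variables to the cumulus module in your deployment's Terraform configuration. A hedged sketch of how they fit together (the source value and the other required module arguments are omitted here; use the values from your existing deployment):

module "cumulus" {
  source = "<your existing cumulus module source>"
  # ... other required arguments omitted ...

  cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
  }

  lambda_timeouts = {
    discover_granules_task_timeout = 300
  }

  lambda_memory_sizes = {
    queue_granules_task_memory_size = 1036
  }
}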
    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/about-cookbooks/index.html b/docs/v14.1.0/data-cookbooks/about-cookbooks/index.html index 6deef87f121..e366c59a678 100644 --- a/docs/v14.1.0/data-cookbooks/about-cookbooks/index.html +++ b/docs/v14.1.0/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: v14.1.0

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. Additionally, they should help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/browse-generation/index.html b/docs/v14.1.0/data-cookbooks/browse-generation/index.html index 699b6f1a035..6599b183da9 100644 --- a/docs/v14.1.0/data-cookbooks/browse-generation/index.html +++ b/docs/v14.1.0/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

{
  "name": "TestBrowseGeneration",
  "workflow": "DiscoverGranulesBrowseExample",
  "provider": "{{provider_from_previous_step}}",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "meta": {
    "provider_path": "{{path_to_data}}"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED",
  "updatedAt": 1553053438767
}

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

Screenshot of "Show output" button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate the CMR entry shown above.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The example script used in this entry goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

The processing lambda you construct will need to do the following (a minimal sketch follows the list below):

• Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in an S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
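
A minimal sketch of such a processing Lambda is shown below. It is illustrative only: the generateBrowse helper is hypothetical (assumed to write a browse image to S3 and return its fileName, bucket, key, and size), and the Cumulus Message Adapter wiring is omitted.

// Sketch only: generateBrowse() is a hypothetical helper, not part of Cumulus.
const { generateBrowse } = require('./generate-browse');

exports.handler = async (event) => {
  // Granules staged by SyncGranule arrive in the payload (event.input)
  const granules = event.input.granules;

  const updatedGranules = await Promise.all(granules.map(async (granule) => {
    // Create the browse image and stage it where FilesToGranules/MoveGranules can access it
    const browseFile = await generateBrowse(granule, event.config.bucket);
    return {
      ...granule,
      files: [...granule.files, { ...browseFile, type: 'browse' }],
    };
  }));

  // Return the keys expected by this cookbook's cumulus_message output mapping:
  // "files" becomes the payload for FilesToGranules, "granules" becomes meta.input_granules
  return {
    granules: updatedGranules,
    files: updatedGranules.flatMap((g) => g.files.map((f) => `s3://${f.bucket}/${f.key}`)),
  };
};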

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, the files output is mapped to the payload and the granules output to meta.input_granules:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

    The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This list is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the meta.input_granules output in the example would look like the following:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/choice-states/index.html b/docs/v14.1.0/data-cookbooks/choice-states/index.html index 26d44f781d5..689ce422bbe 100644 --- a/docs/v14.1.0/data-cookbooks/choice-states/index.html +++ b/docs/v14.1.0/data-cookbooks/choice-states/index.html @@ -5,13 +5,13 @@ Choice States | Cumulus Documentation - +
    Version: v14.1.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

{
  "meta": {
    "isPdrFinished": false
  }
}

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: as long as CheckStatus returns meta.isPdrFinished: false, it is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/cnm-workflow/index.html b/docs/v14.1.0/data-cookbooks/cnm-workflow/index.html index a867ac5cb18..c730334a97e 100644 --- a/docs/v14.1.0/data-cookbooks/cnm-workflow/index.html +++ b/docs/v14.1.0/data-cookbooks/cnm-workflow/index.html @@ -5,7 +5,7 @@ CNM Workflow | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v14.1.0

    CNM Workflow

This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as the CNM notification and response data streams.

If you do not have two streams set up, please take a moment to review the Kinesis documentation and set up two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream
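
If you prefer the CLI to the console, the two single-shard streams can also be created with commands along these lines (the stream names are placeholders of your choosing):

aws kinesis create-stream --stream-name <prefix>-cnm-notification-stream --shard-count 1
aws kinesis create-stream --stream-name <prefix>-cnm-response-stream --shard-count 1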

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

    In operational environments it's likely science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/cnm_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
• Update the source key of the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
JSON
}

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.
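
For most deployments this is simply another Terraform run from your deployment directory; something like the following, though the exact steps depend on how your deployment is managed:

terraform init
terraform apply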

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

{
  "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
  },
  "name": "L2_HR_PIXC_kinesisRule",
  "provider": "PODAAC_SWOT",
  "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
  },
  "state": "ENABLED",
  "workflow": "CNMExampleWorkflow"
}

    Please Note:

• The rule's value attribute must match the Amazon Resource Name (ARN) of the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you setup in the Prerequisites section.

    Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
{
  "product": {
    "files": [
      {
        "checksumType": "md5",
        "name": "${TEST_DATA_FILE_NAME}",
        "checksum": "bogus_checksum_value",
        "uri": "${TEST_DATA_URI}",
        "type": "data",
        "size": 12345678
      }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
  },
  "identifier ": "testIdentifier123456",
  "collection": "${COLLECTION}",
  "provider": "TestProvider",
  "version": "001",
  "submissionTime": "2017-09-30T03:42:29.791198"
}

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the response-endpoint Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
• Re-trigger the workflow using the put-record command from the Add Record to Kinesis Data Stream section above
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note that the Data field is base64-encoded and would need to be decoded and parsed to be interpretable. There are many options to build a Kinesis consumer, such as the KCL.
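
For a quick spot check from the CLI, the first record can be decoded with something like the following (assuming jq and base64 are available locally):

aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
  | jq -r '.Records[0].Data' \
  | base64 --decode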

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

    This system ensures when messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

    The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - come with the API package and do not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
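
The same queue can be polled from the CLI; a hedged sketch, assuming the default {{prefix}}-kinesisFailure queue name:

QUEUE_URL=$(aws sqs get-queue-url --queue-name <prefix>-kinesisFailure --query QueueUrl --output text)
aws sqs receive-message --queue-url "$QUEUE_URL" --max-number-of-messages 10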

Note: these are only failures that occurred when processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
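
If you would rather script this than use the console, the equivalent event source mapping can be created with the AWS CLI; a sketch, assuming the deployed Lambda follows the default <prefix>- naming and substituting your response stream's ARN:

aws lambda create-event-source-mapping \
  --function-name <prefix>-KinesisOutboundEventLogger \
  --event-source-arn arn:aws:kinesis:<region>:<account-id>:stream/<response-stream-name> \
  --starting-position LATEST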

    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/error-handling/index.html b/docs/v14.1.0/data-cookbooks/error-handling/index.html index a752201defb..e24a6fd7f80 100644 --- a/docs/v14.1.0/data-cookbooks/error-handling/index.html +++ b/docs/v14.1.0/data-cookbooks/error-handling/index.html @@ -5,7 +5,7 @@ Error Handling in Workflows | Cumulus Documentation - + @@ -45,7 +45,7 @@ Service Exception. See this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/hello-world/index.html b/docs/v14.1.0/data-cookbooks/hello-world/index.html index c4b192aca6f..7e7591c367b 100644 --- a/docs/v14.1.0/data-cookbooks/hello-world/index.html +++ b/docs/v14.1.0/data-cookbooks/hello-world/index.html @@ -5,14 +5,14 @@ HelloWorld Workflow | Cumulus Documentation - +
    Version: v14.1.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided as part of the cumulus terraform module, so no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    - + \ No newline at end of file diff --git a/docs/v14.1.0/data-cookbooks/ingest-notifications/index.html b/docs/v14.1.0/data-cookbooks/ingest-notifications/index.html index d92c621f4db..86c237041d4 100644 --- a/docs/v14.1.0/data-cookbooks/ingest-notifications/index.html +++ b/docs/v14.1.0/data-cookbooks/ingest-notifications/index.html @@ -5,13 +5,13 @@ Ingest Notification in Workflows | Cumulus Documentation - +
    Version: v14.1.0

    Ingest Notification in Workflows

On deployment, an SQS queue and three SNS topics (one each for executions, granules, and PDRs) are created and used for handling notification messages related to the workflow.

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

    If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and having the status of those granules/PDRs being visible on the Cumulus dashboard. The queue does have certain expectations as it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the reporting topic ARNs) by adding them as outputs in outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus message is expected in detail.input. The Cumulus message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
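
As a hedged illustration only, a minimal publish from the CLI might look like the following. The outer envelope shows just detail.status and detail.input, and the inner Cumulus message is a bare placeholder; a real message must conform to the Cumulus message schema referenced above, and the queue URL shown is the example output from the previous step.

aws sqs send-message \
  --queue-url "https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue" \
  --message-body '{"detail": {"status": "RUNNING", "input": "{\"cumulus_meta\": {}, \"meta\": {}, \"payload\": {}}"}}'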

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON-stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON-stringified object with the ingest notification record in the record property and the event type in the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
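As an illustration only (not Cumulus code), a minimal Python Lambda handler for a subscriber might unpack these formats as follows; the field names inside the records (granuleId, status) are assumptions based on the record schemas:

# Minimal sketch of an SNS subscriber Lambda that unpacks the formats above.
import json


def handler(event, context):
    # SNS delivers the published message as a JSON string at Records[0].Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])

    if "record" in message and "event" in message:
        # Granule topic: the ingest notification record is nested under "record"
        # and the event type under "event".
        record = message["record"]
        print(f"granule {record.get('granuleId')} event: {message['event']}")
    else:
        # Execution/PDR topics: the message is the ingest notification record itself.
        print(f"record status: {message.get('status')}")

    return message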

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v14.1.0

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here, which is useful if you would like to use a lower-priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v14.1.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the full range of AWS EC2 resources, with the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.
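As a rough, optional sanity check against the limits above, you could inspect an existing function's deployed package size and configuration with boto3; the function name below is a placeholder:

# Illustrative check only; it is not a substitute for the official limits page.
import boto3

lam = boto3.client("lambda")
cfg = lam.get_function_configuration(FunctionName="<prefix>-QueueGranules")  # placeholder name

print("package size (MB):", cfg["CodeSize"] / 1024 / 1024)  # zipped size of the deployed code
print("memory (MB):", cfg["MemorySize"])
print("timeout (s):", cfg["Timeout"])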

If your task requires more than any of these resources, or requires an unsupported runtime, you should instead create a Docker image that can be run on ECS. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.
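The following is a simplified, hypothetical illustration of that polling protocol in Python; it is not the cumulus-ecs-task implementation (which is a Node.js container), and the activity ARN and do_the_work function are placeholders:

# Illustrative Step Functions activity worker loop.
import json
import boto3

sfn = boto3.client("stepfunctions")
activity_arn = "arn:aws:states:us-east-1:123456789012:activity:<prefix>-QueueGranules"  # placeholder


def do_the_work(event):
    # Placeholder for the task's business logic; cumulus-ecs-task would run the
    # code from the lambda package identified by --lambdaArn here.
    return event


while True:
    # Long-polls for a task scheduled against this activity.
    task = sfn.get_activity_task(activityArn=activity_arn, workerName="example-worker")
    if not task.get("taskToken"):
        continue  # no work available; poll again

    event = json.loads(task["input"])
    try:
        output = do_the_work(event)
        sfn.send_task_success(taskToken=task["taskToken"], output=json.dumps(output))
    except Exception as err:
        sfn.send_task_failure(taskToken=task["taskToken"], error="TaskError", cause=str(err))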

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

If you discover that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or an existing one):

• An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to the following):

    "Resource": "${aws_sfn_activity.queue_granules.id}")`

If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a Lambda function.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Science Investigator-led Processing Systems (SIPS)

…we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

Note: A list of configured workflows exists under "Workflows" in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

Additionally, this workflow requires a processing step that you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v14.1.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }
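If you want to trigger a one-off read of the queue without waiting for the scheduled rule (for example, while testing), you could invoke the consumer Lambda directly with the same payload shape shown above. This is a hypothetical sketch assuming boto3 and a function named <prefix>-sqs2sfThrottle; the scheduled rule remains the normal trigger:

import json
import boto3

lam = boto3.client("lambda")

lam.invoke(
    FunctionName="<prefix>-sqs2sfThrottle",  # placeholder: your deployed function name
    InvocationType="Event",                  # asynchronous; the response payload is not needed
    Payload=json.dumps({
        "messageLimit": 500,
        "queueUrl": "https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-backgroundJobQueue",
        "timeLimit": 60,
    }),
)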

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is a partial example of an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started
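To make the semaphore idea above concrete, here is a minimal sketch of a conditional increment, assuming (purely for illustration) a DynamoDB table named semaphores with a numeric count per queue; it is not the actual Cumulus implementation:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")


def try_increment(queue_url: str, maximum: int) -> bool:
    """Return True if the count was below the maximum and was incremented."""
    try:
        dynamodb.update_item(
            TableName="semaphores",                      # illustrative table name
            Key={"key": {"S": queue_url}},
            UpdateExpression="ADD #count :one",
            # The increment only succeeds while the current count is below the max.
            ConditionExpression="attribute_not_exists(#count) OR #count < :max",
            ExpressionAttributeNames={"#count": "count"},
            ExpressionAttributeValues={":one": {"N": "1"}, ":max": {"N": str(maximum)}},
        )
        return True   # under the limit: the caller may start an execution
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # already at the limit: do not start an execution
        raise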

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Tracking Ancillary Files

…The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type | UMM-G RelatedUrl.Type | ECHO10 Location
ancillary | 'VIEW RELATED INFORMATION' | OnlineResource
data | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI) | OnlineAccessURL
browse | 'GET RELATED VISUALIZATION' | AssociatedBrowseImage
linkage | 'EXTENDED METADATA' | OnlineResource
metadata | 'EXTENDED METADATA' | OnlineResource
qa | 'EXTENDED METADATA' | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v14.1.0

    API Gateway Logging

    Enabling API Gateway Logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
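If you want to pull recent events from that log group programmatically, a small boto3 snippet such as the following can help; the REST API ID and stage shown are placeholders:

import boto3

logs = boto3.client("logs")
log_group = "API-Gateway-Execution-Logs_abc123defg/DEV"  # API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>

for event in logs.filter_log_events(logGroupName=log_group, limit=20)["events"]:
    print(event["timestamp"], event["message"])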

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions: Enabling Account Level Logging from API Gateway to CloudWatch

This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    1. Create a policy document

      The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

      Save this snippet as apigateway-policy.json.

      {
      "Version": "2012-10-17",
      "Statement": [
      {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
      "Service": "apigateway.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
      }
      ]
      }
    2. Create an account role to act as ApiGateway and write to CloudWatchLogs

      NASA users in NGAP: be sure to use your account's permission boundary.

      aws iam create-role \
      --role-name ApiGatewayToCloudWatchLogs \
      [--permissions-boundary <permissionBoundaryArn>] \
      --assume-role-policy-document file://apigateway-policy.json

      Note the ARN of the returned role for the last step.

    3. Attach correct permissions to role

      Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

      aws iam attach-role-policy \
      --role-name ApiGatewayToCloudWatchLogs \
      --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
    4. Update Account API Gateway settings with correct permissions

      Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

      aws apigateway update-account \
      --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

    Configure API Gateway CloudWatch Logs Delivery

    For details about configuring the API Gateway CloudWatch Logs delivery, see Configure Cloudwatch Logs Delivery.

Choosing and Configuring Your RDS Database

…using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more, as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Version: v14.1.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

    By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

If additional logs are needed, you can configure additional_log_groups_to_elk with the CloudWatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with a descriptor as the key and the CloudWatch log group name as the value.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
Component-based Cumulus Deployment

…Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v14.1.0

    Creating an S3 Bucket

Buckets can be created on the command line with the AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command Line

Using the AWS CLI s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    ⚠️ Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

Please note that security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.

    Web Interface

    If you prefer to use the AWS web interface instead of the command line, see AWS "Creating a Bucket" documentation.

    Version: v14.1.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution Deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but they can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

1. Remove or comment out the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the "Cumulus Distribution Settings".
2. Delete or comment out the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file and setting it to one of the following values (both are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches in turn.

    Using Your Cumulus Distribution API Gateway URL as Your Distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development). Here is an outline of the required steps with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following example)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see AWS ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    localhost <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.

    For reference, see the Cognito Authentication Service API.
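For example, here is a minimal sketch of this request using Python's requests library (any HTTP client will do); all values shown are placeholders for your own Terraform variable values:

import requests

csdap_host_url = "https://auth.example.com"            # value of your csdap_host_url variable
csdap_client_id = "<your csdap_client_id>"
csdap_client_password = "<your csdap_client_password>"
cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:9000/dev"

response = requests.post(
    f"{csdap_host_url}/authclient/updateRedirectUri",
    auth=(csdap_client_id, csdap_client_password),                # BasicAuth with client id/password
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={"redirect_uri": f"{cumulus_distribution_url}/login"},   # note the /login path
)
response.raise_for_status()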

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your EC2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an SSH tunnel
    5. Use a browser to navigate to your file

To determine your EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    ⚠️ IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    In another terminal window, open a tunnel with port forwarding using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, and then next enter a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Stop your SSH tunnel (enter Ctrl-C)
    2. Stop your AWS SSM session (enter Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as Your Distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    ⚠️ Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.
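To illustrate how such a one-to-one mapping resolves a distribution path back to a bucket and key, here is a hypothetical Python sketch; it is for illustration only and is not the Cumulus Distribution implementation:

bucket_map = {
    "daac-public-data-bucket": "/path/to/this/kind/of/data",
}

# Invert the map so a request path can be matched to its backing bucket.
path_to_bucket = {path.strip("/"): bucket for bucket, path in bucket_map.items()}


def resolve(request_path: str):
    """Return (bucket, key) for a distribution request path, or None."""
    stripped = request_path.strip("/")
    for prefix, bucket in path_to_bucket.items():
        if stripped.startswith(prefix + "/"):
            return bucket, stripped[len(prefix) + 1:]
    return None


print(resolve("/path/to/this/kind/of/data/granule-file.hdf"))
# -> ('daac-public-data-bucket', 'granule-file.hdf')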

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

How to Deploy Cumulus

…for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a Distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    ⚠️ IMPORTANT: If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    TEA can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the TEA documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (Optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init [1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    ⚠️ Note: Be sure to copy the redirect URLs because you will need them to update your Earthdata application.

    Update Earthdata Application

    Add the two redirect URLs to your EarthData login application by doing the following:

    1. Login to URS
    2. Under My Applications -> Application Administration -> use the edit icon of your application
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token
    4. Also add the Distribution url
• e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [2]
    5. You may delete the placeholder url you used to create the application

If you've lost track of the needed redirect URIs, they can be located in the API Gateway console. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the Thin Egress App URL.


    Deploy Cumulus Dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the Dashboard repository.

    Prepare AWS

    Create S3 Bucket for Dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install Dashboard

    To install the Cumulus Dashboard, clone the repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard Versioning

    By default, the master branch will be used for Dashboard deployments. The master branch of the repository contains the most recent stable release of the Cumulus Dashboard.

    If you want to test unreleased changes to the Dashboard, use the develop branch.

    Each release/version of the Dashboard will have a tag in the Dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the Dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the Dashboard

    ⚠️ Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [3].

    Build your dashboard from the Cumulus Dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard Deployment

    Deploy your dashboard to S3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the Dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or find the url <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint" and log in with a user that you had previously configured for access.


    Cumulus Instance Sizing

The Cumulus deployment's default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

Configuration can be done via elasticsearch_config in the data persistence module and es_index_shards in the cumulus module.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 Instances and Autoscaling Groups

EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

When configuring your ECS cluster, consider the following (an example set of values is sketched after this list):

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
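
As a rough sketch, these values can be set in cumulus-tf/terraform.tfvars if your configuration exposes them as top-level Terraform variables (otherwise, set them as arguments on the cumulus module block). The values below are illustrative assumptions only, not recommendations; tune them to your expected workloads and budget.

# Example sizing values only -- adjust for your workloads
cat >> cumulus-tf/terraform.tfvars <<'EOF'
ecs_cluster_instance_type               = "t3.medium"
ecs_cluster_instance_docker_volume_size = 100
ecs_cluster_min_size                    = 1
ecs_cluster_desired_size                = 1
ecs_cluster_max_size                    = 2
ecs_cluster_scale_in_adjustment_percent  = -5
ecs_cluster_scale_out_adjustment_percent = 10
ecs_cluster_scale_in_threshold_percent   = 25
ecs_cluster_scale_out_threshold_percent  = 75
EOF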

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

2. To add additional redirect URIs to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", and use the edit icon for your application. Then go to Manage -> Redirect URIs.

3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at".
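
For illustration, one way to look up that invoke URL with the AWS CLI is sketched below; the <prefix>, <region>, and <stage> values are placeholders that depend on your deployment.

# Find the API Gateway ID for the <prefix>-archive API and print its invoke URL
API_ID=$(aws apigateway get-rest-apis \
  --query "items[?name=='<prefix>-archive'].id" --output text)
echo "https://${API_ID}.execute-api.<region>.amazonaws.com/<stage>/"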

Version: v14.1.0

PostgreSQL Database Deployment

Cumulus provides a Terraform module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 11 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar, consider perusing the AWS docs and the Aurora Serverless V1 docs.

    Prepare Deployment Repository

If you are already working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or you just need to configure this module for your repository, skip to Prepare AWS Configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a New Repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    ⚠️ Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.
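
For example, one way to do this from the repository root:

# Keep files containing sensitive values out of source control
printf 'terraform.tf\nterraform.tfvars\n' >> .gitignore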


    Prepare AWS Configuration

To deploy this module, make sure that you have completed the steps from the Prepare AWS configuration section of the Cumulus deployment instructions in a similar fashion for this module.

    Configure and Deploy the Module

When configuring this module, please keep in mind that unlike the Cumulus deployment, this module should be deployed once to create the database cluster and re-deployed thereafter only to make changes to that configuration, upgrade it, etc.

    Tip: This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

In terraform.tf, configure the remote state settings by substituting the appropriate values for the following (a filled-in sketch follows this list):

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
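
For illustration only, a filled-in terraform.tf for this module might look roughly like the sketch below. The bucket and DynamoDB table names are placeholders for the state resources you created when preparing your AWS configuration, and the exact structure should follow terraform.tf.example.

# Sketch of remote state settings -- substitute your own values
cat > terraform.tf <<'EOF'
terraform {
  backend "s3" {
    region         = "us-east-1"
    bucket         = "my-tf-state-bucket"                    # your state bucket
    key            = "PREFIX/rds-cluster/terraform.tfstate"  # your chosen prefix
    dynamodb_table = "my-tf-locks-table"                     # your locks table
  }
}
EOF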

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.

    Provision User and User Database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true. This configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
• permissions_boundary_arn -- the permissions boundary to use when creating the roles for access that the provisioning lambda will need. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to.
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as name the provisioned user/database.

Once configured, the module will deploy the lambda and run it on each deployment, creating the configured database (if it does not exist), updating the user password (if that value has been changed), and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    ⚠️ Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

You should see output similar to the following:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    ⚠️ Caution: If re-applying this module, variables (e.g. engine_version, snapshot_identifier ) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secrets required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

The content of each of these secrets is in the form:

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432
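
To sanity-check the credentials after deployment, you can read either secret back with the AWS CLI. This is a sketch only: substitute whichever secret ARN you noted from the outputs above, and jq is optional (it just pretty-prints the JSON).

aws secretsmanager get-secret-value \
  --secret-id <user_credentials_secret_arn> \
  --query SecretString --output text | jq .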

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases, and other database configuration.

    Version: v14.1.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a Node.js package that contains a simple Lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition as in the example below:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The Terraform source package can be found on the Cumulus GitHub Release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 Replicator as described above using the target_bucket and target_prefix provided by the Metrics team.

    The Metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Version: v14.1.0

Terraform Best Practices

To verify that all resources for a deployment have been removed, run the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link Restoring a previous version: link

    Version: v14.1.0

    Using the Thin Egress App for Cumulus Distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA Deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

The cumulus-template-deploy repository's cumulus-tf/main.tf contains a thin_egress_app module configuration for distribution.

The TEA module provides instructions showing how to add it to your deployment; the following are instructions for configuring the thin_egress_app module in your Cumulus deployment.

    Create a Secret for Signing Thin Egress App JWTs

    The Thin Egress App uses JSON Web Tokens (JWTs) internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.
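
As a rough sketch, creating the secret involves generating an RSA key pair and storing it in AWS Secrets Manager. The secret name below is a placeholder, and the secret-string contents (written here to jwt-secret.json by you) must follow the format described in the TEA documentation.

# Generate an RSA key pair for signing JWTs
ssh-keygen -t rsa -b 4096 -m PEM -f ./tea-jwt -N ''

# Store the keys using the JSON structure described in the TEA documentation
aws secretsmanager create-secret \
  --name <prefix>-tea-jwt-secret \
  --secret-string file://jwt-secret.json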

    Bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    ⚠️ Note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.
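
To see what the generated default mapping contains for your deployment, you can copy it to stdout (the bucket name is a placeholder for your configured system bucket):

aws s3 cp s3://<system_bucket>/distribution_bucket_map.json -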

    Optionally Configure a Custom Bucket Map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    ⚠️ Note: Your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally Configure Shared Variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus Core example deployment code.

Version: v14.1.0

Upgrading Cumulus

After upgrading, verify that your deployment functions correctly. Refer to the recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: v14.1.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
• If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

The request will then be merged or declined, or an adjustment to the code will be requested via the issue opened with the original PR.

PRs from forked repos cannot be directly merged to master. Cumulus reviewers must complete the following steps before finishing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
2. Push the new branch to GitHub (see the command sketch after this list)

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve the failures as appropriate.
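
A compressed sketch of steps 1, 2, and 5 from the reviewer's side (branch names are placeholders):

# Steps 1 and 2: create the intermediate branch and push it to GitHub
git checkout -b from-<name-of-the-branch> master
git push origin from-<name-of-the-branch>

# Step 5: once the forked PR has been merged into the intermediate branch,
# open a PR from from-<name-of-the-branch> into master (e.g. via the GitHub UI)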

Version: v14.1.0

Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

Version: v14.1.0

Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

Version: v14.1.0

Versioning and Releases

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    Version: v14.1.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_APP_ID, DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve
⚠️ Note: docs-build will build the documents into website/build, and docs-clear will clear the documents.

⚠️ Caution: Fix any broken links reported by Docusaurus if you see the following messages during the build.

    [INFO] Docusaurus found broken links!

    Exhaustive list of all broken links found:

    Cumulus Documentation

Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in depth explanation of sidebars here.

    Versioning Docs

We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. Docusaurus v2 uses a snapshot approach for documentation versioning: each versioned set of docs is independent of the other versions. It is worth noting that we would like the documentation versions to match up directly with release versions. However, since a new set of versioned docs can take up a lot of repo space and requires maintenance, we suggest updating the existing versioned docs for minor releases when there are no significant functionality changes. Cumulus versioning is explained in the Versioning Docs.

Search on our documentation site is taken care of by DocSearch. We have been provided with an apiId, an apiKey, and an indexName by DocSearch that we include in our website/docusaurus.config.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for these values to exist: DOCSEARCH_APP_ID, DOCSEARCH_API_KEY, and DOCSEARCH_INDEX_NAME.

    Add a new task

The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to gh-pages branch. The gh-pages branch is served by Github Pages. Do not make edits to the gh-pages branch.

    Version: v14.1.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    Version: v14.1.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General | Workflows | Integrators & Developers | Operators


    General

What prerequisites are needed to set up Cumulus?
Answer: Here is a list of the tools and access that you will need in order to get started. To maintain the up-to-date versions that we are using, please visit our Cumulus main README (https://github.com/nasa/cumulus) for details.
• NVM for node versioning
• AWS CLI
• Bash
• Docker (only required for testing)
• docker-compose (only required for testing; pip install docker-compose)
• Python
• pipenv

    Keep in mind you will need access to the AWS console and an Earthdata account before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    Where can I find Cumulus release notes?

    Answer: To get the latest information about updates to Cumulus go to Cumulus Versions.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a Cumulus JIRA ticket.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    For more information on how to submit an issue or contribute to Cumulus follow our guidelines at Contributing


    Workflows

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting the Workflows section.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    Where can I find a list of workflow tasks?

    Answer: You can access a list of reusable tasks for Cumulus development at Cumulus Tasks.

    Are there any third-party workflows or applications that I can use with Cumulus?

    Answer: The Cumulus team works with various partners to help build a robust framework. You can visit our External Contributions section to see what other options are available to help you customize Cumulus for your needs.


    Integrators & Developers

    What is a Cumulus integrator?

Answer: Those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

Answer: Yes. Cumulus has a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus-maintained modules or steps on how to create a module, go to Terraform modules.

Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Explore more Cumulus operator best practices and how-tos in the dedicated Operator Docs.

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    Version: v14.1.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

Version: v14.1.0

Cumulus Backup and Restore

To restore from a snapshot, first make sure that nothing is still writing to the old cluster, then:

• Set the snapshot_identifier variable to the snapshot you wish to restore from, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

• Redeploy the data persistence and Cumulus deployments - you should not need to reconfigure either, as the secret ARN and the security group should not change; however, double-check that the configured values are as expected

    Version: v14.1.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
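
To browse what has accumulated in the archive, you can list that prefix (the bucket and stack names are placeholders for your deployment's values):

aws s3 ls --recursive s3://<system-bucket>/<stackName>/dead-letter-archive/sqs/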

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords. Otherwise, all Cumulus messages that fail to be reprocessed will be moved to a new archive location under the path <stackName>/dead-letter-archive/failed-sqs/<YYYY-MM-DD>.
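
As a hedged sketch, the recovery can be triggered with curl, where CUMULUS_API stands in for your archive API root and TOKEN for a valid access token:

curl -X POST "${CUMULUS_API}/deadLetterArchive/recoverCumulusMessages" \
  -H "Authorization: Bearer ${TOKEN}"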

This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.

    Version: v14.1.0

    Dead Letter Queues

    startSF SQS queue

The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
    message_retention_seconds = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

For aid in manually troubleshooting, you can utilize the SQS Management Console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    Version: v14.1.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    Version: v14.1.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    Version: v14.1.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

    Version: v14.1.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

Field | Type | Description
queueName | string | Any valid SQS queue name (not ARN)
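
For example, a hedged curl sketch of such a request, where CUMULUS_API, TOKEN, and the queue name are placeholders:

curl -X POST "${CUMULUS_API}/replays/sqs" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"queueName": "<your-queue-name>"}'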

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    Version: v14.1.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

Cumulus has added a new endpoint to its API, /replays. This endpoint allows you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

NOTE: Because the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

Field | Type | Required | Description
type | string | required | Currently only accepts kinesis.
kinesisStream | string | for type kinesis | Any valid Kinesis stream name (not ARN)
kinesisStreamCreationTimestamp | * | optional | Any input valid for a JS Date constructor. For reasons to use this field, see the AWS documentation on StreamCreationTimestamp.
endTimestamp | * | optional | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
startTimestamp | * | optional | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
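For example, a replay request that targets a time slice (with some buffer added to the timestamps, per the note above) might be submitted as in this hedged sketch. The base URL, token, stream name, and dates are placeholders, and Node.js 18+ is assumed for the global fetch.

// Hedged sketch: request a Kinesis replay for a buffered time slice.
// URL, token, stream name, and timestamps are placeholders (assumptions).
const body = {
  type: 'kinesis',
  kinesisStream: 'my-cnm-stream',
  // Widen the window slightly to allow for ApproximateArrivalTimestamp drift
  startTimestamp: '2023-07-01T00:00:00Z',
  endTimestamp: '2023-07-02T01:00:00Z',
};

fetch('https://example.com/replays', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer ReplaceWithTheToken',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(body),
})
  .then((res) => res.json())
  .then(({ asyncOperationId }) => console.log('Track with:', asyncOperationId))
  .catch(console.error);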

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

Reconciliation Reports

report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.

    Version: v14.1.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution, and management systems designed for NASA's future Earth Science data streams.

    Who uses Cumulus

Data integrators/developers and operators across projects, not limited to NASA, use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role-specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Set up Git Secrets

    To ensure your AWS access keys and passwords are protected as you submit commits we recommend setting up Git Secrets.

    2. Deploy Cumulus Core and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    3. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    4. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 3.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment.

For an introduction to Terraform, go here.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    Version: v14.1.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: Amazon Web Services documentation.

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collections on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see Collections - Data Management Types.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page.

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who setup deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Node Package Manager (npm)

    Node package manager. Often referred to as npm.

    For more information, see npm.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (npm)

    Npm hosted node.js packages. Cumulus packages can be found on npm's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data, and more.

    For more information, see AWS's S3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Version: v14.1.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    Version: v14.1.0

    About Integrator Guide

    Purpose

The Integrator Guide is intended to supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks

Integrator Common Use Cases

    Version: v14.1.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

The following steps will help you as you write a new Lambda that integrates with a Cumulus workflow, and will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow. A minimal handler sketch illustrating this integration is shown after these steps.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.
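To make the CMA contract more concrete, here is a hedged sketch of a Node.js handler that wraps a task function with the Cumulus Message Adapter. It assumes the runCumulusTask helper exported by @cumulus/cumulus-message-adapter-js and uses a hypothetical task name; confirm the exact API against that package's documentation.

// Hedged sketch of a CMA-integrated Lambda handler.
// The runCumulusTask signature is assumed from @cumulus/cumulus-message-adapter-js;
// verify it against the package documentation before use.
const { runCumulusTask } = require('@cumulus/cumulus-message-adapter-js');

// The task receives the CMA-resolved event: `input` is the task input and
// `config` is the task_config from the workflow definition.
async function myNewTask(event) {
  const { input, config } = event;
  // ...do the real work here...
  return { granules: input.granules || [], configUsed: config };
}

// Lambda entry point (index.handler)
exports.handler = async (event, context) => runCumulusTask(myNewTask, event, context);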

    Version: v14.1.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
2. Investigate in CloudWatch
• Go to CloudWatch
• Locate lambda
• Search CloudWatch logs
3. Recreate Error

      In your sandbox environment, try to recreate the error.

4. Resolution

    Version: v14.1.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.
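Conceptually, that validation looks like the hedged sketch below: a JSON schema compiled with ajv and applied to each incoming record. The schema shown is a hypothetical stand-in, not the actual consumer schema; refer to the linked schema for the real required fields.

// Conceptual sketch of schema validation with ajv.
// The schema below is a hypothetical stand-in for the real consumer schema.
const Ajv = require('ajv');

const consumerSchema = {
  type: 'object',
  required: ['collection'],
  properties: {
    collection: { type: 'string' },
    identifier: { type: 'string' },
  },
};

const ajv = new Ajv();
const validate = ajv.compile(consumerSchema);

const incomingNotification = { collection: 'MOD09GQ', identifier: 'example-0001' };
if (!validate(incomingNotification)) {
  // Invalid records would be rejected before any workflow is queued
  console.error(validate.errors);
}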

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    Version: v14.1.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    Version: v14.1.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible (an example payload is shown after these steps):

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
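Putting steps 7 and 8 together, the request payload pasted into the bulk granules modal ends up looking roughly like the hedged sketch below. The index, workflow name, and Elasticsearch query are placeholders; the exact payload fields accepted by the /granules/bulk endpoint are described in the Cumulus API documentation.

// Hedged sketch of a bulk granules request payload (all values are placeholders).
// The inner Elasticsearch query object is the one copied from Kibana in step 7,
// nested inside the payload's own `query` property.
const bulkGranulesPayload = {
  index: 'my-granules-index',
  workflowName: 'MyBulkWorkflow',
  query: {
    query: {
      match: { collectionId: 'MOD09GQ___006' },
    },
  },
};

console.log(JSON.stringify(bulkGranulesPayload, null, 2));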

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

CMR Operations

UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
    --header 'Authorization: Bearer ReplaceWithTheToken' \
    --header 'Content-Type: application/json' \
    --data '{
    "action": "applyWorkflow",
    "workflow": "updateCmrAccessConstraints",
    "meta": {
"accessConstraints": {
"value": 5,
"description": "sample access constraint"
    }
    }
    }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations are in the bulk operations operator doc.

    Version: v14.1.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule
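Behind the form, the resulting rule record looks roughly like the hedged sketch below. All names and values are hypothetical placeholders, and the exact schema (including defaults such as state) is described in the Data Management Types documentation.

// Hedged sketch of a rule record as produced by the Add Rule form.
// Names and values here are hypothetical placeholders.
const rule = {
  name: 'mod09gq_kinesis_ingest_rule',
  workflow: 'DiscoverGranules',          // workflow to trigger
  provider: 'MY_PROVIDER',               // provider configured in Cumulus
  collection: { name: 'MOD09GQ', version: '006' },
  rule: {
    type: 'kinesis',                     // or onetime, scheduled, sns, sqs
    value: 'arn:aws:kinesis:us-east-1:111111111111:stream/my-cnm-stream',
  },
  state: 'ENABLED',
  meta: {},                              // optional additional metadata
};

console.log(JSON.stringify(rule, null, 2));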

Discovery Filtering

directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    Version: v14.1.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

2. Ingest Granule via Ingest Rule
• Re-triggering a one-time, Kinesis, SQS, or SNS rule, or running a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use search field to find the granule.
2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

2. Ingest Granules via Ingest Rule
• Re-triggering one-time, Kinesis, SQS, or SNS rules, or running scheduled rules, will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    Version: v14.1.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on amount of kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
• A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

• Send a message that matches your schema using Python or from your command line (a Node.js sketch is shown below).
• The streamName and Collection must match the kinesisArn+collection defined in the rule you created in Step 2.
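A hedged sketch of sending such a message from Node.js is shown below. It uses the AWS SDK v3 Kinesis client; the stream name, region, and CNM-style payload fields are placeholders, so check the CNM schema and your rule configuration for the exact values required.

// Hedged sketch: put a CNM-style message onto the Kinesis stream.
// Stream name, region, and payload fields are placeholders (assumptions);
// consult the CNM schema for the authoritative message structure.
const { KinesisClient, PutRecordCommand } = require('@aws-sdk/client-kinesis');

const client = new KinesisClient({ region: 'us-east-1' });

const cnmMessage = {
  version: '1.6.0',
  provider: 'MY_PROVIDER',
  collection: 'MOD09GQ',        // must match the collection in your rule
  identifier: 'example-granule-0001',
  submissionTime: new Date().toISOString(),
  product: {
    name: 'example-granule-0001',
    files: [],                  // file descriptions go here
  },
};

async function sendMessage() {
  await client.send(new PutRecordCommand({
    StreamName: 'my-cnm-stream', // must match the stream configured in the rule
    PartitionKey: cnmMessage.identifier,
    Data: Buffer.from(JSON.stringify(cnmMessage)),
  }));
}

sendMessage().catch(console.error);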
    Version: v14.1.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
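For example, the raw logs can be listed with the AWS SDK as in the hedged sketch below; the bucket name, stack name, and region are placeholders for your deployment's values.

// Hedged sketch: list raw S3 server access logs under the configured prefix.
// Bucket name, stack name, and region are placeholders for your deployment's values.
const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');

const s3 = new S3Client({ region: 'us-east-1' });

async function listAccessLogs() {
  const response = await s3.send(new ListObjectsV2Command({
    Bucket: 'my-internal-bucket',
    Prefix: 'my-stack/ems-distribution/s3-server-access-logs/',
  }));
  (response.Contents || []).forEach((object) => console.log(object.Key));
}

listAccessLogs().catch(console.error);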

Naming Executions

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
    ...other rule keys here...
    "meta":
    {
    "executionNamePrefix": "my-prefix"
    }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
}
    }

Operator Common Use Cases

    Version: v14.1.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types correspond to the bucket definitions shown in the deployment configuration above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    Version: v14.1.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    Version: v14.1.0

    Cumulus Team

    Cumulus Core Team

    Cumulus Emeritus Team

    Version: v14.1.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

If this problem occurs with Core Lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend using the non-scaling approach, as the number of Lambdas we distribute is in the low teens and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

Reindexing Elasticsearch Guide

current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then Reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This approach is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
    "aggs": {
    "count_by_type": {
    "terms": {
    "field": "_type"
    }
    }
    },
    "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
    "conflicts": "proceed",
    "source": {
    "index": "cumulus-2020-11-3"
    },
    "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
    }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    Version: v14.1.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

Version: v14.1.0

Troubleshooting Deployment

data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

{
  domain_name    = "es"
  instance_count = 2
  instance_type  = "t2.small.elasticsearch"
  version        = "5.3"
  volume_size    = 10
}

    Install dashboard

    Dashboard configuration

    Issues:

• Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'. This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. gulp-cache will then be able to write to /tmp/<prefix>-cache/default, and the error should resolve.
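
A rough shell sketch of that edit (my-prefix is a hypothetical value; this assumes GNU sed, and on macOS sed -i needs an empty-string argument):

# Patch gulp-cache to use a cache directory you can write to
sed -i "s/cacheDirName: 'gulp-cache'/cacheDirName: 'my-prefix-cache'/" \
  cumulus-dashboard/node_modules/gulp-cache/index.js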

    Dashboard deployment

    Issues:

• If the dashboard sends you to an Earthdata Login page with an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means one of the following: you've forgotten to update one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.
• There is currently odd caching behavior associated with the dashboard and Earthdata Login that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after the cause of the error has been fixed. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    Version: v14.1.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Version: v14.1.0

    Migrate TEA deployment to standalone module

    Background

This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x.

Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that release.

In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to TEA deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

 aws dynamodb put-item \
   --table-name DYNAMO_LOCK_TABLE \
   --item '{
     "LockID": {"S": "BUCKET/KEY-md5"},
     "Digest": {"S": "some-digest-value"}
   }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code.

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    Version: v14.1.0

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

The Cumulus Message Adapter has been updated in release 2.0.2 to no longer use the AWS Step Functions API to look up the defined name of a step function task for population in meta.workflow_tasks, but to instead use an incrementing integer field.

Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

This update functionally means that Cumulus tasks/activities using the CMA will now write a record that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize cumulus message adapter > 2.0.2:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

Once the above have been done, redeploy Cumulus to apply the configuration; the updates should then be live.
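
As a sketch of the layer step only (the zip filename and layer name below are assumptions; obtain the actual v2.0.2 zip from the cumulus-message-adapter release page), the layer could be published with the AWS CLI and the printed ARN supplied to cumulus_message_adapter_lambda_layer_version_arn:

# Publish the CMA zip as a Lambda layer and print its version ARN (placeholder names)
aws lambda publish-layer-version \
  --layer-name cumulus-message-adapter \
  --zip-file fileb://cumulus-message-adapter.zip \
  --query 'LayerVersionArn' --output text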

    Version: v14.1.0

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/)
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

Since the input schema for the above tasks was not updated, you cannot run bulk granule operations against workflows that start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.
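
For reference, bulk granule operations are started via the Cumulus API's POST /granules/bulk endpoint. The sketch below is illustrative only; CUMULUS_API_URL, TOKEN, the workflow name, and the granule IDs are placeholders, and the exact payload shape should be taken from the Cumulus API documentation for your version:

curl -X POST "$CUMULUS_API_URL/granules/bulk" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"workflowName": "IngestAndPublishGranule", "ids": ["granule-id-1", "granule-id-2"]}'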

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

For any workflows using the post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for its granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",

Version: v14.1.0

Upgrade to RDS release

| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    Version: v14.1.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using the Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required (a brief example follows this list).
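
A minimal sketch of doing that with tfenv (assuming tfenv itself is already installed):

# Install and activate the Terraform version pinned by Cumulus
tfenv install 0.13.6
tfenv use 0.13.6
terraform --version   # should report Terraform v0.13.6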

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
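
For example, one possible batch invocation over every module directory in a deployment repository (a sketch assuming GNU find; the -yes flag skips the interactive prompt):

# Run the upgrade tool in each directory containing .tf files
find . -name '*.tf' -printf "%h\n" | sort -u | xargs -n1 terraform 0.13upgrade -yes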

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

Version: v14.1.0

Discover Granules

included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v14.1.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

This task utilizes the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to take the list of incoming files and sort them into a list of granule objects.

Please note: Files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v14.1.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.
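
For example, one way to list those variables locally is to grep the downloaded cumulus module source after terraform init (the module path below is an assumption and will vary by deployment):

# List LZARDS-related variables exposed by the cumulus module (path is a placeholder)
grep 'variable "lzards_' .terraform/modules/cumulus/tf-modules/cumulus/variables.tf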

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Version: v14.1.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file

    • Task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v14.1.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Output a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

  HDF: 'data',
  HDF-EOS: 'data',
  SCIENCE: 'data',
  BROWSE: 'browse',
  METADATA: 'metadata',
  BROWSE_METADATA: 'metadata',
  QA_METADATA: 'metadata',
  PRODHIST: 'qa',
  QA: 'metadata',
  TGZ: 'data',
  LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v14.1.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task utilizes a defined collection in concert with a defined provider, either set on each granule or passed in via config, to queue up ingest executions for each granule or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

A Cumulus provider object for the originating provider. It will be passed along to the ingest workflow and will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    Version: v14.1.0

    Cumulus Tasks: Message Flow

Cumulus Tasks make up Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration, other portions of the message may be present; however, the cumulus_meta, meta, and payload keys must be present once extraction is complete.

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

This step determines which task is currently being executed. Note that this is different from which lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

{
  "input": {},
  "config": {}
}

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

{
  "task_config": {
    "bar": "baz",
    "cumulus_message": {
      "input": "{$.payload.input}",
      "outputs": [
        {
          "source": "{$.input.anykey}",
          "destination": "{$.payload.out}"
        }
      ]
    }
  },
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "input": {
      "anykey": "anyvalue"
    }
  }
}

    Create Next Message Step 2: Store remote event

If the ReplaceConfig parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    Version: v14.1.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
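
After applying the module, a quick sanity check (a sketch; the name filter assumes the MyWorkflowName example above) is to confirm the state machine was created:

# Confirm the new workflow's state machine exists
aws stepfunctions list-state-machines \
  --query "stateMachines[?contains(name, 'MyWorkflowName')].{name: name, arn: stateMachineArn}"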

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v14.1.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

Custom Docker images/runtimes are supported, as are private registries. For details on configuring a private registry/image, see the AWS documentation on Private Registry Authentication for Tasks.

Dockerizing Data Processing

2) validate the output (in this case just check for existence)
3) use 'ncatted' to update the resulting file to be CF-compliant
4) write out metadata generated for this file

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

├── cpl
│   ├── input
│   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
│   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
│   └── output
│       ├── HS3_CPL_ATB_12203a_20120906.nc
│       ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
│       ├── HS3_CPL_OP_12203a_20120906.nc
│       └── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the contents of data/test-output validate against the contents of data/output, the tests will pass; otherwise an error will be reported.
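
The exact docker-compose.yml varies by repository, but as a rough, hypothetical sketch, a test service consistent with the steps above might look like:

version: '3'
services:
  test:
    build: .
    volumes:
      - ./data:/data
    # hypothetical entrypoint; the real processing command differs per repository
    command: ./run_test.sh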

    Version: v14.1.0

    Workflows

Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

The high-level view of the architecture and many of the individual steps are the same, but the details of ingesting each type of collection differ. Different collection types and different providers have different needs. Not only are the individual boxes of a workflow different; the branching, error handling, and multiplicity of the arrows connecting the boxes differ as well. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

AWS and other cloud vendors provide an ideal solution for parts of these problems, but a higher-level solution is needed to compose AWS components into a full-featured system. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about the specifics of what goes into a granule, for example.

    Scalable

The ingest architecture needs to be scalable, both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but in short they provide several benefits that are directly applicable to this architecture:

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
  • Allows composing AWS Lambdas and code running elsewhere. Code can be run in EC2 to interface with it, or even on premises if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
  • Both meta and task_config, which are used for configuration at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used to extend the validation of individual steps.
    • Data-centric Configuration
  • The use of a single JSON configuration file allows this to be added to a workflow. We can build additional support on top of the configuration file for simpler domain-specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema, or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these. In exchange for this tradeoff, we save significant amounts of development effort and gain all the features of Step Functions. One workaround is providing a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest specific configuration. This means our representation has a tight coupling with their standard. If they make backwards incompatible changes in the future we will have to deal with existing projects written against that.

We avoid having to develop our own standard and the code to process it. The design can support new features in AWS Step Functions without needing to change the Ingest library code. It is unlikely they will make a backwards-incompatible change at this point. One mitigation, if that were to happen, is writing data transformations to a new format.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible, but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the need to integrate different ingest workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used for monitoring many of the events within Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v14.1.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.
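
As a minimal sketch (top-level keys taken from the examples later on this page; contents elided), the common format looks like the following, where cumulus_meta carries execution metadata, meta carries collection/provider/workflow state, exception carries error information, and payload carries the task-specific payload:

{
  "cumulus_meta": { "...": "..." },
  "meta": { "...": "..." },
  "exception": {},
  "payload": { "...": "..." }
}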

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.
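
As a sketch of the second case (resource and variable names here are hypothetical), the relevant parts of a Terraform resource might look like:

resource "aws_lambda_function" "my_task" {
  # ... filename, handler, role, runtime, and other required configuration ...

  # ARN of the deployed CMA layer (variable name is illustrative)
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]

  environment {
    variables = {
      # required so the layer-based CMA can be found at runtime
      CUMULUS_MESSAGE_ADAPTER_DIR = "/opt"
    }
  }
}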

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code, in a cumulus-message-adapter sub-directory in your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 does not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

Input to the task application code is a JSON object with the following keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is, by default, placed in the payload key of the outgoing message, but the configuration can also be used to return just a portion of the task output.

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, leaving in its place an empty JSON object {} along with a remote message key that defines how to retrieve it. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes) it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize):

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values for definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
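
Applied to the inbound message above, those two actions produce (as a sketch) the following object, which is what the CMA then operates on:

{
  "ConfigKey": {
    "config values": "some config values"
  },
  "cumulus_meta": {},
  "payload": {},
  "meta": {},
  "exception": {}
}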

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use a URL template as a value to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given a task that specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also augment values in other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Given the response from the task is:

{
  "output": {
    "anykey": "boo"
  }
}

    The Cumulus Message Adapter would output the following Cumulus Message:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please Note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output Cumulus message (after output configuration is applied) looks like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "ReplaceConfig": {
    "FullMessage": true
  },
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    the resultant output would look like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
  }
}

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
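
For illustration only (field names are hypothetical), a task's schemas/input.json is an ordinary JSON Schema document and might look like:

{
  "title": "MyTask Input",
  "type": "object",
  "required": ["granules"],
  "properties": {
    "granules": {
      "type": "array",
      "items": { "type": "object" }
    }
  }
}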

    Version: v14.1.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions (see the corresponding sections below): Node.js, Java, and Python.

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the Cumulus source) as either a new .tf file, or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided the cumulus module if intending to integrate the lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.
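
Building on the Node.js example above (the variable holding the CMA layer ARN is illustrative), this amounts to adding a layers attribute:

resource "aws_lambda_function" "myfunction" {
  # ... configuration as in the Node.js example above ...

  # ARN of the deployed Cumulus Message Adapter layer
  layers = [var.cma_layer_arn]
}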

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    Version: v14.1.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
Workflow Configuration How To's

To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. The cmrMetadata, or CMR granule XML, needs to have been generated and stored on S3; from there, any field can be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

<Granule>
  <Temporal>
    <RangeDateTime>
      <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
      <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
  </Temporal>
</Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

Different values can be combined to create the url_path. For example:

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    Version: v14.1.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.
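
As a hedged sketch (values are illustrative; see the rule configuration documentation for the authoritative schema), a scheduled rule record might look like:

{
  "name": "mod09gq_hourly_ingest",
  "workflow": "MyWorkflowName",
  "provider": "my-provider",
  "collection": { "name": "MOD09GQ", "version": "006" },
  "rule": { "type": "scheduled", "value": "rate(1 hour)" },
  "state": "ENABLED"
}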

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one-time rule is created, the schedule SF lambda is triggered. Rules that are not one-time are associated with a CloudWatch event, which manages triggering the lambdas that start the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    Version: v15.0.2

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    Version: v15.0.2

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

    While the diagram can feel complex, it can easily be digested in several major components:

    Data Distribution

End users can access data via Cumulus's distribution submodule, which includes ASF's Thin Egress App; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a set of PostgreSQL-compatible database tables, and is exported to an Elasticsearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries. Currently the entity state data is replicated in DynamoDB; this will be removed in a future release.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployment; they contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, how to deploy, and more, please see the deployment docs as well as the cumulus-template-deploy repo.

Cloudwatch Retention

The cumulus, cumulus_distribution, and cumulus_ecs_service modules support configuring the retention period (in days) of CloudWatch log groups for lambdas and tasks (using the cumulus module as an example):

    module "cumulus" {
    # ... other variables
    default_log_retention_days = var.default_log_retention_days
    cloudwatch_log_retention_periods = var.cloudwatch_log_retention_periods
    }

    By setting the below variables in terraform.tfvars and deploying, the cloudwatch log groups will be instantiated or updated with the new retention value.

default_log_retention_days

    The variable default_log_retention_days can be configured in order to set the default log retention for all cloudwatch log groups managed by Cumulus in case a custom value isn't used. The log groups will use this value for their retention, and if this value is not set either, the retention will default to 30 days. For example, if a user would like their log groups of the Cumulus module to have a retention period of one year, deploy the respective modules with the variable in the example below.

    Example

default_log_retention_days = 365

    cloudwatch_log_retention_periods

The retention period (in days) of CloudWatch log groups for specific lambdas and tasks can be set during deployment using the cloudwatch_log_retention_periods terraform map variable. In order to configure these values for the respective CloudWatch log groups, uncomment the cloudwatch_log_retention_periods variable and add entries for the groups whose retention you want to change. The following keys are supported, correlating to their lambda/task name (i.e. "/aws/lambda/prefix-DiscoverPdrs" would have the retention variable "DiscoverPdrs"):

    • ApiEndpoints
    • AsyncOperationEcsLogs
    • DiscoverPdrs
    • DistributionApiEndpoints
    • EcsLogs
    • granuleFilesCacheUpdater
    • HyraxMetadataUpdates
    • ParsePdr
    • PostToCmr
    • PrivateApiLambda
    • publishExecutions
    • publishGranules
    • publishPdrs
    • QueuePdrs
    • QueueWorkflow
    • replaySqsMessages
    • SyncGranule
    • UpdateCmrAccessConstraints
Note: EcsLogs is used for the CloudWatch log groups of all cumulus_ecs_service tasks.

    Example

cloudwatch_log_retention_periods = {
  ParsePdr = 365
}

    The retention periods are the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    Version: v15.0.2

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

    In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

{
  "Datapoints": [
    {
      "Timestamp": "2019-07-31T00:00:00Z",
      "Average": 150996467959.0,
      "Unit": "Bytes"
    }
  ],
  "Label": "BucketSizeBytes"
}

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.
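
For example (hypothetical numbers): if a collection's key prefix accounts for 2 TB of a 10 TB bucket whose monthly cost is $230, a proportional estimate for that collection is (2 / 10) × $230 = $46 per month.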

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
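
For example (a sketch of the relevant portion of a .tfvars file; bucket names are illustrative), a file configuration with "bucket": "protected" refers to the protected key of this map:

buckets = {
  internal = {
    name = "my-prefix-internal"
    type = "internal"
  }
  protected = {
    name = "my-prefix-protected"
    type = "protected"
  }
  public = {
    name = "my-prefix-public"
    type = "public"
  }
}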

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v15.0.2/configuration/data-management-types/index.html b/docs/v15.0.2/configuration/data-management-types/index.html index d718991b8c9..98cb2f01351 100644 --- a/docs/v15.0.2/configuration/data-management-types/index.html +++ b/docs/v15.0.2/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v15.0.2

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

An execution record is created for each triggered workflow execution; executions can be viewed in the dashboard, and their data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, settings that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be s3 for this provider type.
host | string | Yes | S3 Bucket to pull data from
    http
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be http for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 80
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
    https
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be https for this provider type
host | string | Yes | The host to pull data from (e.g. nasa.gov)
username | string | No | Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
password | string | Only if username is specified | Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
port | integer | No | Port to connect to the provider on. Defaults to 443
allowedRedirects | string[] | No | Only hosts in this list will have the provider username/password forwarded for authentication. Entries should be specified as host.com or host.com:7000 if redirect port is different than the provider port.
certificateUri | string | No | SSL Certificate S3 URI for custom or self-signed SSL (TLS) certificate
    ftp
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be ftp for this provider type
host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
password | string | No | Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
port | integer | No | Port to connect to the provider on. Defaults to 21
    sftp
Key | Type | Required | Description
id | string | Yes | Unique identifier for the provider
globalConnectionLimit | integer | No | Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
protocol | string | Yes | The protocol for this provider. Must be sftp for this provider type
host | string | Yes | The ftp host to pull data from (e.g. nasa.gov)
username | string | No | Username to use to connect to the sftp server.
password | string | No | Password to use to connect to the sftp server.
port | integer | No | Port to connect to the provider on. Defaults to 22
privateKey | string | No | filename assumed to be in s3://bucketInternal/stackName/crypto
cmKeyId | string | No | AWS KMS Customer Master Key arn or alias
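To make the tables concrete, a minimal S3 provider definition might look like the following sketch (the id and host values are placeholders):

{
  "id": "MY_S3_PROVIDER",
  "protocol": "s3",
  "host": "my-provider-staging-bucket",
  "globalConnectionLimit": 10
}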

    Collections

Breakdown of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json):

Key | Value | Required | Description
name | "MOD09GQ" | Yes | The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
version | "006" | Yes | A version tag for the collection
granuleId | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$" | Yes | The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
granuleIdExtraction | "(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)" | Yes | The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | An example filename belonging to this collection
files | <JSON Object> of files defined here | Yes | Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
dataType | "MOD09GQ" | No | Can be specified, but this value will default to the collection_name if not
duplicateHandling | "replace" | No | ("replace"|"version"|"skip") determines granule duplicate handling scheme
ignoreFilesConfigForDiscovery | false (default) | No | By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
process | "modis" | No | Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
meta | <JSON Object> of MetaData for the collection | No | MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
url_path | "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}" | No | Filename without extension

    files-object

Key | Value | Required | Description
regex | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | Yes | Regular expression used to identify the file
sampleFileName | "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf" | Yes | Filename used to validate the provided regex
type | "data" | No | Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
bucket | "internal" | Yes | Name of the bucket where the file will be stored
url_path | "${collectionShortName}/{substring(file.fileName, 0, 3)}" | No | Folder used to save the granule in the bucket. Defaults to the collection url_path
checksumFor | "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$" | No | If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, run on a schedule, or be configured to be triggered by Kinesis events, SNS messages, or SQS messages.

    Rule configuration
Key | Value | Required | Description
name | "L2_HR_PIXC_kinesisRule" | Yes | Name of the rule. This is the name under which the rule will be listed on the dashboard
workflow | "CNMExampleWorkflow" | Yes | Name of the workflow to be run. A list of available workflows can be found on the Workflows page
provider | "PODAAC_SWOT" | No | Configured provider's ID. This can be found on the Providers dashboard page
collection | <JSON Object> collection object shown below | Yes | Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
payload | <JSON Object or Array> | No | The payload to be passed to the workflow
meta | <JSON Object> of MetaData for the rule | No | MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
rule | <JSON Object> rule type and associated values - discussed below | Yes | Object defining the type and subsequent attributes of the rule
state | "ENABLED" | No | ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
queueUrl | https://sqs.us-east-1.amazonaws.com/1234567890/queue-name | No | URL for SQS queue that will be used to schedule workflows for this rule
tags | ["kinesis", "podaac"] | No | An array of strings that can be used to simplify search

    collection-object

Key | Value | Required | Description
name | "L2_HR_PIXC" | Yes | Name of a collection defined/configured in the Collections dashboard page
version | "000" | Yes | Version number of a collection defined/configured in the Collections dashboard page

    meta-object

Key | Value | Required | Description
retries | 3 | No | Number of retries on errors, for sqs-type rule only. Defaults to 3.
visibilityTimeout | 900 | No | VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

Key | Value | Required | Description
type | "kinesis" | Yes | ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
value | <String> Object | Depends | Discussion of valid values is below

    rule-value

The rule's value entry depends on the rule type (sketches of two common cases follow the list below):

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
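For instance, the rule object for a scheduled rule and for a kinesis rule might look like the sketches below (the rate expression and stream ARN are placeholders):

"rule": {
  "type": "scheduled",
  "value": "rate(3 hours)"
}

"rule": {
  "type": "kinesis",
  "value": "arn:aws:kinesis:us-east-1:111122223333:stream/my-cnm-stream"
}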

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times by default, or the number of times configured via meta.retries (see the example rule after this list).
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
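Putting the sqs-specific pieces together, a hedged sketch of a complete sqs rule (the queue URL, workflow, and collection values are placeholders) might look like:

{
  "name": "MOD09GQ_006_sqsRule",
  "workflow": "IngestGranule",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "sqs",
    "value": "https://sqs.us-east-1.amazonaws.com/111122223333/my-ingest-queue"
  },
  "meta": {
    "retries": 2,
    "visibilityTimeout": 1800
  },
  "state": "ENABLED"
}

Remember that the queue must have a dead-letter queue configured, as noted in the rule-value section above.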

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    1. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

For more details regarding the field definitions and required information, go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v15.0.2/configuration/lifecycle-policies/index.html b/docs/v15.0.2/configuration/lifecycle-policies/index.html index 573e621c748..316f6fc138c 100644 --- a/docs/v15.0.2/configuration/lifecycle-policies/index.html +++ b/docs/v15.0.2/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v15.0.2

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
    • You will need a S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walk-through on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition, select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

The examples shown in this document are fairly basic cases. By using object tags, filters, and other configuration options, you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page, see the AWS documentation on S3 object lifecycle management.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/configuration/monitoring-readme/index.html b/docs/v15.0.2/configuration/monitoring-readme/index.html index 5c0948affea..917bbda8595 100644 --- a/docs/v15.0.2/configuration/monitoring-readme/index.html +++ b/docs/v15.0.2/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v15.0.2

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.
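If you have the AWS CLI (v2) installed, a quick way to inspect these logs is to tail the relevant log group from the command line; the function name below is a placeholder:

aws logs tail /aws/lambda/<prefix>-DiscoverGranules --follow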

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

    exports.handler = async (event) => {
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const decompressedData = await gunzip(payload);
    const logData = JSON.parse(decompressedData.toString('ascii'));
    return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
    return sns.publish({
    TopicArn: process.env.EmailReportingTopicArn,
    Message: logEvent.message
    }).promise();
    }
    return Promise.resolve();
    }));
    };

After creating the SNS topic, we can deploy this code as a lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the --environment option on aws lambda create-function.
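A hedged sketch of that create-function call (the function name, role ARN, topic ARN, and zip file are placeholders) might look like:

aws lambda create-function \
  --function-name log-email-forwarder \
  --runtime nodejs18.x \
  --handler index.handler \
  --zip-file fileb://function.zip \
  --role arn:aws:iam::111122223333:role/log-forwarder-lambda-role \
  --environment "Variables={EmailReportingTopicArn=arn:aws:sns:us-east-1:111122223333:email-reporting}"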

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
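As one possible sketch of the Terraform approach (the resource names, log group name, and Lambda reference are assumptions, not part of a Cumulus-provided module), a subscription filter pointing an existing log group at such a Lambda could look like:

resource "aws_lambda_permission" "allow_cloudwatch_logs" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.log_email_forwarder.function_name
  principal     = "logs.amazonaws.com"
}

resource "aws_cloudwatch_log_subscription_filter" "error_notifications" {
  name            = "error-email-notifications"
  log_group_name  = "/aws/lambda/my-prefix-DiscoverGranules"
  # Forward everything; the Lambda above filters on the log level.
  filter_pattern  = ""
  destination_arn = aws_lambda_function.log_email_forwarder.arn
  depends_on      = [aws_lambda_permission.allow_cloudwatch_logs]
}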

    - + \ No newline at end of file diff --git a/docs/v15.0.2/configuration/server_access_logging/index.html b/docs/v15.0.2/configuration/server_access_logging/index.html index 1e1c90cebfd..06c5f67f4ee 100644 --- a/docs/v15.0.2/configuration/server_access_logging/index.html +++ b/docs/v15.0.2/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v15.0.2

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
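If the policy is in place, the call should echo back the configuration you set in logging.json, along the lines of:

{
  "LoggingEnabled": {
    "TargetBucket": "<stack-internal-bucket>",
    "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
  }
}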
    - + \ No newline at end of file diff --git a/docs/v15.0.2/configuration/task-configuration/index.html b/docs/v15.0.2/configuration/task-configuration/index.html index 4ffe541e050..6903540f46f 100644 --- a/docs/v15.0.2/configuration/task-configuration/index.html +++ b/docs/v15.0.2/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
    Version: v15.0.2

    Configuration of Tasks

The cumulus module exposes configuration values for some of the provided archive and ingest tasks. Currently, the following are available as configurable variables:

    cmr_search_client_config

    Configuration parameters for CMR search client for cumulus archive module tasks in the form:

<lambda_identifier>_report_cmr_limit = <maximum number of records that can be returned from a cmr-client search; this should be greater than cmr_page_size>
    <lambda_identifier>_report_cmr_page_size = <number of records for each page returned from CMR>
    type = map(string)

More information about the CMR limit and page_size parameters can be found in @cumulus/cmr-client and the CMR Search API documentation.

    Currently the following values are supported:

    • create_reconciliation_report_cmr_limit
    • create_reconciliation_report_cmr_page_size

    Example

    cmr_search_client_config = {
    create_reconciliation_report_cmr_limit = 2500
    create_reconciliation_report_cmr_page_size = 250
    }

    elasticsearch_client_config

    Configuration parameters for Elasticsearch client for cumulus archive module tasks in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_timeout
    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • fake_processing_task_timeout
    • files_to_granules_task_timeout
    • hello_world_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sf_sqs_report_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }

    lambda_memory_sizes

    A configurable map of memory sizes (in MBs) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_memory_size: <memory_size>
    type = map(string)

    Currently the following values are supported:

    • add_missing_file_checksums_task_memory_size
    • discover_granules_task_memory_size
    • discover_pdrs_task_memory_size
    • fake_processing_task_memory_size
    • hyrax_metadata_updates_task_memory_size
    • lzards_backup_task_memory_size
    • move_granules_task_memory_size
    • parse_pdr_task_memory_size
    • pdr_status_check_task_memory_size
    • post_to_cmr_task_memory_size
    • queue_granules_task_memory_size
    • queue_pdrs_task_memory_size
    • queue_workflow_task_memory_size
    • sf_sqs_report_task_memory_size
    • sync_granule_task_memory_size
    • update_cmr_acess_constraints_task_memory_size
    • update_granules_cmr_metadata_file_links_task_memory_size

    Example

    lambda_memory_sizes = {
    queue_granules_task_memory_size = 1036
    }
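Both of these maps (along with cmr_search_client_config and elasticsearch_client_config above) are passed to the cumulus module in your deployment. A hedged sketch of how they might appear together, with the module's other required arguments omitted, is:

module "cumulus" {
  # source and other required variables omitted for brevity
  lambda_timeouts = {
    discover_granules_task_timeout = 300
  }
  lambda_memory_sizes = {
    queue_granules_task_memory_size = 1036
  }
}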
    - + \ No newline at end of file diff --git a/docs/v15.0.2/data-cookbooks/about-cookbooks/index.html b/docs/v15.0.2/data-cookbooks/about-cookbooks/index.html index a2794815dbf..0e0c36cc6ce 100644 --- a/docs/v15.0.2/data-cookbooks/about-cookbooks/index.html +++ b/docs/v15.0.2/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: v15.0.2

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. They should also help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/v15.0.2/data-cookbooks/browse-generation/index.html b/docs/v15.0.2/data-cookbooks/browse-generation/index.html index d9e205d916b..6f712927aad 100644 --- a/docs/v15.0.2/data-cookbooks/browse-generation/index.html +++ b/docs/v15.0.2/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.fileName, 0, 3)}",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "key": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.fileName, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify that the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate the CMR entry shown above.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

    You will then need to add the lambda to your Cumulus deployment as a aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The provided example script goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following:

• Create a browse image file based on the input data, and stage it to a location in an S3 bucket accessible to both this task and the FilesToGranules and MoveGranules tasks.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, these outputs are mapped to the payload and to meta.input_granules, respectively:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

The payload includes a full list of files to be 'moved' into the Cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, and then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file, if it exists, with the updated granule locations and will update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the input_granules output in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 1908635
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "size": 21708
    },
    {
    "fileName": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "bucket": "cumulus-test-sandbox-internal",
    "key": "file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg"
    }
    ]
    }
    ],
    - + \ No newline at end of file diff --git a/docs/v15.0.2/data-cookbooks/choice-states/index.html b/docs/v15.0.2/data-cookbooks/choice-states/index.html index 5c3eb9f4680..94365219997 100644 --- a/docs/v15.0.2/data-cookbooks/choice-states/index.html +++ b/docs/v15.0.2/data-cookbooks/choice-states/index.html @@ -5,13 +5,13 @@ Choice States | Cumulus Documentation - +
    Version: v15.0.2

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

    In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: Given CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice is followed by PdrStatusReport is followed by WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/data-cookbooks/cnm-workflow/index.html b/docs/v15.0.2/data-cookbooks/cnm-workflow/index.html index 112ee694db4..f265ddd40c6 100644 --- a/docs/v15.0.2/data-cookbooks/cnm-workflow/index.html +++ b/docs/v15.0.2/data-cookbooks/cnm-workflow/index.html @@ -5,7 +5,7 @@ CNM Workflow | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v15.0.2

    CNM Workflow

    This entry documents how to setup a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

    In operational environments it's likely science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/cnm_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
    "CNMExampleWorkflow": {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    }
JSON
}

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

    • The rule's value attribute must match the Amazon Resource Name (ARN) of the Kinesis data stream you preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you set up in the Prerequisites section.

    Once you've clicked 'Submit', a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously set up. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.
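    If you prefer to use an AWS SDK rather than the AWS CLI, a minimal Python (boto3) sketch of the same operation might look like the following. The stream name and file path are placeholders you would substitute for your deployment:

    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # Load the CNM record constructed above (the path is a placeholder).
    with open("/path/to/file.json") as f:
        cnm_record = json.load(f)

    # Equivalent to `aws kinesis put-record`; note the stream name (not ARN) is used.
    response = kinesis.put_record(
        StreamName="YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE",
        PartitionKey="1",
        Data=json.dumps(cnm_record).encode("utf-8"),
    )

    print(response["ShardId"], response["SequenceNumber"])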

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

    As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranule

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the notification Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.
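    As a sketch of that failure test (assuming the same notification stream and record file used above), you could strip the product.name key from the test record and re-send it with boto3:

    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    with open("/path/to/file.json") as f:
        record = json.load(f)

    # Remove product.name so the workflow fails and CnmResponse writes a
    # FAILURE record to the response stream.
    record["product"].pop("name", None)

    kinesis.put_record(
        StreamName="YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE",
        PartitionKey="1",
        Data=json.dumps(record).encode("utf-8"),
    )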


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate that the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations documentation is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

    which should result in output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
    • Re-trigger the workflow by using the put-record command from the Add Record to Kinesis Data Stream step above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

    Note the data encoding is not human readable and would need to be parsed/converted to be interpretable. There are many options for building a Kinesis consumer, such as the KCL.
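    As an alternative to the CLI commands above, a small Python (boto3) sketch can pull and parse the records directly. Note that, unlike the CLI output shown above, boto3 returns the Data field as raw bytes rather than base64 text; the stream name below is a placeholder:

    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # TRIM_HORIZON reads from the start of the shard's retention window,
    # which avoids having to re-trigger the workflow just to see records.
    iterator = kinesis.get_shard_iterator(
        StreamName="YOUR_RESPONSE_STREAM_NAME_HERE",
        ShardId="shardId-000000000000",
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    for record in kinesis.get_records(ShardIterator=iterator)["Records"]:
        # boto3 has already base64-decoded the Data blob.
        message = json.loads(record["Data"].decode("utf-8"))
        print(message.get("response", {}).get("status"))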

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

    At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS Queue named kinesisFailure. Operators can use this queue to inspect failed records.

    This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

    The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - come with the API package and do not need to be configured by the operator.

    To examine records that could not be processed at any step, inspect the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.
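    You can also poll the dead letter queue with an AWS SDK; a minimal Python (boto3) sketch, assuming your deployment prefix is substituted for the placeholder below, might look like:

    import boto3

    PREFIX = "your-deployment-prefix"  # placeholder

    sqs = boto3.client("sqs")

    queue_url = sqs.get_queue_url(QueueName=f"{PREFIX}-kinesisFailure")["QueueUrl"]

    # Long-poll for failed Kinesis records without deleting them.
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=10,
    )

    for message in response.get("Messages", []):
        print(message["Body"])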

    Note: these are only failures that occurred while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis-type rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, allowing for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
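    If you prefer to configure the trigger outside of the console, a hedged Python (boto3) sketch of creating the same event source mapping is shown below. The function name and stream ARN are placeholders for your deployment:

    import boto3

    lambda_client = boto3.client("lambda")

    # Substitute your deployed Lambda's name and the ARN of your
    # response-endpoint (cnmResponseStream) Kinesis stream.
    lambda_client.create_event_source_mapping(
        FunctionName="<prefix>-KinesisOutboundEventLogger",
        EventSourceArn="arn:aws:kinesis:us-east-1:<account-id>:stream/<cnmResponseStreamName>",
        StartingPosition="LATEST",
        BatchSize=10,
    )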

    Error Handling in Workflows

    Workflows can be configured to retry on transient Lambda errors such as Lambda Service Exception; see this documentation on configuring your workflow to handle transient lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    Version: v15.0.2

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

    The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow (executed workflow as seen in the AWS Console)

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    Version: v15.0.2

    Ingest Notification in Workflows

    On deployment, an SQS queue and three SNS topics, one for executions, granules, and PDRs, are created and used for handling notification messages related to the workflow.

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates the RDS database records for granules, executions, and PDRs. When the records are updated, messages are posted to the three SNS topics. This Lambda is invoked both when the workflow starts and when it reaches a terminal state (completion or failure).

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

    If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database, and the status of those granules/PDRs will be visible on the Cumulus dashboard. Note that the queue expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

    Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL (and the SNS topic ARNs) by adding the following outputs to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
    value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

    Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
    report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

    Once you have the queue URL, you can use the AWS SDK for your language of choice to publish messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
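    As an illustration only (the exact Cumulus message contents depend on your workflow and must conform to the record schemas), a Python (boto3) sketch of posting a SUCCEEDED event to the reporting queue could look like the following. The queue URL and Cumulus message below are placeholders:

    import json

    import boto3

    sqs = boto3.client("sqs")

    # Value of the stepfunction_event_reporter_queue_url Terraform output.
    queue_url = "https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue"

    # Placeholder Cumulus message; a real message must conform to the Cumulus
    # message schema for a record to be created.
    cumulus_message = {"cumulus_meta": {"execution_name": "example-execution"}}

    # For SUCCEEDED events the Cumulus message goes in detail.output, and it
    # MUST be a JSON string, not an object.
    event = {
        "detail": {
            "status": "SUCCEEDED",
            "output": json.dumps(cumulus_message),
        }
    }

    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(event))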

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
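    For example, a subscribed Lambda handler (sketched here in Python for illustration; the Terraform example above uses a Node.js runtime) could read the record like this:

    import json

    def handler(event, context):
        # The published message is a JSON string at Records[0].Sns.Message.
        message = json.loads(event["Records"][0]["Sns"]["Message"])

        # Granule notifications wrap the record and include an event type;
        # execution and PDR notifications are the record itself.
        record = message.get("record", message)
        event_type = message.get("event")

        print(f"event type: {event_type}, record: {json.dumps(record)[:200]}")
        return record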

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v15.0.2

    Queue PostToCmr

    In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow setup.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v15.0.2

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

    Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources, with the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

    • The task runs on one of the supported Lambda runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    If it has been discovered that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (by either adding a new .tf file or updating an existing one):

    • An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
    • An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.9.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn,
    "--lastModified",
    module.cumulus.queue_granules_task.last_modified_date
    ]
    alarms = {
    MemoryUtilizationHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 75
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.

    • An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

    "Resource": "${aws_sfn_activity.queue_granules.id}")`

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

    Science Investigator-led Processing Systems (SIPS)

    We're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

    Note: A list of configured workflows exists under the "Workflows" tab in the navigation bar on the Cumulus dashboard. Additionally, one can find a list of executions and their respective status in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v15.0.2

    Throttling queued executions

    In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

    Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started
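    A minimal sketch of that conditional-increment pattern, assuming a simple DynamoDB counter table (illustrative only; the actual Cumulus semaphore implementation may differ), is shown below:

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")

    SEMAPHORE_TABLE = "semaphores"  # placeholder table name


    def try_increment(queue_url: str, maximum: int) -> bool:
        """Return True only if the running count for queue_url is below maximum."""
        try:
            dynamodb.update_item(
                TableName=SEMAPHORE_TABLE,
                Key={"key": {"S": queue_url}},
                UpdateExpression="ADD semvalue :one",
                ConditionExpression="attribute_not_exists(semvalue) OR semvalue < :max",
                ExpressionAttributeValues={
                    ":one": {"N": "1"},
                    ":max": {"N": str(maximum)},
                },
            )
            return True  # increment succeeded; safe to start an execution
        except ClientError as error:
            if error.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # already at the maximum; do not start an execution
            raise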

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
    Tracking Ancillary Files

    The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

    CNM Type  | UMM-G RelatedUrl.Type                                            | ECHO10 Location
    ancillary | 'VIEW RELATED INFORMATION'                                       | OnlineResource
    data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI)  | OnlineAccessURL
    browse    | 'GET RELATED VISUALIZATION'                                      | AssociatedBrowseImage
    linkage   | 'EXTENDED METADATA'                                              | OnlineResource
    metadata  | 'EXTENDED METADATA'                                              | OnlineResource
    qa        | 'EXTENDED METADATA'                                              | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v15.0.2

    API Gateway Logging

    Enabling API Gateway Logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
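
    For context, a minimal sketch of where this setting might live in your cumulus-tf deployment's thin_egress_app module block (the source value is a placeholder and all other required TEA arguments are omitted):

    module "thin_egress_app" {
      source = "<thin egress app module source>"
      # ... other required TEA arguments omitted ...

      # Send distribution API access and execution logs to CloudWatch
      log_api_gateway_to_cloudwatch = true
    }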

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions: Enabling Account Level Logging from API Gateway to CloudWatch

    This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    1. Create a policy document

      The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

      Save this snippet as apigateway-policy.json.

      {
      "Version": "2012-10-17",
      "Statement": [
      {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
      "Service": "apigateway.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
      }
      ]
      }
    2. Create an account role to act as ApiGateway and write to CloudWatchLogs

      NASA users in NGAP: be sure to use your account's permission boundary.

      aws iam create-role \
      --role-name ApiGatewayToCloudWatchLogs \
      [--permissions-boundary <permissionBoundaryArn>] \
      --assume-role-policy-document file://apigateway-policy.json

      Note the ARN of the returned role for the last step.

    3. Attach correct permissions to role

      Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

      aws iam attach-role-policy \
      --role-name ApiGatewayToCloudWatchLogs \
      --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
    4. Update Account API Gateway settings with correct permissions

      Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

      aws apigateway update-account \
      --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'

    Configure API Gateway CloudWatch Logs Delivery

    For details about configuring the API Gateway CloudWatch Logs delivery, see Configure Cloudwatch Logs Delivery.

    Choosing and Configuring Your RDS Database

    When using this module to create your RDS cluster, you can configure the autoscaling timeout action, the cluster minimum and maximum capacity, and more, as seen in the supported variables for the module.

    Unfortunately, Terraform currently doesn't allow specifying the autoscaling timeout itself, so that value will have to be manually configured in the AWS console or CLI.

    Version: v15.0.2

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

    By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the Cloudwatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value with the Cloudwatch log group name.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }

    Component-based Cumulus Deployment

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walk-through of creating resources for your remote state using an S3 backend.

    Version: v15.0.2

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command Line

    Using the AWS Command Line Tool's s3api create-bucket subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    ⚠️ Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.
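
    For example, one commonly applied hardening option is blocking public access on internal buckets. A hedged sketch using the s3api CLI (the bucket name is reused from the example above):

    aws s3api put-public-access-block \
    --bucket foobar-internal \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true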

    Repeat the above step for each bucket to be created.

    Web Interface

    If you prefer to use the AWS web interface instead of the command line, see AWS "Creating a Bucket" documentation.

    Version: v15.0.2

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution Deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but they can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the "Cumulus Distribution Settings".
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.
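
    For step 4, the rename can be done from the directory containing your cloned cumulus-template-deploy repository, for example:

    mv cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example \
       cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf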

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service
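
    For example, these variables might be added to cumulus-tf/terraform.tfvars as follows (the values shown are placeholders, not real endpoints or credentials):

    csdap_host_url        = "https://auth.csdap.example.gov"
    csdap_client_id       = "<client id supplied by the Cognito team>"
    csdap_client_password = "<client password supplied by the Cognito team>"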

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file and setting it to one of the following values (both are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches in turn.

    Using Your Cumulus Distribution API Gateway URL as Your Distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development). Here is an outline of the required steps with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following example)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see AWS ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    localhost <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.
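
    For example, using curl, the request might look like the following (angle-bracketed values are placeholders for your Terraform variable values):

    curl -X POST "<csdap_host_url>/authclient/updateRedirectUri" \
      --user "<csdap_client_id>:<csdap_client_password>" \
      --header "Content-Type: application/x-www-form-urlencoded" \
      --data-urlencode "redirect_uri=<cumulus_distribution_url>/login"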

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your EC2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an SSH tunnel
    5. Use a browser to navigate to your file

    To determine the EC2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    ⚠️ IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    In another terminal window, open a tunnel with port forwarding using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, and then next enter a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Stop your SSH tunnel (enter Ctrl-C)
    2. Stop your AWS SSM session (enter Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as Your Distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    ⚠️ Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.


    How to Deploy Cumulus

    … for the deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a Distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    ⚠️ IMPORTANT: If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    TEA can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the TEA documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (Optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

    Follow the above instructions to initialize Terraform using terraform init [3].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    ⚠️ Note: Be sure to copy the redirect URLs because you will need them to update your Earthdata application.

    Update Earthdata Application

    Add the two redirect URLs to your EarthData login application by doing the following:

    1. Login to URS
    2. Under My Applications -> Application Administration -> use the edit icon of your application
    3. Under Manage -> Redirect URIs, add the Archive API URL returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token
    4. Also add the Distribution URL
      • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [1]
    5. You may delete the placeholder URL you used to create the application

    If you've lost track of the needed redirect URIs, they can be located in API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page that is accompanied by the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the Thin Egress App URL.


    Deploy Cumulus Dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the Dashboard repository.

    Prepare AWS

    Create S3 Bucket for Dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket
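
    A hedged sketch of the equivalent CLI steps (the bucket name and region are placeholders; adjust to your deployment):

    aws s3api create-bucket \
    --bucket <prefix>-dashboard \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    aws s3 website s3://<prefix>-dashboard --index-document index.html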

    Install Dashboard

    To install the Cumulus Dashboard, clone the repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard Versioning

    By default, the master branch will be used for Dashboard deployments. The master branch of the repository contains the most recent stable release of the Cumulus Dashboard.

    If you want to test unreleased changes to the Dashboard, use the develop branch.

    Each release/version of the Dashboard will have a tag in the Dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the Dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the Dashboard

    ⚠️ Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

    To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root. [2]

    Build your dashboard from the Cumulus Dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard Deployment

    Deploy your dashboard to S3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the Dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you had previously configured for access.


    Cumulus Instance Sizing

    The Cumulus deployment default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups are small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size you'll want to consider the number of nodes which determine how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.
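
    A rough sketch of where these values live, assuming the variable shapes used by the data persistence and cumulus modules (field names and values are illustrative; check them against your module version):

    # data-persistence-tf/terraform.tfvars
    elasticsearch_config = {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    # cumulus-tf/terraform.tfvars
    es_index_shards = 2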

    EC2 Instances and Autoscaling Groups

    EC2 instances are used for long-running operations (e.g., generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster consider:

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
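
    For example, these might be set in cumulus-tf/terraform.tfvars as follows (all values are placeholders to be tuned for your own workloads):

    ecs_cluster_instance_type                = "t3.medium"
    ecs_cluster_instance_docker_volume_size  = 100
    ecs_cluster_min_size                     = 1
    ecs_cluster_desired_size                 = 2
    ecs_cluster_max_size                     = 4
    ecs_cluster_scale_in_adjustment_percent  = -5
    ecs_cluster_scale_in_threshold_percent   = 25
    ecs_cluster_scale_out_adjustment_percent = 10
    ecs_cluster_scale_out_threshold_percent  = 75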

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", and use the edit icon for your application. Then go to Manage -> Redirect URIs.

    3. The API root can be found in a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at"


    PostgreSQL Database Deployment

    Cumulus provides a Terraform module, cumulus-rds-tf, that will deploy an AWS RDS Aurora Serverless PostgreSQL 11 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs and the Aurora Serverless V1 docs.

    Prepare Deployment Repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS Configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a New Repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    ⚠️ Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS Configuration

    To deploy this module, make sure you have completed the following steps from the Cumulus deployment instructions, in similar fashion for this module:

    --

    Configure and Deploy the Module

    When configuring this module, please keep in mind that unlike the Cumulus deployment, this module should be deployed once to create the database cluster, and re-applied only thereafter to make configuration changes, perform upgrades, etc.

    Tip: This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)
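
    A minimal sketch of the resulting terraform.tf, assuming the S3 backend (bucket, key, and table names here are placeholders):

    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "PREFIX-state"
        key            = "PREFIX/rds-cluster/terraform.tfstate"
        dynamodb_table = "PREFIX-tf-locks"
      }
    }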

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.
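
    A partial terraform.tfvars sketch using the options above (all values are placeholders; see the module variable definitions for the full set of required variables):

    region              = "us-east-1"
    subnets             = ["subnet-aaaaaaaa", "subnet-bbbbbbbb"]
    db_admin_username   = "postgres"
    db_admin_password   = "<secure admin password>"
    deletion_protection = true
    min_capacity        = 2
    max_capacity        = 8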

    Provision User and User Database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true. This configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the roles the provisioning lambda will need for access. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to.
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as name the provisioned user/database.
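
    If you enable user database provisioning, the related keys might look like this in terraform.tfvars (values are placeholders):

    provision_user_database  = true
    permissions_boundary_arn = "arn:aws:iam::123456789012:policy/<your permissions boundary>"
    rds_user_password        = "<secure user password>"
    prefix                   = "<your deployment prefix>"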

    Once configured, the module will deploy the lambda and run it on each provision thus creating the configured database (if it does not exist), updating the user password (if that value has been changed), and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    ⚠️ Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see a similar output:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    ⚠️ Caution: If re-applying this module, variables (e.g. engine_version, snapshot_identifier ) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secrets required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets are in the form:

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases, and other database configuration.

    Version: v15.0.2

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a Node.js package that contains a simple Lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition as shown in the example below:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The Terraform source package can be found on the Cumulus GitHub Release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 Replicator as described above using the target_bucket and target_prefix provided by the Metrics team.

    The Metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.


    Terraform Best Practices

    To verify that no resources tagged with your deployment remain, run the following AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link Restoring a previous version: link

    Version: v15.0.2

    Using the Thin Egress App for Cumulus Distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA Deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app for distribution.

    The TEA module provides these instructions for adding it to your deployment; the following instructions cover configuring the thin_egress_app module in your Cumulus deployment.

    Create a Secret for Signing Thin Egress App JWTs

    The Thin Egress App uses JSON Web Tokens (JWTs) internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    Bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple JSON mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    ⚠️ Note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally Configure a Custom Bucket Map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    ⚠️ Note: Your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally Configure Shared Variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus Core example deployment code.
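
    For example, a sketch using a Terraform local value shared by both module blocks (source and other required arguments are omitted):

    locals {
      tea_stage_name = "DEV"
    }

    module "thin_egress_app" {
      # source and other arguments omitted
      stage_name = local.tea_stage_name
    }

    module "cumulus" {
      # source and other arguments omitted
      tea_api_gateway_stage = local.tea_stage_name
    }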


    Upgrading Cumulus

    … verify that your deployment functions correctly. Please refer to the recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    Version: v15.0.2

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be merged directly to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve failures as appropriate.


    Integration Tests

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.


    Code Coverage and Quality

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.


    Versioning and Releases

    … this is a backport and patch release on the 13.3.x series of releases. Updates that are included in the future will have a corresponding CHANGELOG entry in future releases.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

      git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
    - + \ No newline at end of file diff --git a/docs/v15.0.2/docs-how-to/index.html b/docs/v15.0.2/docs-how-to/index.html index 7295e67d96d..9454951b0d0 100644 --- a/docs/v15.0.2/docs-how-to/index.html +++ b/docs/v15.0.2/docs-how-to/index.html @@ -5,7 +5,7 @@ Cumulus Documentation: How To's | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v15.0.2

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_APP_ID, DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve
    note

    docs-build will build the documents into website/build. docs-clear will clear the documents.

    caution

    Fix any broken links reported by Docusaurus if you see the following messages during build.

    [INFO] Docusaurus found broken links!

    Exhaustive list of all broken links found:

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in-depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walk-through can be found here. Docusaurus v2 uses a snapshot approach for documentation versioning: each versioned set of docs is independent of the others. We would like the documentation versions to match up directly with release versions. However, since a new set of versioned docs takes up a lot of repo space and requires maintenance, we suggest updating the existing versioned docs for minor releases when there are no significant functionality changes. Cumulus versioning is explained in the Versioning Docs.

    Search on our documentation site is taken care of by DocSearch. We have been provided with an appId, an apiKey and an indexName by DocSearch that we include in our website/docusaurus.config.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for these values to exist - DOCSEARCH_APP_ID, DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.
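
    For example, a local build with search enabled might export these variables before serving the site (the values below are placeholders, not real credentials):

      export DOCSEARCH_APP_ID=YOUR_APP_ID
      export DOCSEARCH_API_KEY=YOUR_API_KEY
      export DOCSEARCH_INDEX_NAME=YOUR_INDEX_NAME
      npm run docs-serve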

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/external-contributions/index.html b/docs/v15.0.2/external-contributions/index.html index 336f8190467..fed23ce44be 100644 --- a/docs/v15.0.2/external-contributions/index.html +++ b/docs/v15.0.2/external-contributions/index.html @@ -5,13 +5,13 @@ External Contributions | Cumulus Documentation - +
    Version: v15.0.2

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/faqs/index.html b/docs/v15.0.2/faqs/index.html index 5e9f0a4fdb5..4962cf4b7e1 100644 --- a/docs/v15.0.2/faqs/index.html +++ b/docs/v15.0.2/faqs/index.html @@ -5,13 +5,13 @@ Frequently Asked Questions | Cumulus Documentation - +
    Version: v15.0.2

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General | Workflows | Integrators & Developers | Operators


    General

    What prerequisites are needed to setup Cumulus?
    Answer: Here is a list of the tools and access that you will need in order to get started. To maintain the up-to-date versions that we are using, please visit our Cumulus main README (https://github.com/nasa/cumulus) for details.
    • NVM for node versioning
    • AWS CLI
    • Bash
    • Docker (only required for testing)
    • docker-compose (only required for testing; install with pip install docker-compose)
    • Python
    • pipenv

    Keep in mind you will need access to the AWS console and an Earthdata account before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    Where can I find Cumulus release notes?

    Answer: To get the latest information about updates to Cumulus go to Cumulus Versions.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a Cumulus JIRA ticket.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    For more information on how to submit an issue or contribute to Cumulus follow our guidelines at Contributing


    Workflows

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting the Workflows section.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    Where can I find a list of workflow tasks?

    Answer: You can access a list of reusable tasks for Cumulus development at Cumulus Tasks.

    Are there any third-party workflows or applications that I can use with Cumulus?

    Answer: The Cumulus team works with various partners to help build a robust framework. You can visit our External Contributions section to see what other options are available to help you customize Cumulus for your needs.


    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

    Answer: Yes. Cumulus is a modular architecture that allows you to decide which components you want or need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

    Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus they want or need. To view Cumulus maintained modules or steps on how to create a module go to Terraform modules.

    Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Explore more Cumulus operator best practices and how-tos in the dedicated Operator Docs.

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/ancillary_metadata/index.html b/docs/v15.0.2/features/ancillary_metadata/index.html index 4dccabb0ab7..7ec4b29839a 100644 --- a/docs/v15.0.2/features/ancillary_metadata/index.html +++ b/docs/v15.0.2/features/ancillary_metadata/index.html @@ -5,7 +5,7 @@ Ancillary Metadata Export | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v15.0.2

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/backup_and_restore/index.html b/docs/v15.0.2/features/backup_and_restore/index.html index c01bb3792bf..3173e4a5afa 100644 --- a/docs/v15.0.2/features/backup_and_restore/index.html +++ b/docs/v15.0.2/features/backup_and_restore/index.html @@ -5,7 +5,7 @@ Cumulus Backup and Restore | Cumulus Documentation - + @@ -52,7 +52,7 @@ writing to the old cluster.

  • Set the snapshot_identifier variable to the snapshot you wish to create, and configure the module like a new deployment, with a unique cluster_identifier

  • Deploy the module using terraform apply

  • Once deployed, verify the cluster has the expected data

  • Redeploy the data persistence and Cumulus deployments - You should not need to reconfigure either, as the secret ARN and the security group should not change, however double-check the configured values are as expected

  • - + \ No newline at end of file diff --git a/docs/v15.0.2/features/dead_letter_archive/index.html b/docs/v15.0.2/features/dead_letter_archive/index.html index f1f55691f26..95521ef560b 100644 --- a/docs/v15.0.2/features/dead_letter_archive/index.html +++ b/docs/v15.0.2/features/dead_letter_archive/index.html @@ -5,13 +5,13 @@ Cumulus Dead Letter Archive | Cumulus Documentation - +
    Version: v15.0.2

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords. Otherwise, all Cumulus messages that fail to be reprocessed will be moved to a new archive location under the path <stackName>/dead-letter-archive/failed-sqs/<YYYY-MM-DD>.
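
    A minimal sketch of triggering the recovery, assuming the Cumulus API is served at example.com and a valid token is in hand (as in the other API examples in these docs):

      curl --request POST https://example.com/deadLetterArchive/recoverCumulusMessages \
      --header 'Authorization: Bearer ReplaceWithTheToken'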

    This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/dead_letter_queues/index.html b/docs/v15.0.2/features/dead_letter_queues/index.html index b88b985bf6e..39ba7b45f12 100644 --- a/docs/v15.0.2/features/dead_letter_queues/index.html +++ b/docs/v15.0.2/features/dead_letter_queues/index.html @@ -5,13 +5,13 @@ Dead Letter Queues | Cumulus Documentation - +
    Version: v15.0.2

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

      receive_wait_time_seconds  = 20
    message_retention_seconds = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally, an automated process should be configured to poll and process messages off a dead letter queue.

    For aid in manually troubleshooting, you can utilize the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/distribution-metrics/index.html b/docs/v15.0.2/features/distribution-metrics/index.html index ab97014c039..d478483623c 100644 --- a/docs/v15.0.2/features/distribution-metrics/index.html +++ b/docs/v15.0.2/features/distribution-metrics/index.html @@ -5,13 +5,13 @@ Cumulus Distribution Metrics | Cumulus Documentation - +
    Version: v15.0.2

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/execution_payload_retention/index.html b/docs/v15.0.2/features/execution_payload_retention/index.html index a02d6a4a047..0912c1c376a 100644 --- a/docs/v15.0.2/features/execution_payload_retention/index.html +++ b/docs/v15.0.2/features/execution_payload_retention/index.html @@ -5,13 +5,13 @@ Execution Payload Retention | Cumulus Documentation - +
    Version: v15.0.2

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in your RDS database and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/logging-esdis-metrics/index.html b/docs/v15.0.2/features/logging-esdis-metrics/index.html index a7cdbf5b820..6f6dd1c1b6b 100644 --- a/docs/v15.0.2/features/logging-esdis-metrics/index.html +++ b/docs/v15.0.2/features/logging-esdis-metrics/index.html @@ -5,13 +5,13 @@ Writing logs for ESDIS Metrics | Cumulus Documentation - +
    Version: v15.0.2

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/replay-archived-sqs-messages/index.html b/docs/v15.0.2/features/replay-archived-sqs-messages/index.html index eaa3317d515..9f74b1d9de0 100644 --- a/docs/v15.0.2/features/replay-archived-sqs-messages/index.html +++ b/docs/v15.0.2/features/replay-archived-sqs-messages/index.html @@ -5,14 +5,14 @@ How to replay SQS messages archived in S3 | Cumulus Documentation - +
    Version: v15.0.2

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}
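
    For example, the archive for a given queue could be inspected with the AWS CLI (the bucket, stack, and queue names below are hypothetical):

      aws s3 ls s3://my-internal-bucket/myCumulusStack/archived-incoming-messages/my-ingest-queue/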

    Replay SQS messages endpoint

    The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and returns an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    • queueName (string): Any valid SQS queue name (not ARN)
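
    A minimal sketch of such a request, assuming the API is served at example.com and the archived messages came from a queue named my-ingest-queue:

      curl --request POST https://example.com/replays/sqs \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{"queueName": "my-ingest-queue"}'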

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/replay-kinesis-messages/index.html b/docs/v15.0.2/features/replay-kinesis-messages/index.html index 0d11e51ffa9..b5840def84b 100644 --- a/docs/v15.0.2/features/replay-kinesis-messages/index.html +++ b/docs/v15.0.2/features/replay-kinesis-messages/index.html @@ -5,7 +5,7 @@ How to replay Kinesis messages after an outage | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v15.0.2

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    • type (string, required): Currently only accepts kinesis.
    • kinesisStream (string, required for type kinesis): Any valid Kinesis stream name (not ARN).
    • kinesisStreamCreationTimestamp (*, optional): Any input valid for a JS Date constructor. For reasons to use this field see AWS documentation on StreamCreationTimestamp.
    • endTimestamp (*, optional): Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    • startTimestamp (*, optional): Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
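
    A minimal sketch of such a request, assuming a stream named my-kinesis-stream and a time slice padded with some buffer around the outage window:

      curl --request POST https://example.com/replays \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "kinesis",
        "kinesisStream": "my-kinesis-stream",
        "startTimestamp": "2018-10-19T00:00:00.000Z",
        "endTimestamp": "2018-10-20T00:00:00.000Z"
      }'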

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/features/reports/index.html b/docs/v15.0.2/features/reports/index.html index aef534756b8..b34b516de32 100644 --- a/docs/v15.0.2/features/reports/index.html +++ b/docs/v15.0.2/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -19,7 +19,7 @@ report generation. The data buckets will include any buckets in your Cumulus buckets configuration that have type public, protected or private.
    - + \ No newline at end of file diff --git a/docs/v15.0.2/getting-started/index.html b/docs/v15.0.2/getting-started/index.html index d5c5efdaff9..0f04458961f 100644 --- a/docs/v15.0.2/getting-started/index.html +++ b/docs/v15.0.2/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: v15.0.2

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution and management designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects, not limited to NASA, use Cumulus in their daily work.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Set up Git Secrets

    To ensure your AWS access keys and passwords are protected as you submit commits we recommend setting up Git Secrets.

    2. Deploy Cumulus Core and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    3. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    4. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 3.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment.

    For an introduction about Terraform go here.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/v15.0.2/glossary/index.html b/docs/v15.0.2/glossary/index.html index 9d2524dbc08..b669b5e708e 100644 --- a/docs/v15.0.2/glossary/index.html +++ b/docs/v15.0.2/glossary/index.html @@ -5,13 +5,13 @@ Glossary | Cumulus Documentation - +
    Version: v15.0.2

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: Amazon Web Services documentation.

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collection on the various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see Collections - Data Management Types.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page.

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who setup deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Node Package Manager (npm)

    Node package manager. Often referred to as npm.

    For more information, see npm.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (npm)

    Npm hosted node.js packages. Cumulus packages can be found on npm's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data, and more.

    For more information, see AWS's S3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform.

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/index.html b/docs/v15.0.2/index.html index 95082748aab..736cc5a7a88 100644 --- a/docs/v15.0.2/index.html +++ b/docs/v15.0.2/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v15.0.2

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/integrator-guide/about-int-guide/index.html b/docs/v15.0.2/integrator-guide/about-int-guide/index.html index 73e4de9819e..c94131fe7e2 100644 --- a/docs/v15.0.2/integrator-guide/about-int-guide/index.html +++ b/docs/v15.0.2/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v15.0.2

    About Integrator Guide

    Purpose

    The Integrator Guide is intended to supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/v15.0.2/integrator-guide/int-common-use-cases/index.html b/docs/v15.0.2/integrator-guide/int-common-use-cases/index.html index e3ce78b9bef..fcfab4cdfb2 100644 --- a/docs/v15.0.2/integrator-guide/int-common-use-cases/index.html +++ b/docs/v15.0.2/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v15.0.2/integrator-guide/workflow-add-new-lambda/index.html b/docs/v15.0.2/integrator-guide/workflow-add-new-lambda/index.html index 91424a2831f..cee1f68b832 100644 --- a/docs/v15.0.2/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/v15.0.2/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v15.0.2

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps are to help you along as you write a new Lambda that integrates with a Cumulus workflow. This will aid your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/integrator-guide/workflow-ts-failed-step/index.html b/docs/v15.0.2/integrator-guide/workflow-ts-failed-step/index.html index 9bb7a614a33..0b83e5abe97 100644 --- a/docs/v15.0.2/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/v15.0.2/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: v15.0.2

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    - + \ No newline at end of file diff --git a/docs/v15.0.2/interfaces/index.html b/docs/v15.0.2/interfaces/index.html index 2be23bf4c9e..72ffbfa29a4 100644 --- a/docs/v15.0.2/interfaces/index.html +++ b/docs/v15.0.2/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: v15.0.2

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/operator-docs/about-operator-docs/index.html b/docs/v15.0.2/operator-docs/about-operator-docs/index.html index d3bf71c322f..15c260fb6e0 100644 --- a/docs/v15.0.2/operator-docs/about-operator-docs/index.html +++ b/docs/v15.0.2/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v15.0.2

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/v15.0.2/operator-docs/bulk-operations/index.html b/docs/v15.0.2/operator-docs/bulk-operations/index.html index 1d5cea10cdd..eea454707a0 100644 --- a/docs/v15.0.2/operator-docs/bulk-operations/index.html +++ b/docs/v15.0.2/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v15.0.2

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal.
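
    For example, the dashboard could be built with a (hypothetical) Kibana root URL exported ahead of time:

      export KIBANAROOT=https://kibana.example.com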

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the "Discover" page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload (see the sketch after this list). The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
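
    As a sketch of what the completed request might look like if submitted directly to the API rather than through the dashboard (the index, workflow name, and inner query below are hypothetical placeholders):

      curl --request POST https://example.com/granules/bulk \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "index": "my-granule-index",
        "workflowName": "MyBulkWorkflow",
        "query": { "query": { "match": { "collectionId": "MOD09GQ___006" } } }
      }'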

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
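
    For example, the status of a single operation could be retrieved with a request like the following (the ID shown is a made-up placeholder):

      curl --request GET https://example.com/asyncOperations/0eb8e809-7a28-4a66-bd4e-f4d2d7086a49 \
      --header 'Authorization: Bearer ReplaceWithTheToken'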

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    CMR Operations

    UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that this request requires additional configuration: supply an access constraint integer value and an optional description to the UpdateCmrAccessConstraints workflow by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with 'updateCmrAccessConstraints' selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "updateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is in the bulk operations operator doc.

    Version: v15.0.2

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP, Kinesis, etc.), the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    Discovery Filtering

    directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently, support for this feature is limited to the following protocols:

    • ftp
    • sftp
    Version: v15.0.2

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and re-ingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use the search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or scheduled rules, will re-discover and re-ingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    Version: v15.0.2

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value, not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
      • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

      • Send a message that conforms to your schema using Python or from your command line (see the sketch after these steps).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
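
    A minimal Python sketch of steps 1 and 3 using boto3; the stream name and the CNM-style message fields shown here are illustrative assumptions, so adjust them to match your provider, collection, and the CNM schema version you use:

    import json
    import boto3

    kinesis = boto3.client("kinesis")
    STREAM_NAME = "my-ingest-stream"  # illustrative stream name

    # Step 1 (alternative to the console): create the stream with a shard value of 1
    kinesis.create_stream(StreamName=STREAM_NAME, ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName=STREAM_NAME)
    stream_arn = kinesis.describe_stream(StreamName=STREAM_NAME)["StreamDescription"]["StreamARN"]
    print(f"Record these for later use: {STREAM_NAME}, {stream_arn}")

    # Step 3: send a message that conforms to your (CNM-style) schema -- field names are illustrative
    message = {
        "collection": "MOD09GQ",   # must match the collection configured in the rule from step 2
        "provider": "MODAPS",      # illustrative provider
        "product": {"files": []},  # fill in per your CNM schema
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,    # must correspond to the kinesisArn referenced by the rule from step 2
        Data=json.dumps(message),
        PartitionKey="1",
    )
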
    Version: v15.0.2

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting, you configured a TargetBucket and TargetPrefix. You will find the raw S3 access logs inside the TargetBucket at the TargetPrefix.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
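
    For example, a short sketch (with illustrative bucket and stack names, which you should replace with your own) that lists the raw access log objects with boto3:

    import boto3

    s3 = boto3.client("s3")
    internal_bucket = "my-internal-bucket"  # your stack's internal bucket (assumption)
    stack = "my-stack"                      # your stack name (assumption)
    prefix = f"{stack}/ems-distribution/s3-server-access-logs/"

    response = s3.list_objects_v2(Bucket=internal_bucket, Prefix=prefix)
    for obj in response.get("Contents", []):
        print(obj["Key"])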

    Naming Executions

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
      ...other rule keys here...
      "meta": {
        "executionNamePrefix": "my-prefix"
      }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
      "QueueGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "FullMessage": true
            },
            "task_config": {
              "queueUrl": "${start_sf_queue_url}",
              "provider": "{$.meta.provider}",
              "internalBucket": "{$.meta.buckets.internal.name}",
              "stackName": "{$.meta.stack}",
              "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
              "executionNamePrefix": "{$.meta.executionNamePrefix}"
            }
          }
        },
        "Type": "Task",
        "Resource": "${queue_granules_task_arn}",
        "Retry": [
          {
            "ErrorEquals": [
              "Lambda.ServiceException",
              "Lambda.AWSLambdaException",
              "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 6,
            "BackoffRate": 2
          }
        ],
        "Catch": [
          {
            "ErrorEquals": [
              "States.ALL"
            ],
            "ResultPath": "$.exception",
            "Next": "WorkflowFailed"
          }
        ],
        "End": true
      },
    }
    Version: v15.0.2

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
      private = {
        name = "discovery-bucket"
        type = "private"
      },
      protected = {
        name = "archive-protected"
        type = "protected"
      }
      public = {
        name = "archive-public"
        type = "public"
      }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider
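
    If you script this step rather than using the dashboard form, a rough sketch might look like the following; it assumes the Cumulus API exposes a /providers endpoint and an id field for the provider name, so verify both against the Cumulus API documentation for your version:

    import requests

    CUMULUS_API = "https://example.com"   # your Cumulus API base URL (assumption)
    TOKEN = "ReplaceWithTheToken"

    provider = {
        "id": "s3_provider",       # illustrative provider name
        "protocol": "s3",          # protocol set to S3, as described above (exact casing is an assumption)
        "host": "discovery-bucket",
    }

    response = requests.post(
        f"{CUMULUS_API}/providers",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
        json=provider,
    )
    print(response.status_code, response.json())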

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types (e.g. protected, public) correspond to the keys of the buckets configuration shown above.

    {
      "name": "test-collection",
      "version": "001",
      "granuleId": "^GRANULE\\.A[\\d]{7}$",
      "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
      "reportToEms": false,
      "sampleFileName": "GRANULE.A2017025.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
          "sampleFileName": "GRANULE.A2017025.hdf"
        },
        {
          "bucket": "public",
          "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
          "sampleFileName": "GRANULE.A2017025.jpg"
        }
      ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
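
    For illustration, a sketch of what the corresponding rule record might look like if created through the Cumulus API instead of the dashboard; the /rules endpoint and the exact field names are assumptions based on the rule snippets shown elsewhere in this document, so check them against your Cumulus API documentation:

    import requests

    CUMULUS_API = "https://example.com"   # your Cumulus API base URL (assumption)
    TOKEN = "ReplaceWithTheToken"

    rule = {
        "name": "test_collection_discover",  # illustrative rule name
        "workflow": "DiscoverGranules",      # illustrative workflow name
        "provider": "s3_provider",           # the provider created above
        "collection": {"name": "test-collection", "version": "001"},
        "rule": {"type": "onetime"},
        "meta": {
            "provider_path": "test-data"     # folder used to discover new granules
        },
    }

    response = requests.post(
        f"{CUMULUS_API}/rules",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
        json=rule,
    )
    print(response.status_code, response.json())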

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    Version: v15.0.2

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    Version: v15.0.2

    Cumulus Team

    Cumulus Core Team

    Cumulus Emeritus Team

    Version: v15.0.2

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the Lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other known resolution (which scales better but takes some time) is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend the non-scaling approach: the number of lambdas we distribute is in the low teens, and they are likely to be easier and faster to reconfigure one by one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    Reindexing Elasticsearch Guide

    current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: a reindex and a change-index operation to switch over to the new index. They can be done in either order, but each order has its trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then reindex the data into the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is that you do not have to turn off any ingest operations. This order is recommended.

    If you decide to reindex data into a new index first, and then point Cumulus to that new index, it is not guaranteed that data sent to the old index while reindexing will show up in the new index. If you prefer this order, it is recommended to turn off any ingest operations while reindexing. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.
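
    As a rough sketch of driving these operations from a script: the example below assumes the Cumulus API exposes elasticsearch/reindex, elasticsearch/reindex-status, and elasticsearch/change-index endpoints with the request bodies shown; the exact paths and field names are assumptions, so confirm them against the linked Cumulus API documentation for your version.

    import requests

    CUMULUS_API = "https://example.com"   # your Cumulus API base URL (assumption)
    HEADERS = {"Authorization": "Bearer ReplaceWithTheToken", "Content-Type": "application/json"}

    # Reindex from the current index into a new one (field names are assumptions)
    requests.post(
        f"{CUMULUS_API}/elasticsearch/reindex",
        headers=HEADERS,
        json={"sourceIndex": "cumulus-2020-11-3", "destIndex": "cumulus-2021-3-4"},
    )

    # Monitor progress of the long-running reindex
    status = requests.get(f"{CUMULUS_API}/elasticsearch/reindex-status", headers=HEADERS)
    print(status.json())

    # Point Cumulus at the new index once you are ready to switch over
    requests.post(
        f"{CUMULUS_API}/elasticsearch/change-index",
        headers=HEADERS,
        json={"currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4"},
    )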

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the example below, the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful, as the two indices show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
      "aggs": {
        "count_by_type": {
          "terms": {
            "field": "_type"
          }
        }
      },
      "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
      "conflicts": "proceed",
      "source": {
        "index": "cumulus-2020-11-3"
      },
      "dest": {
        "index": "cumulus-2021-3-4",
        "op_type": "create"
      }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    Version: v15.0.2

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the "New execution" button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

    Troubleshooting Deployment

    data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'. This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page that shows an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you have either forgotten to update one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, placed incorrect values in them, or forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    Version: v15.0.2

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Version: v15.0.2

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait on a new release of Cumulus that incorporated that release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any Cloudfront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
       --table-name DYNAMO_LOCK_TABLE \
       --item '{
         "LockID": {"S": "BUCKET/KEY-md5"},
         "Digest": {"S": "some-digest-value"}
       }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    Version: v15.0.2

    Upgrade to CMA 2.0.2

    Updating a Cumulus Deployment to CMA 2.0.2

    Background

    The Cumulus Message Adapter has been updated in release 2.0.2 to no longer utilize the AWS step function API to look up the defined name of a step function task for population in meta.workflow_tasks, but instead use an incrementing integer field.

    Additionally, a bugfix was released in the form of v2.0.1/v2.0.2 following the initial 2.0.0 release, so all users should update to release 2.0.2.

    The update is not tied to a particular version of Core; however, the update should be done across all task components in order to ensure consistent execution records.

    Changes

    Execution Record Update

    This update functionally means that Cumulus tasks/activities using the CMA will now record an entry that looks like the following in meta.workflow_tasks, and more importantly in the tasks column for an execution record:

    Original

          "DiscoverGranules": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "QueueGranules": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    New

          "0": {
    "name": "jk-tf-DiscoverGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxxx:function:jk-tf-DiscoverGranules"
    },
    "1": {
    "name": "jk-tf-QueueGranules",
    "version": "$LATEST",
    "arn": "arn:aws:lambda:us-east-1:xxxx:function:jk-tf-QueueGranules"
    }

    Actions Required

    The following should be done as part of a Cumulus stack update to utilize Cumulus Message Adapter 2.0.2 or later:

    • Python tasks that utilize cumulus-message-adapter-python should be updated to use > 2.0.0, their lambdas rebuilt and Cumulus workflows reconfigured to use the updated version.

    • Python activities that utilize cumulus-process-py should be rebuilt using > 1.0.0 with updated dependencies, and have their images deployed/Cumulus configured to use the new version.

    • The cumulus-message-adapter v2.0.2 lambda layer should be made available in the deployment account, and the Cumulus deployment should be reconfigured to use it (via the cumulus_message_adapter_lambda_layer_version_arn variable in the cumulus module). This should address all Core node.js tasks that utilize the CMA, and many contributed node.js/JAVA components.

    Once the above have been done, redeploy Cumulus to apply the configuration and the updates should be live.

    Version: v15.0.2

    Updates to task granule file schemas

    Background

    Most Cumulus workflow tasks expect as input a payload of granule(s) which contain the files for each granule. Most tasks also return this same granule structure as output.

    However, up to this point, there was inconsistency in the schemas for the granule files objects expected by each task. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.

    Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, it was possible for there to be schema validation failures depending on which task was used to start the workflow and its particular schema.

    In order to rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema between nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.

    Updated files schema

    The updated granule files schema can be found here.

    These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):

    • filename - concatenate the bucket and key values with a directory separator (/); see the sketch after this list
    • name - use fileName property
    • etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
      • update-granules-cmr-metadata-file-links
      • hyrax-metadata-updates
    • fileStagingDir - no longer supported
    • url_path - no longer supported
    • duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.
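
    For illustration, a small Python sketch (with a hypothetical file record) of how the deprecated values map onto the updated schema, following the notes in the list above:

    # Hypothetical file record using the updated schema
    updated_file = {
        "bucket": "my-protected-bucket",
        "key": "MOD09GQ/006/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
        "fileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    }

    # Hypothetical etags output from update-granules-cmr-metadata-file-links or hyrax-metadata-updates,
    # keyed by S3 URI as described above
    etags = {
        "s3://my-protected-bucket/MOD09GQ/006/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf": '"abc123"',
    }

    # Deprecated "filename": the bucket and key concatenated with a directory separator
    filename = f"{updated_file['bucket']}/{updated_file['key']}"

    # Deprecated "name": use the fileName property instead
    name = updated_file["fileName"]

    # Deprecated per-file "etag": look it up in the separate etags map instead
    etag = etags.get(f"s3://{updated_file['bucket']}/{updated_file['key']}")

    print(filename, name, etag)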

    Exceptions

    These workflow tasks did not have their schema for granule files updated:

    • discover-granules - no updates
    • queue-granules - no updates
    • parse-pdr - no updates
    • sync-granule - input schema not updated, output schema was updated

    The reason that these task schemas were not updated is that all of these tasks start before the files have been ingested to S3, thus much of the information that is required in the updated files schema like bucket, key, or checksum is not yet known.

    Bulk granule operations

    Since the input schema for the above tasks was not updated, that means you cannot run bulk granule operations against workflows if they start with any of those tasks. Bulk granule operations work by loading the specified granules from the database and sending them as input to a specified workflow, so if the specified workflow begins with a task whose input schema does not conform to what is coming out of the database, there will be schema errors.

    Upgrading your deployment

    Upgrading your workflows

    For any workflows using the update-granules-cmr-metadata-file-links task before the hyrax-metadata-updates and/or post-to-cmr tasks, update the step definition for update-granules-cmr-metadata-file-links as follows:

        "UpdateGranulesCmrMetadataFileLinksStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    hyrax-metadata-updates

    For any workflows using the hyrax-metadata-updates task before a post-to-cmr task, update the definition of the hyrax-metadata-updates step as follows:

        "HyraxMetadataUpdatesTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.etags}",
    "destination": "{$.meta.file_etags}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    ...more configuration...

    post-to-cmr

    For any workflows using the post-to-cmr task after the update-granules-cmr-metadata-file-links or hyrax-metadata-updates tasks, update the post-to-cmr step definition as follows:

        "CmrStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "bucket": "{$.meta.buckets.internal.name}",
    "stack": "{$.meta.stack}",
    "cmr": "{$.meta.cmr}",
    "launchpad": "{$.meta.launchpad}",
    "etags": "{$.meta.file_etags}"
    }
    }
    },
    ...more configuration...

    Example workflow

    For an example workflow integrating all of these changes, please see our example ingest and publish workflow.

    Optional - Integrate granuleDuplicates information

    Please note that the granuleDuplicates output is purely informational and does not have any bearing on the separate configuration for how duplicates should be handled.

    You can include granuleDuplicates output from the sync-granule or move-granules tasks in your workflow messages like so:

        "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    ...other config...
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granuleDuplicates}",
    "destination": "{$.meta.sync_granule.granule_duplicates}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    }
    ...more configuration...

    The result of this configuration is that the granuleDuplicates output from sync-granule would be placed in meta.sync_granule.granule_duplicates on the workflow message and remain there throughout the rest of the workflow. The same configuration could be replicated for the move-granules task, but be sure to use a different destination in the workflow message for the granuleDuplicates output.

    Updating collection URL path templates

    Collections can specify url_path templates to dynamically generate the final location of files. As part of url_path templates, file object properties can be interpolated to generate the file path. Thus, these url_path templates need to be updated to ensure that they are compatible with the updated files schema and the properties that will actually be available on file objects.

    See the notes on the updated files schema to know which properties are available and which previously existing properties were deprecated.

    As an example, you will want to update any url_path properties in your collections to remove references to file.name and replace them with references to file.fileName like so:

    - "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.name, 0, 3)}",
    + "url_path": "{cmrMetadata.CollectionReference.ShortName}___{cmrMetadata.CollectionReference.Version}/{substring(file.fileName, 0, 3)}",
    Upgrade to RDS release

    | Parameter | Type | Description | Default |
    | --- | --- | --- | --- |
    | cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
    | dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
    | dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    Version: v15.0.2

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

    Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
  • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6.
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

Version: v15.0.2

Discover Granules

…included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
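
As a hedged sketch of where these values live (assuming both keys are set in the discover-granules task_config, and using the CMA parameter structure shown elsewhere in these docs), the configuration might look like:

    "DiscoverGranules": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            ...other config...
            "duplicateGranuleHandling": "skip",
            "concurrency": 3
          }
        }
      }
    }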

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/workflow_tasks/files_to_granules/index.html b/docs/v15.0.2/workflow_tasks/files_to_granules/index.html index 8e78693f126..7f655572280 100644 --- a/docs/v15.0.2/workflow_tasks/files_to_granules/index.html +++ b/docs/v15.0.2/workflow_tasks/files_to_granules/index.html @@ -5,13 +5,13 @@ Files To Granules | Cumulus Documentation - +
    Version: v15.0.2

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

This task utilizes the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to take the list of incoming files and sort them into a list of granule objects.

Please note: files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.
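
For illustration only (the meta path used for inputGranules below is a hypothetical choice; wire it to wherever your workflow stored the granules from an earlier step), the config might be templated like so:

    "FilesToGranules": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            ...other config...
            "inputGranules": "{$.meta.input_granules}"
          }
        }
      }
    }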

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/workflow_tasks/lzards_backup/index.html b/docs/v15.0.2/workflow_tasks/lzards_backup/index.html index 0b5fd9dacbe..1075f74ac4f 100644 --- a/docs/v15.0.2/workflow_tasks/lzards_backup/index.html +++ b/docs/v15.0.2/workflow_tasks/lzards_backup/index.html @@ -5,13 +5,13 @@ LZARDS Backup | Cumulus Documentation - +
    Version: v15.0.2

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.
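
As a rough sketch only (the key names and shape below are assumptions for illustration; the output schema is authoritative), the output is of the form:

    {
      "granules": [ ...the input granules array... ],
      "backupResults": [ ...details of each LZARDS backup attempt... ]
    }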

    Version: v15.0.2

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

        Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file

• The task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

See the SIPS workflow cookbook for an example of this task in a workflow.

    Version: v15.0.2

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.
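
As an informal sketch (field names and values beyond those mentioned above are assumptions; see the output schema on the Cumulus Tasks page for the authoritative structure), the output payload has roughly this shape:

    {
      "filesCount": 4,
      "totalSize": 1234567,
      "pdr": {
        "name": "example.PDR",
        "path": "/pdrs"
      },
      "granules": [ ...the generated array of granule objects... ]
    }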

    Examples

See the SIPS workflow cookbook for an example of this task in a workflow.

    Version: v15.0.2

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task utilizes a defined collection in concert with a defined provider (either set on each granule or passed in via config) to queue up ingest executions for each granule, or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

Each of the following sections is a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
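
To tie the keys above together, here is a hedged sketch of a QueueGranules task_config (the template paths, workflow name, prefix, and childWorkflowMeta values are illustrative assumptions, not required settings):

    "QueueGranules": {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            "provider": "{$.meta.provider}",
            "internalBucket": "{$.meta.buckets.internal.name}",
            "granuleIngestWorkflow": "IngestGranule",
            "queueUrl": "{$.meta.queues.startSF}",
            "preferredQueueBatchSize": 1,
            "concurrency": 3,
            "executionNamePrefix": "myPrefix",
            "childWorkflowMeta": {
              "staticValue": "aStaticValue"
            }
          }
        }
      }
    }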

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.
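
In other words, the output payload looks roughly like the following (the ARN shown is a placeholder):

    {
      "running": [
        "arn:aws:states:us-east-1:111122223333:execution:<prefix>-IngestGranule:<execution-name>"
      ]
    }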

    Version: v15.0.2

    Cumulus Tasks: Message Flow

Cumulus Tasks compose Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

• replace: A key generated in conjunction with the Cumulus Message Adapter. It contains the S3 location of a message payload and a target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration, other portions of the message may be present; however, the cumulus_meta, meta, and payload keys must be present once extraction is complete.

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

This step determines which task is currently being executed. Note this is different from which Lambda or activity is being executed, because the same Lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

{
  "input": {},
  "config": {}
}

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

The config loaded from the Parse step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

{
  "task_config": {
    "bar": "baz",
    "cumulus_message": {
      "input": "{$.payload.input}",
      "outputs": [
        {
          "source": "{$.input.anykey}",
          "destination": "{$.payload.out}"
        }
      ]
    }
  },
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "input": {
      "anykey": "anyvalue"
    }
  }
}

    Create Next Message Step 2: Store remote event

    If the ReplaceConfiguration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.
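
For example, if the payload was stored remotely, the outgoing message might look something like the following sketch (bucket and key values are placeholders; the replaced node is left as an empty object as described above):

    {
      "cumulus_meta": { ... },
      "meta": { ... },
      "payload": {},
      "replace": {
        "Bucket": "my-internal-bucket",
        "Key": "events/<execution-id>.json",
        "TargetPath": "$.payload"
      }
    }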

    Version: v15.0.2

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
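
As a purely illustrative sketch of a state_machine_definition body in the Amazon States Language, the following shows a single-step workflow; the state name and the Terraform interpolation used for the Lambda ARN are hypothetical, so substitute the output of whichever task or module you are actually using:

    {
      "Comment": "A minimal single-step workflow",
      "StartAt": "HelloWorld",
      "States": {
        "HelloWorld": {
          "Type": "Task",
          "Resource": "${module.cumulus.hello_world_task_lambda_function_arn}",
          "End": true
        }
      }
    }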

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v15.0.2

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

Version: v15.0.2

Dockerizing Data Processing

…2) validate the output (in this case just check for existence), 3) use 'ncatted' to update the resulting file to be CF-compliant, 4) write out metadata generated for this file

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

Where collection-name is the name of the data collection, such as 'avaps' or 'cpl'. For example, an abridged version of the data for CPL includes:

├── cpl
│   ├── input
│   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
│   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
│   └── output
│       ├── HS3_CPL_ATB_12203a_20120906.nc
│       ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
│       ├── HS3_CPL_OP_12203a_20120906.nc
│       ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise an error will be reported.

    Version: v15.0.2

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

The high level view of the architecture and many of the individual steps are the same, but the details of ingesting each type of collection differ. Different collection types and different providers have different needs. Not only are the individual boxes of a workflow different; the branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about specifics such as what goes into a granule, for example.

    Scalable

The ingest architecture needs to be scalable, both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits which are applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
      • Both meta and task_config which are used for configuring at the collection and task levels do not dictate the fields and structure of the configuration. Additional task specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade-off. One workaround is providing a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation has a tight coupling with their standard. If they make backwards-incompatible changes in the future, we will have to deal with existing projects written against that.

We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without needing changes to the Ingest library code. It is unlikely they will make a backwards-incompatible change at this point. One mitigation for this is writing data transformations to a new format if that were to happen.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built that provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

    The design aims to satisfy the needs for ingest integrating different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used for monitoring many of the events with Step Functions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v15.0.2

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

    In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers, however you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

Input to the task application code is a JSON object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is returned and placed in the payload key by default, but the config key can also be used to return just a portion of the task output.

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

    Because of the potential size of a Cumulus message, mainly the payload field, a task can be set via configuration to store a portion of its output on S3 with a message key Remote Message that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes) it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

    Setting the Path/Target path in the ReplaceConfig parameter (and optionally a non-default MaxSize)

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

    will result in any payload output larger than the MaxSize (in bytes) to be written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the input/output configuration values for definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message) with any cma configuration parameters merged in being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
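
Applied to the inbound message above, the merge would yield an event along these lines (a sketch of the described behavior, not output captured from the CMA):

    {
      "cumulus_meta": {},
      "payload": {},
      "meta": {},
      "exception": {},
      "ConfigKey": {
        "config values": "some config values"
      }
    }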

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
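
So, if the object stored at events/some-event-id were, say, {"bar": "baz"} (a made-up value for illustration), the resolved fragment of the message would become the following, with the replace key removed:

    "meta": {
      "foo": {
        "bar": "baz"
      }
    }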

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
}
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use only a portion of the payload as its input message. For example, given a task that specifies cma.task_config.cumulus_message.input:

        ExampleTask:
    Parameters:
    cma:
    event.$: '$'
    task_config:
    cumulus_message:
    input: '{$.payload.foo}'

    The task configuration in the message would be:

        {
    "task_config": {
    "cumulus_message": {
    "input": "{$.payload.foo}"
    }
    },
    "payload": {
    "foo": {
    "anykey": "anyvalue"
    }
    }
    }

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

        {
    "input" : {
    "anykey": "anyvalue"
    },
    "config": {...}
    }

    5. Resolve task output

By default, the task's return value becomes the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also write values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses the task's return value to build the output message. The Message Adapter dispatches a "source" to a "destination" as defined by the URL templates stored in the task-specific cumulus_message.outputs: the value at the "source" URL in the task's return value is used to create or replace the value at the "destination" URL in the resulting Cumulus message. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

    {
    "ExampleTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    }
    }
    }
    }
    }

    The corresponding Cumulus Message would be:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Given the response from the task is:

        {
    "output": {
    "anykey": "boo"
    }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    6. Apply Remote Message Configuration

If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided and, if required, write a portion of the Cumulus Message to S3 and add a replace key to the message for future steps to utilize.

Please note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output Cumulus message (post output configuration) looks like:

        {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    the resultant output would look like:

    {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
    }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
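As an illustration, a task that opts into validation might be laid out as follows (the file names other than the three schema files are hypothetical):

my-task/
├── index.js
├── package.json
└── schemas/
    ├── input.json
    ├── output.json
    └── config.json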

    - + \ No newline at end of file diff --git a/docs/v15.0.2/workflows/lambda/index.html b/docs/v15.0.2/workflows/lambda/index.html index 365e6c360b2..dd1928cfe77 100644 --- a/docs/v15.0.2/workflows/lambda/index.html +++ b/docs/v15.0.2/workflows/lambda/index.html @@ -5,13 +5,13 @@ Develop Lambda Functions | Cumulus Documentation - +
    Version: v15.0.2

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in its developer guide.

Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

• Node.js
• Java
• Python

Additionally, you may choose to include any of the other languages AWS supports as a resource with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the Cumulus source) as either a new .tf file or an addition to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided the cumulus module if intending to integrate the lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.
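As a sketch of the structure described above (the jar names are hypothetical), the deploy folder used to build the zip might look like:

deploy/
├── test_task/
│   └── task/
│       ├── Task.class
│       └── TaskLogic.class
└── lib/
    ├── dependency-one.jar
    └── dependency-two.jar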

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.
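For example, a minimal Python Lambda resource might look like the sketch below, assuming a handler function named handler in a lambda_function.py module and a supported Python runtime; adjust these values to match your code:

resource "aws_lambda_function" "my_python_function" {
  function_name    = "${var.prefix}-python-function"
  filename         = "/path/to/zip/lambda.zip"
  source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
  handler          = "lambda_function.handler"
  role             = module.cumulus.lambda_processing_role_arn
  runtime          = "python3.8"

  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }
}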

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.
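As a sketch, attaching the CMA layer looks roughly like the following; the variable holding the layer ARN is an assumption here and should be whatever your deployment uses for the Cumulus Message Adapter layer:

resource "aws_lambda_function" "cma_enabled_function" {
  # ... same configuration as the examples above ...

  # Assumed variable name; supply the ARN of your deployed CMA layer version
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]
}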

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    - + \ No newline at end of file diff --git a/docs/v15.0.2/workflows/protocol/index.html b/docs/v15.0.2/workflows/protocol/index.html index f9e90f99e71..5661bfffbb7 100644 --- a/docs/v15.0.2/workflows/protocol/index.html +++ b/docs/v15.0.2/workflows/protocol/index.html @@ -5,13 +5,13 @@ Workflow Protocol | Cumulus Documentation - +
    Version: v15.0.2

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/v15.0.2/workflows/workflow-configuration-how-to/index.html b/docs/v15.0.2/workflows/workflow-configuration-how-to/index.html index 9de0ce6374c..9c07f228e4d 100644 --- a/docs/v15.0.2/workflows/workflow-configuration-how-to/index.html +++ b/docs/v15.0.2/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -24,7 +24,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

    {
    "bucket": "sample-protected-bucket",
    "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
    }

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    - + \ No newline at end of file diff --git a/docs/v15.0.2/workflows/workflow-triggers/index.html b/docs/v15.0.2/workflows/workflow-triggers/index.html index cbcb9c1acd8..4c1d1642046 100644 --- a/docs/v15.0.2/workflows/workflow-triggers/index.html +++ b/docs/v15.0.2/workflows/workflow-triggers/index.html @@ -5,13 +5,13 @@ Workflow Triggers | Cumulus Documentation - +
    Version: v15.0.2

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event which manages triggering the lambdas that in turn trigger the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    - + \ No newline at end of file diff --git a/docs/v9.0.0/adding-a-task/index.html b/docs/v9.0.0/adding-a-task/index.html index 679cc73d2a6..48419632088 100644 --- a/docs/v9.0.0/adding-a-task/index.html +++ b/docs/v9.0.0/adding-a-task/index.html @@ -5,13 +5,13 @@ Contributing a Task | Cumulus Documentation - +
    Version: v9.0.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/api/index.html b/docs/v9.0.0/api/index.html index 32448e2b44b..59451d4e317 100644 --- a/docs/v9.0.0/api/index.html +++ b/docs/v9.0.0/api/index.html @@ -5,13 +5,13 @@ Cumulus API | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.0.0/architecture/index.html b/docs/v9.0.0/architecture/index.html index 32e3814a3fe..1fda100d683 100644 --- a/docs/v9.0.0/architecture/index.html +++ b/docs/v9.0.0/architecture/index.html @@ -5,14 +5,14 @@ Architecture | Cumulus Documentation - +
    Version: v9.0.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it is more easily digested when broken down into its major components:

    Data Distribution

End users can access data via Cumulus's distribution submodule, which includes ASF's thin egress application; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via either CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

Cumulus entity state data is stored in a set of DynamoDB database tables, and is exported to an Elasticsearch instance to provide non-authoritative querying and state data for the API and other applications that require more complex queries.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployments, and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/configuration/cloudwatch-retention/index.html b/docs/v9.0.0/configuration/cloudwatch-retention/index.html index 730433a78ad..d1875f949ec 100644 --- a/docs/v9.0.0/configuration/cloudwatch-retention/index.html +++ b/docs/v9.0.0/configuration/cloudwatch-retention/index.html @@ -5,13 +5,13 @@ Cloudwatch Retention | Cumulus Documentation - +
    Version: v9.0.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

    In addition to getting your aws-cli set-up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.
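One way to confirm the policy took effect is to describe the log group and check its retentionInDays value:

aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/KinesisInboundLogger"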

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    - + \ No newline at end of file diff --git a/docs/v9.0.0/configuration/collection-storage-best-practices/index.html b/docs/v9.0.0/configuration/collection-storage-best-practices/index.html index d7f5c9b95fd..09e798de217 100644 --- a/docs/v9.0.0/configuration/collection-storage-best-practices/index.html +++ b/docs/v9.0.0/configuration/collection-storage-best-practices/index.html @@ -5,13 +5,13 @@ Collection Cost Tracking and Storage Best Practices | Cumulus Documentation - +
    Version: v9.0.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

In the Graphed metrics tab, you can select the type of statistic (e.g. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

    {
    "Datapoints": [
    {
    "Timestamp": "2019-07-31T00:00:00Z",
    "Average": 150996467959.0,
    "Unit": "Bytes"
    }
    ],
    "Label": "BucketSizeBytes"
    }

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
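For reference, the buckets variable in terraform.tfvars is generally a map keyed by those same names; a sketch (with hypothetical bucket names) might look like:

buckets = {
  internal = {
    name = "my-prefix-internal"
    type = "internal"
  }
  protected = {
    name = "my-prefix-protected"
    type = "protected"
  }
  public = {
    name = "my-prefix-public"
    type = "public"
  }
}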

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "MOD09GQ-006-private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "MOD09GQ-006-protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "MOD09GQ-006-public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
    "bucket": "protected",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
    ]
    }

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly on the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v9.0.0/configuration/data-management-types/index.html b/docs/v9.0.0/configuration/data-management-types/index.html index 91947e51666..87de5ad9f5b 100644 --- a/docs/v9.0.0/configuration/data-management-types/index.html +++ b/docs/v9.0.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v9.0.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

    • While connection configuration is defined here, things that are more specific to a specific ingest setup (e.g. 'What target directory should we be pulling from' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
    • There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
• id (string, required) - Unique identifier for the provider
• globalConnectionLimit (integer, optional) - Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required) - The protocol for this provider. Must be s3 for this provider type
• host (string, required) - S3 Bucket to pull data from
http
• id (string, required) - Unique identifier for the provider
• globalConnectionLimit (integer, optional) - Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required) - The protocol for this provider. Must be http for this provider type
• host (string, required) - The host to pull data from (e.g. nasa.gov)
• username (string, optional) - Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, required only if username is specified) - Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional) - Port to connect to the provider on. Defaults to 80
https
• id (string, required) - Unique identifier for the provider
• globalConnectionLimit (integer, optional) - Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required) - The protocol for this provider. Must be https for this provider type
• host (string, required) - The host to pull data from (e.g. nasa.gov)
• username (string, optional) - Configured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• password (string, required only if username is specified) - Configured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
• port (integer, optional) - Port to connect to the provider on. Defaults to 443
ftp
• id (string, required) - Unique identifier for the provider
• globalConnectionLimit (integer, optional) - Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required) - The protocol for this provider. Must be ftp for this provider type
• host (string, required) - The ftp host to pull data from (e.g. nasa.gov)
• username (string, optional) - Username to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
• password (string, optional) - Password to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
• port (integer, optional) - Port to connect to the provider on. Defaults to 21
sftp
• id (string, required) - Unique identifier for the provider
• globalConnectionLimit (integer, optional) - Integer specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
• protocol (string, required) - The protocol for this provider. Must be sftp for this provider type
• host (string, required) - The sftp host to pull data from (e.g. nasa.gov)
• username (string, optional) - Username to use to connect to the sftp server
• password (string, optional) - Password to use to connect to the sftp server
• port (integer, optional) - Port to connect to the provider on. Defaults to 22
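Putting the s3 configuration above together, a provider record might look like the following sketch (the id and host values are hypothetical):

{
  "id": "MODIS_S3_PROVIDER",
  "protocol": "s3",
  "host": "my-modis-staging-bucket",
  "globalConnectionLimit": 10
}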

    Collections

Break down of s3_MOD09GQ_006.json (https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
• name ("MOD09GQ", required) - The name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
• version ("006", required) - A version tag for the collection
• granuleId ("^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$", required) - The regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
• granuleIdExtraction ("(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)", required) - The regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
• sampleFileName ("MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", required) - An example filename belonging to this collection
• files (<JSON Object> of files defined here, required) - Describe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
• dataType ("MOD09GQ", optional) - Can be specified, but this value will default to the collection_name if not
• duplicateHandling ("replace", optional) - ("replace"|"version"|"skip") determines granule duplicate handling scheme
• ignoreFilesConfigForDiscovery (false (default), optional) - By default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
• process ("modis", optional) - Example options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
• meta (<JSON Object> of MetaData for the collection, optional) - MetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
• url_path ("{cmrMetadata.Granule.Collection.ShortName}/{substring(file.name, 0, 3)}", optional) - Filename without extension

    files-object

• regex ("^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$", required) - Regular expression used to identify the file
• sampleFileName ("MOD09GQ.A2017025.h21v00.006.2017034065104.hdf", required) - Filename used to validate the provided regex
• type ("data", optional) - Value to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
• bucket ("internal", required) - Name of the bucket where the file will be stored
• url_path ("${collectionShortName}/{substring(file.name, 0, 3)}", optional) - Folder used to save the granule in the bucket. Defaults to the collection url_path
• checksumFor ("^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$", optional) - If this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, based on a schedule, or can be configured to be triggered by either events in Kinesis, SNS messages, or SQS messages.

    Rule configuration
• name ("L2_HR_PIXC_kinesisRule", required) - Name of the rule. This is the name under which the rule will be listed on the dashboard
• workflow ("CNMExampleWorkflow", required) - Name of the workflow to be run. A list of available workflows can be found on the Workflows page
• provider ("PODAAC_SWOT", optional) - Configured provider's ID. This can be found on the Providers dashboard page
• collection (<JSON Object> collection object shown below, required) - Name and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
• payload (<JSON Object or Array>, optional) - The payload to be passed to the workflow
• meta (<JSON Object> of MetaData for the rule, optional) - MetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
• rule (<JSON Object> rule type and associated values - discussed below, required) - Object defining the type and subsequent attributes of the rule
• state ("ENABLED", optional) - ("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
• queueUrl (https://sqs.us-east-1.amazonaws.com/1234567890/queue-name, optional) - URL for SQS queue that will be used to schedule workflows for this rule
• tags (["kinesis", "podaac"], optional) - An array of strings that can be used to simplify search

    collection-object

• name ("L2_HR_PIXC", required) - Name of a collection defined/configured in the Collections dashboard page
• version ("000", required) - Version number of a collection defined/configured in the Collections dashboard page

    meta-object

• retries (3, optional) - Number of retries on errors, for sqs-type rule only. Defaults to 3.
• visibilityTimeout (900, optional) - VisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

• type ("kinesis", required) - ("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
• value (<String> Object, depends) - Discussion of valid values is below

    rule-value

The rule value entry depends on the type of rule:

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is run 3 times, or the configured number of times.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.
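Pulling the fields above together, a hypothetical sqs-type rule might look like the following sketch (names, queue URL, and values are illustrative):

{
  "name": "MOD09GQ_sqs_rule",
  "workflow": "IngestGranule",
  "provider": "PODAAC_SWOT",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "sqs",
    "value": "https://sqs.us-east-1.amazonaws.com/1234567890/queue-name"
  },
  "meta": {
    "retries": 1,
    "visibilityTimeout": 1800
  },
  "state": "ENABLED"
}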

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v9.0.0/configuration/lifecycle-policies/index.html b/docs/v9.0.0/configuration/lifecycle-policies/index.html index 2ea10c369e6..9237e0a5bfb 100644 --- a/docs/v9.0.0/configuration/lifecycle-policies/index.html +++ b/docs/v9.0.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v9.0.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
• You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walkthrough on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

Screenshot of "Management" tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

Screenshot of lifecycle rule appearing in the "Management" tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

    The examples shown in this document are obviously fairly basic cases. By using object tags, filters and other configuration options you can enact far more complicated policies for various scenarios. For more reading on the topics presented on this page see:

    - + \ No newline at end of file diff --git a/docs/v9.0.0/configuration/monitoring-readme/index.html b/docs/v9.0.0/configuration/monitoring-readme/index.html index ee38e0f3ec7..607d13d6b1b 100644 --- a/docs/v9.0.0/configuration/monitoring-readme/index.html +++ b/docs/v9.0.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v9.0.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

    The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on Github and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

const zlib = require('zlib');
const aws = require('aws-sdk');
const { promisify } = require('util');

const gunzip = promisify(zlib.gunzip);
const sns = new aws.SNS();

exports.handler = async (event) => {
  // CloudWatch Logs delivers subscription events as base64-encoded, gzipped JSON
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const decompressedData = await gunzip(payload);
  const logData = JSON.parse(decompressedData.toString('ascii'));
  return Promise.all(logData.logEvents.map(async (logEvent) => {
    // @cumulus/logger writes structured JSON messages with a "level" field
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
      // Forward error and fatal messages to the email-subscribed SNS topic
      return sns.publish({
        TopicArn: process.env.EmailReportingTopicArn,
        Message: logEvent.message
      }).promise();
    }
    return Promise.resolve();
  }));
};

After creating the SNS topic, we can deploy this code as a lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the lambda function by using the --environment option on aws lambda create-function.

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
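As a sketch of the Terraform approach (the resource name and log group are hypothetical, and a matching aws_lambda_permission allowing CloudWatch Logs to invoke the function is also required):

resource "aws_cloudwatch_log_subscription_filter" "email_error_filter" {
  name            = "${var.prefix}-email-error-filter"
  log_group_name  = "/aws/lambda/${var.prefix}-SomeCumulusTask"
  filter_pattern  = "{ ($.level = \"error\") || ($.level = \"fatal\") }"
  destination_arn = aws_lambda_function.log_email_forwarder.arn
}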

    - + \ No newline at end of file diff --git a/docs/v9.0.0/configuration/server_access_logging/index.html b/docs/v9.0.0/configuration/server_access_logging/index.html index 7e81ad7c3e8..7cc2cb31004 100644 --- a/docs/v9.0.0/configuration/server_access_logging/index.html +++ b/docs/v9.0.0/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v9.0.0

    S3 Server Access Logging

    Note: To support EMS Reporting, you need to enable Amazon S3 server access logging on all protected and public buckets.

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
    3. Verify the logging policy exists on your buckets.

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
    diff --git a/docs/v9.0.0/data-cookbooks/about-cookbooks/index.html b/docs/v9.0.0/data-cookbooks/about-cookbooks/index.html

    Version: v9.0.0

    About Cookbooks

    Introduction

    The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. They should also serve to help unify an institution/user group on a set of terms.

    Setup

    The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    diff --git a/docs/v9.0.0/data-cookbooks/browse-generation/index.html b/docs/v9.0.0/data-cookbooks/browse-generation/index.html

    Version: v9.0.0

    Ingest Browse Generation

    ... provider keys with the previously entered values). Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") where you've staged your mock/test data:

    {
      "name": "TestBrowseGeneration",
      "workflow": "DiscoverGranulesBrowseExample",
      "provider": "{{provider_from_previous_step}}",
      "collection": {
        "name": "MOD09GQ",
        "version": "006"
      },
      "meta": {
        "provider_path": "{{path_to_data}}"
      },
      "rule": {
        "type": "onetime"
      },
      "state": "ENABLED",
      "updatedAt": 1553053438767
    }

    Run Workflows

    Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule, and watch the ingest workflows process.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "filepath": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "filename": "s3://cumulus-test-sandbox-protected/MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "filepath": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "filename": "s3://cumulus-test-sandbox-private/MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "time": 1553027412000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 21708
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "filepath": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "filename": "s3://cumulus-test-sandbox-protected/MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "filepath": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "filename": "s3://cumulus-test-sandbox-protected-2/MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

    You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the CMR entry shown above is present.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

    You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.
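
    A minimal sketch of such a resource is shown below; the function name, zip file, handler, and sizing values are placeholders, and you would also attach the CMA layer and any environment variables your processing code needs per your deployment's conventions:

      resource "aws_lambda_function" "browse_processing" {
        function_name    = "${var.prefix}-BrowseProcessing"
        filename         = "browse_processing.zip"
        source_code_hash = filebase64sha256("browse_processing.zip")
        handler          = "index.handler"
        role             = module.cumulus.lambda_processing_role_arn
        runtime          = "nodejs12.x"
        timeout          = 300
        memory_size      = 512
      }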

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027412000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

    The provided example script goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

    The processing lambda you construct will need to do the following (a minimal sketch follows this list):

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
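
    The following is an illustrative sketch of such a handler, not the example's actual source: it assumes CMA wrapping of the handler (e.g. via cumulus-message-adapter-js), real browse-image rendering, and CMR metadata handling are layered on separately, and the generateBrowse helper is hypothetical.

      'use strict';

      const AWS = require('aws-sdk');

      const s3 = new AWS.S3();

      // Hypothetical helper: stage a browse (.jpg) file next to the staged data.
      // A real implementation would render actual imagery from the input data.
      async function generateBrowse(granule, stagingBucket) {
        const dataFile = granule.files.find((f) => f.type === 'data');
        const browseName = dataFile.name.replace(/\.[^.]+$/, '.jpg');
        const key = `${dataFile.fileStagingDir}/${browseName}`;
        await s3.putObject({ Bucket: stagingBucket, Key: key, Body: 'fake browse imagery' }).promise();
        return {
          name: browseName,
          type: 'browse', // the granule file type must be set to "browse"
          bucket: stagingBucket,
          filename: `s3://${stagingBucket}/${key}`,
          fileStagingDir: dataFile.fileStagingDir
        };
      }

      // event.config and event.input are provided by the message adapter as
      // described above; CMA wrapping of this handler is omitted for brevity.
      exports.handler = async (event) => {
        const granules = event.input.granules;
        await Promise.all(granules.map(async (granule) => {
          const browseFile = await generateBrowse(granule, event.config.bucket);
          granule.files.push(browseFile);
        }));
        // "files" is the list FilesToGranules consumes via the payload;
        // "granules" is mapped to meta.input_granules by the cumulus_message config.
        const files = granules.reduce((acc, g) => acc.concat(g.files.map((f) => f.filename)), []);
        return { granules, files };
      };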

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

    The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

    In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, those outputs are mapped to the payload and to meta.input_granules respectively:

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }

    Their expected values from the example above may be useful in constructing a processing task:

    payload

    The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file if it exists with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

    This is the list of files FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

    The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the output meta.input_granules in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027412000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 21708
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    }
    ]
    }
    ],
    diff --git a/docs/v9.0.0/data-cookbooks/choice-states/index.html b/docs/v9.0.0/data-cookbooks/choice-states/index.html

    Version: v9.0.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
      "meta": {
        "isPdrFinished": false
      }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

    In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: as long as CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    diff --git a/docs/v9.0.0/data-cookbooks/cnm-workflow/index.html b/docs/v9.0.0/data-cookbooks/cnm-workflow/index.html

    Version: v9.0.0

    CNM Workflow

    This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

    This entry assumes you already have two Kinesis data streams created for use as the CNM notification and response data streams.

    If you do not have two streams setup, please take a moment to review the Kinesis documentation and setup two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream
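
    If you prefer the AWS CLI to the console, creating two single-shard streams might look like the following (stream names are placeholders):

      aws kinesis create-stream --stream-name <prefix>-cnmNotificationStream --shard-count 1
      aws kinesis create-stream --stream-name <prefix>-cnmResponseStream --shard-count 1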

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

    If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.
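
    For example, attaching that managed policy to the processing role with the AWS CLI might look like the following (the role name shown is illustrative; a more narrowly scoped policy is preferable where your security posture requires it):

      aws iam attach-role-policy \
        --role-name <prefix>-lambda-processing \
        --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFullAccess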

    In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the workflow module below to the new Terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

    state_machine_definition = <<JSON
    {
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
    JSON
    }

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
      "collection": {
        "name": "L2_HR_PIXC",
        "version": "000"
      },
      "name": "L2_HR_PIXC_kinesisRule",
      "provider": "PODAAC_SWOT",
      "rule": {
        "type": "kinesis",
        "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
      },
      "state": "ENABLED",
      "workflow": "CNMExampleWorkflow"
    }

    Please Note:

    • The rule's value attribute must match the Amazon Resource Name (ARN) of the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
    • The collection and provider should match the collection and provider you set up in the Prerequisites section.

    Once you've clicked on 'submit' a new rule should appear in the dashboard's Rule Overview.


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

    Construct a JSON file containing an object that matches the values that have been previously setup. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
      "product": {
        "files": [
          {
            "checksumType": "md5",
            "name": "${TEST_DATA_FILE_NAME}",
            "checksum": "bogus_checksum_value",
            "uri": "${TEST_DATA_URI}",
            "type": "data",
            "size": 12345678
          }
        ],
        "name": "${TEST_DATA_FILE_NAME}",
        "dataVersion": "006"
      },
      "identifier ": "testIdentifier123456",
      "collection": "${COLLECTION}",
      "provider": "TestProvider",
      "version": "001",
      "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

    As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow workflow.

    TranslateMessage

    TranslateMessage (which corresponds to the CNMToCMA Lambda) will take the CNM object payload and add a granules object to the CMA payload that's consistent with other Cumulus ingest tasks, and add a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

    SyncGranules

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output and add a "SUCCESS" record to the response Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

    You should be able to validate the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

    which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
    • Re-trigger the workflow by using the put-record command from above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

    Note that the Data field is base64-encoded and not directly human readable; it needs to be decoded and parsed to be interpretable. There are many options for building a Kinesis consumer, such as the KCL.
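
    For a quick spot check from the command line, you can decode a single record's Data field with jq and base64 (assuming both are installed):

      aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
        | jq -r '.Records[0].Data' \
        | base64 --decode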

    For purposes of validating the workflow, it may be simpler to locate the workflow in the Step Function Management Console and assert the expected output is similar to the below examples.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

    At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot successfully be processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS queue named kinesisFailure. Operators can use this queue to inspect failed records.

    This system ensures when messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus control.

    The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - come with the API package and do not need to be configured by the operator.

    To examine records that could not be processed at any step, check the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.

    Note, these are only records that occurred when processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
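
    If you prefer to create the event source mapping from the CLI rather than the console, the call might look like the following (the deployed function name will include your deployment prefix, and the stream ARN is a placeholder):

      aws lambda create-event-source-mapping \
        --function-name <prefix>-KinesisOutboundEventLogger \
        --event-source-arn arn:aws:kinesis:<region>:<account-id>:stream/<your-response-stream> \
        --starting-position LATEST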

    diff --git a/docs/v9.0.0/data-cookbooks/error-handling/index.html b/docs/v9.0.0/data-cookbooks/error-handling/index.html

    Version: v9.0.0

    Error Handling in Workflows

    ... Service Exception. See this documentation on configuring your workflow to handle transient Lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    diff --git a/docs/v9.0.0/data-cookbooks/hello-world/index.html b/docs/v9.0.0/data-cookbooks/hello-world/index.html

    Version: v9.0.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

    The HelloWorld task is provided for you as part of the cumulus terraform module; no configuration is needed.

    If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition located in the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code is located in the Cumulus source code at 'cumulus/tasks/hello-world'.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
      "collection": {             # collection values can be configured and found on the Collections page
        "name": "${collection_name}",
        "version": "${collection_version}"
      },
      "name": "helloworld_rule",
      "provider": "${provider}",  # found on the Providers page
      "rule": {
        "type": "onetime"
      },
      "state": "ENABLED",
      "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

    Screenshot of AWS Step Function execution graph for the HelloWorld workflow Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    diff --git a/docs/v9.0.0/data-cookbooks/ingest-notifications/index.html b/docs/v9.0.0/data-cookbooks/ingest-notifications/index.html

    Version: v9.0.0

    Ingest Notification in Workflows

    On deployment, an SQS queue and three SNS topics are created and used for handling notification messages related to the workflow.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates DynamoDB. The DynamoDB events for the ExecutionsTable, GranulesTable and PdrsTable are streamed on DynamoDBStreams, which are read by the publishExecutions, publishGranules and publishPdrs Lambda functions, respectively.

    These Lambda functions publish to the three SNS topics both when the workflow starts and when it reaches a terminal state (completion or failure). The following describes how many message(s) each topic receives both on workflow start and workflow completion/failure:

    • reportExecutions - Receives 1 message per workflow execution
    • reportGranules - Receives 1 message per granule in a workflow execution
    • reportPdrs - Receives 1 message per PDR

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

    If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and the status of those granules/PDRs being visible on the Cumulus dashboard. Note that the queue does have certain expectations of the message format: it expects a Cumulus Message nested within a Cloudwatch Step Function Event object.

    Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL and topic ARNs by adding them to outputs.tf for your Terraform deployment as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

    Then, when you run terraform apply, you should see the queue URL and topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
    report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

    Once you have the queue URL, you can use the AWS SDK for your language of choice to send messages to the queue. The expected format of these messages is that of a Cloudwatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus Message is expected in detail.input. The Cumulus Message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
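
    As a minimal Node.js sketch (aws-sdk v2), sending a SUCCEEDED-style report might look like the following; the queue URL environment variable and the Cumulus message contents are placeholders, and a real event will generally carry additional Step Function detail fields:

      const AWS = require('aws-sdk');
      const sqs = new AWS.SQS();

      // A valid Cumulus message for your granule/PDR (placeholder here)
      const cumulusMessage = { cumulus_meta: {}, meta: {}, payload: {} };

      const event = {
        detail: {
          status: 'SUCCEEDED',
          // SUCCEEDED events carry the Cumulus message in detail.output;
          // all other statuses use detail.input. It MUST be a JSON string.
          output: JSON.stringify(cumulusMessage)
        }
      };

      sqs.sendMessage({
        QueueUrl: process.env.REPORTING_QUEUE_URL, // stepfunction_event_reporter_queue_url output
        MessageBody: JSON.stringify(event)
      }).promise()
        .then(() => console.log('status report sent'))
        .catch(console.error);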

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda (for example, by tailing them as shown below):

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords
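
    For example, you can tail those logs from the command line with AWS CLI v2:

      aws logs tail "/aws/lambda/<prefix>-sfEventSqsToDbRecords" --follow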

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }
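After applying, one way to verify that the subscription was created (a sketch; the topic ARN below is a placeholder, and is also available from the report_executions_sns_topic_arn Terraform output) is to list the topic's subscriptions with the AWS CLI:

# List subscriptions on the executions reporting topic
aws sns list-subscriptions-by-topic \
  --topic-arn "arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic"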

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v9.0.0

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v9.0.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources, with the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

    If your task requires more than any of these resources or an unsupported runtime, creating a Docker image which can be run on ECS is the way to go. Cumulus supports running any lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

If you discover that this task can no longer run in AWS Lambda, you can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or in an existing one):

• An aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn
    ]
    alarms = {
    TaskCountHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 1
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.
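One way to restart those tasks, sketched below, is to force a new deployment of the ECS service with the AWS CLI; the cluster and service names are placeholders and depend on your prefix and module configuration:

# Force the ECS service to launch fresh tasks so they pick up the updated Lambda code
aws ecs update-service \
  --cluster "<prefix>-CumulusECSCluster" \
  --service "<prefix>-QueueGranules" \
  --force-new-deployment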

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}")

    If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step would run as an ECS task instead of a lambda.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Science Investigator-led Processing Systems (SIPS)

... we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

Note: A list of configured workflows can be found under "Workflows" in the navigation bar of the Cumulus dashboard. Additionally, a list of executions and their respective statuses can be found under the "Executions" tab in the navigation bar.
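For reference, a onetime rule like the one shown in the screenshot might look like the following sketch when expressed as JSON; the rule name, provider, and collection values are illustrative only:

{
  "name": "discover_pdrs_onetime_test",
  "workflow": "DiscoverAndQueuePdrs",
  "provider": "s3_provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "onetime"
  },
  "state": "ENABLED"
}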

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

    Please note: To use this example workflow module as a template for a new workflow in your deployment the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v9.0.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Tracking Ancillary Files

... The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type   | UMM-G RelatedUrl.Type        | ECHO10 Location
ancillary  | 'VIEW RELATED INFORMATION'   | OnlineResource
data       | 'GET DATA'                   | OnlineAccessURL
browse     | 'GET RELATED VISUALIZATION'  | AssociatedBrowseImage
linkage    | 'EXTENDED METADATA'          | OnlineResource
metadata   | 'EXTENDED METADATA'          | OnlineResource
qa         | 'EXTENDED METADATA'          | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v9.0.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

This is a one-time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'
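You can verify the change by reading the account settings back; the cloudWatchRoleArn field in the output should now show the role you created:

# Confirm that cloudWatchRoleArn points at the new role
aws apigateway get-account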

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

    Version: v9.0.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

If additional logs are needed, you can configure additional_log_groups_to_elk with the CloudWatch log groups you want to send to the destination. additional_log_groups_to_elk is a map with the key as a descriptor and the value as the CloudWatch log group name.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
Component-based Cumulus Deployment

... Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.

    See the deployment documentation for a walkthrough of creating resources for your remote state using an S3 backend.
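A minimal terraform.tf sketch using the S3 backend might look like the following; the bucket, key, and DynamoDB table names are placeholders and must refer to resources that already exist in your account:

terraform {
  backend "s3" {
    region         = "us-east-1"
    bucket         = "PREFIX-tf-state"          # placeholder state bucket
    key            = "cumulus/terraform.tfstate"
    dynamodb_table = "PREFIX-tf-locks"          # placeholder lock table
  }
}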

    Version: v9.0.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.
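As a sketch of what enabling server access logging can look like from the command line once the buckets exist (bucket names and prefix are placeholders, and the target bucket must permit log delivery):

# Enable S3 server access logging on a protected bucket
aws s3api put-bucket-logging \
  --bucket foobar-protected \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "foobar-internal",
      "TargetPrefix": "foobar-protected/"
    }
  }'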

    Command line

    Using the AWS command line tool create-bucket s3api subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.

    Web interface

    See: AWS "Creating a Bucket" documentation

How to Deploy Cumulus

... Terraform root modules: data-persistence and cumulus.

    The data-persistence module should be deployed first, and creates the Elasticsearch domain and DynamoDB tables. The cumulus module deploys the rest of Cumulus: distribution, API, ingest, workflows, etc. The cumulus module depends on the resources created in the data-persistence deployment.

Each of these modules has to be deployed independently and requires its own Terraform backend, variable, and output settings. The template deploy repo that was cloned previously already contains the scaffolding of the necessary files for the deployment of each module: data-persistence-tf deploys the data-persistence module and cumulus-tf deploys the cumulus module. For reference on the files that are included, see the documentation on adding components to a Terraform deployment.

    Troubleshooting

    Please see our troubleshooting documentation for any issues with your deployment when performing the upcoming steps.

    Configure and deploy the data-persistence-tf root module

    These steps should be executed in the data-persistence-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files.

    cd data-persistence-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)

    Fill in the appropriate values in terraform.tfvars. See the data-persistence module variable definitions for more detail on each variable.

    Consider the size of your Elasticsearch cluster when configuring data-persistence.

    Reminder: Elasticsearch is optional and can be disabled using include_elasticsearch = false in your terraform.tfvars. Your Cumulus dashboard will not work without Elasticsearch.

    Reminder: If you are including subnet_ids in your terraform.tfvars, Elasticsearch will need a service-linked role to deploy successfully. Follow the instructions above to create the service-linked role if you haven't already.

    Initialize Terraform

Run terraform init[1]

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

Optional: Import existing AWS resources to Terraform

    Import existing resources

    If you have an existing Cumulus deployment, you can import your existing DynamoDB tables and Elasticsearch instance to be used with your new Terraform deployment.

    To import a DynamoDB table from your existing deployment:

    terraform import module.data_persistence.aws_dynamodb_table.access_tokens_table PREFIX-AccessTokensTable

    Repeat this command for every DynamoDB table included in the data-persistence module, replacing PREFIX with the correct value for your existing deployment.

    To import the Elasticsearch instance from your existing deployment, run this command and replace PREFIX-es5vpc with the existing domain name:

    terraform import module.data_persistence.aws_elasticsearch_domain.es_vpc PREFIX-es5vpc

    You will also need to make sure to set these variables in your terraform.tfvars file:

    prefix = "PREFIX"     # must match prefix of existing deployment
    custom_domain_name = "PREFIX-es5vpc" # must match existing Elasticsearch domain name

    Note: If you are importing data resources from a previous version of Cumulus deployed using Cloudformation, then make sure DeletionPolicy: Retain is set on the data resources in the Cloudformation stack before deleting that stack. Otherwise, the imported data resources will be destroyed when you delete that stack. As of Cumulus version 1.15.0, DeletionPolicy: Retain is set by default for the data resources in the Cloudformation stack.

    Deploy

    Run terraform apply to deploy your data persistence resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like:

    Apply complete! Resources: 16 added, 0 changed, 0 destroyed.

    Outputs:

    dynamo_tables = {
    "access_tokens" = {
    "arn" = "arn:aws:dynamodb:us-east-1:12345:table/prefix-AccessTokensTable"
    "name" = "prefix-AccessTokensTable"
    }
    # ... more tables ...
    }
    elasticsearch_alarms = [
    {
    "arn" = "arn:aws:cloudwatch:us-east-1:12345:alarm:prefix-es-vpc-NodesLowAlarm"
    "name" = "prefix-es-vpc-NodesLowAlarm"
    },
    # ... more alarms ...
    ]
    elasticsearch_domain_arn = arn:aws:es:us-east-1:12345:domain/prefix-es-vpc
    elasticsearch_hostname = vpc-prefix-es-vpc-abcdef.us-east-1.es.amazonaws.com
    elasticsearch_security_group_id = sg-12345

    Your data persistence resources are now deployed.

    Deploy the Cumulus Message Adapter layer

    The Cumulus Message Adapter (CMA) is necessary for interpreting the input and output of Cumulus workflow steps. The CMA is now integrated with Cumulus workflow steps as a Lambda layer.

    To deploy a CMA layer to your account:

    1. Go to the CMA releases page and download the cumulus-message-adapter.zip for the desired release
    2. Use the AWS CLI to publish your layer:
    $ aws lambda publish-layer-version \
    --layer-name prefix-CMA-layer \
    --region us-east-1 \
    --zip-file fileb:///path/to/cumulus-message-adapter.zip

    {
    ... more output ...
    "LayerVersionArn": "arn:aws:lambda:us-east-1:1234567890:layer:prefix-CMA-layer:1",
    ... more output ...
    }

    Make sure to copy the LayerVersionArn of the deployed layer, as it will be used to configure the cumulus-tf deployment in the next step.

    Configure and deploy the cumulus-tf root module

    These steps should be executed in the cumulus-tf directory of the template repo that was cloned previously.

    cd cumulus-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)

    Fill in the appropriate values in terraform.tfvars. See the Cumulus module variable definitions for more detail on each variable.

    Notes on specific variables:

    • deploy_to_ngap: This variable controls the provisioning of certain resources and policies that are specific to an NGAP environment. If you are deploying to NGAP, you must set this variable to true.
    • prefix: The value should be the same as the prefix from the data-persistence deployment.
    • token_secret: A string value used for signing and verifying JSON Web Tokens (JWTs) issued by the API. For security purposes, it is strongly recommended that this value be a 32-character string.
    • data_persistence_remote_state_config: This object should contain the remote state values that you configured in data-persistence-tf/terraform.tf. These settings allow cumulus-tf to determine the names of the resources created in data-persistence-tf.
    • key_name (optional): The name of your key pair from setting up your key pair
    • rds_security_group: The ID of the security group used to allow access to the PostgreSQL database
    • rds_user_access_secret_arn: The ARN for the Secrets Manager secret that provides database access information
• rds_connection_heartbeat: When using RDS/Aurora Serverless as a database backend, this should be set to true; this tells Core to always use a 'heartbeat' query when establishing a database connection to avoid spin-up timeout failures.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Configure the Thin Egress App

    The Thin Egress App is used for Cumulus distribution. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

Follow the above instructions to initialize Terraform using terraform init[1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution url
  • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login.[2]
    5. You may delete the placeholder url you used to create the application.

If you've lost track of the needed redirect URIs, they can be located in API Gateway. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, go to Dashboard, and use the base URL at the top of the page (shown next to the text "Invoke this API at:"). Make sure to append /token for the archive URL and /login for the thin egress app URL.


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root.[3]

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com (or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint") and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

The Cumulus deployment default sizing for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size you'll want to consider the number of nodes which determine how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

When configuring your ECS cluster, consider the following (an illustrative terraform.tfvars sketch follows this list):

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
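As an illustration only, a terraform.tfvars excerpt setting these variables might look like the following sketch; the values are arbitrary starting points, not sizing recommendations:

# Illustrative ECS cluster sizing values; tune for your own workloads and costs
ecs_cluster_instance_type               = "t3.medium"
ecs_cluster_instance_docker_volume_size = 100
ecs_cluster_min_size                    = 1
ecs_cluster_desired_size                = 1
ecs_cluster_max_size                    = 2
ecs_cluster_scale_in_adjustment_percent  = -5
ecs_cluster_scale_out_adjustment_percent = 10
ecs_cluster_scale_in_threshold_percent   = 25
ecs_cluster_scale_out_threshold_percent  = 75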

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", and use the edit icon for your application. Then go to Manage -> Redirect URIs.

3. The API root can be found in a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at".

PostgreSQL Database Deployment

... cumulus-rds-tf that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

This document also assumes some basic familiarity with PostgreSQL databases and Amazon Aurora/RDS. If you're unfamiliar, consider perusing the AWS docs and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you already are working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.


    Prepare AWS configuration

To deploy this module, you need to make sure that you have completed the equivalent of the following steps from the Cumulus deployment instructions for this module:

    --

    Configure and deploy the module

    When configuring this module, please keep in mind that unlike Cumulus deployment, this module should be deployed once to create the database cluster and only thereafter to make changes to that configuration/upgrade/etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
• db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input (see the sketch after this list).
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.
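A minimal sketch of the random_string approach mentioned above (assumes the hashicorp/random provider is available in your configuration):

# Sketch only: generate a random admin password for the cluster
resource "random_string" "db_admin_password" {
  length  = 32
  special = false
}

# ...then pass it to the rds-cluster-tf module, e.g.:
# db_admin_password = random_string.db_admin_password.result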

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
• permissions_boundary_arn -- the permissions boundary to use when creating the roles that the provisioning lambda will need for access. In most use cases, this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
• prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn), as these identify the AWS Secrets Manager secrets that provide access to the database as the administrative user and, optionally, the user database credentials that Cumulus requires.

    The content of each of these secrets is of the form:

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- the database connection port; this should always be 5432
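
    If you prefer to read these values from Terraform rather than looking the secret up manually, one possible sketch (assuming the rds_cluster module outputs shown above) uses the aws_secretsmanager_secret_version data source:

    data "aws_secretsmanager_secret_version" "rds_user_credentials" {
      secret_id = module.rds_cluster.user_credentials_secret_arn
    }

    locals {
      # Decode the JSON secret string into an object with the fields listed above,
      # e.g. local.rds_user_credentials.host or local.rds_user_credentials.port.
      rds_user_credentials = jsondecode(
        data.aws_secretsmanager_secret_version.rds_user_credentials.secret_string
      )
    }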

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/deployment/share-s3-access-logs/index.html b/docs/v9.0.0/deployment/share-s3-access-logs/index.html index 16299008691..370d7d35dca 100644 --- a/docs/v9.0.0/deployment/share-s3-access-logs/index.html +++ b/docs/v9.0.0/deployment/share-s3-access-logs/index.html @@ -5,14 +5,14 @@ Share S3 Access Logs | Cumulus Documentation - +
    Version: v9.0.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.
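
    For illustration only, the values consumed by the module block below might be supplied as a single object in config.tfvars along these lines (all values are placeholders; the s3-replicator/README.md is authoritative for the exact variable names and shape):

    s3_replicator_config = {
      source_bucket = "<bucket receiving your S3 server access logs>"
      source_prefix = "<prefix used when enabling server access logging>"
      target_bucket = "<replication destination bucket>"
      target_prefix = "<replication destination prefix>"
    }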

    In order to deploy the s3-replicator with Cumulus, you will need to add the module to your Terraform main.tf definition, e.g.:

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The Terraform source package, terraform-aws-cumulus-s3-replicator.zip, can be found on the Cumulus GitHub release page under the Assets tab.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/deployment/terraform-best-practices/index.html b/docs/v9.0.0/deployment/terraform-best-practices/index.html index fc26890e386..e49c67c5877 100644 --- a/docs/v9.0.0/deployment/terraform-best-practices/index.html +++ b/docs/v9.0.0/deployment/terraform-best-practices/index.html @@ -5,7 +5,7 @@ Terraform Best Practices | Cumulus Documentation - + @@ -88,7 +88,7 @@ AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link Restoring a previous version: link

    - + \ No newline at end of file diff --git a/docs/v9.0.0/deployment/thin_egress_app/index.html b/docs/v9.0.0/deployment/thin_egress_app/index.html index 35d03debc72..dc0d87f6f6e 100644 --- a/docs/v9.0.0/deployment/thin_egress_app/index.html +++ b/docs/v9.0.0/deployment/thin_egress_app/index.html @@ -5,7 +5,7 @@ Using the Thin Egress App for Cumulus distribution | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.0.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository's cumulus-tf/main.tf contains a thin_egress_app module for distribution.

    The TEA module provides these instructions for adding it to your deployment; the instructions below cover configuring the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.
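
    Once the secret exists, it is referenced by name when deploying the Cumulus module; a minimal sketch (the secret name is a placeholder):

    module "cumulus" {
      # ... other Cumulus configuration ...

      thin_egress_jwt_secret_name = "<name-of-the-jwt-signing-secret>" # placeholder
    }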

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.
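
    A minimal sketch of that approach (the stage name and the elided module configuration are placeholders):

    locals {
      tea_stage_name = "DEV" # placeholder stage name
    }

    module "thin_egress_app" {
      # ... TEA configuration ...
      stage_name = local.tea_stage_name
    }

    module "cumulus" {
      # ... Cumulus configuration ...
      tea_api_gateway_stage = local.tea_stage_name
    }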

    - + \ No newline at end of file diff --git a/docs/v9.0.0/deployment/upgrade-readme/index.html b/docs/v9.0.0/deployment/upgrade-readme/index.html index 68414ba679f..b023272b549 100644 --- a/docs/v9.0.0/deployment/upgrade-readme/index.html +++ b/docs/v9.0.0/deployment/upgrade-readme/index.html @@ -5,7 +5,7 @@ Upgrading Cumulus | Cumulus Documentation - + @@ -15,7 +15,7 @@ deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/development/forked-pr/index.html b/docs/v9.0.0/development/forked-pr/index.html index 7bae5dec43a..ca2baca1610 100644 --- a/docs/v9.0.0/development/forked-pr/index.html +++ b/docs/v9.0.0/development/forked-pr/index.html @@ -5,13 +5,13 @@ Issuing PR From Forked Repos | Cumulus Documentation - +
    Version: v9.0.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be directly merged to master. Cumulus reviewers must follow the steps below before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author or resolve failures as appropriate.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/development/integration-tests/index.html b/docs/v9.0.0/development/integration-tests/index.html index 1b6be0504e5..3f81572b8d1 100644 --- a/docs/v9.0.0/development/integration-tests/index.html +++ b/docs/v9.0.0/development/integration-tests/index.html @@ -5,7 +5,7 @@ Integration Tests | Cumulus Documentation - + @@ -19,7 +19,7 @@ in the commit message.

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/development/quality-and-coverage/index.html b/docs/v9.0.0/development/quality-and-coverage/index.html index 263c1292bcb..d04baf1d053 100644 --- a/docs/v9.0.0/development/quality-and-coverage/index.html +++ b/docs/v9.0.0/development/quality-and-coverage/index.html @@ -5,7 +5,7 @@ Code Coverage and Quality | Cumulus Documentation - + @@ -23,7 +23,7 @@ here.

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/development/release/index.html b/docs/v9.0.0/development/release/index.html index 72139ea7f61..cc3d3e63ddf 100644 --- a/docs/v9.0.0/development/release/index.html +++ b/docs/v9.0.0/development/release/index.html @@ -5,7 +5,7 @@ Versioning and Releases | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.0.0

    Versioning and Releases

    Versioning

    We use a global versioning approach, meaning version numbers in Cumulus are consistent across all packages and tasks, and semantic versioning to track major, minor, and patch versions (e.g. 1.0.0). We use Lerna to manage our versioning. Any change will force Lerna to increment the version of all packages.

    Read more about semantic versioning here.

    Pre-release testing

    Note: This is only necessary when preparing a release for a new major version of Cumulus (e.g. preparing to go from 6.x.x to 7.0.0)

    Before releasing a new major version of Cumulus, we should test the deployment upgrade path from the latest release of Cumulus to the upcoming release.

    It is preferable to use the cumulus-template-deploy repo for testing the deployment, since that repo is the officially recommended deployment configuration for end users.

    You should create an entirely new deployment for this testing to replicate the end user upgrade path. Using an existing test or CI deployment would not be useful because that deployment may already have been deployed with the latest changes and not match the upgrade path for end users.

    Pre-release testing steps:

    1. Checkout the cumulus-template-deploy repo

    2. Update the deployment code to use the latest release artifacts if it wasn't done already. For example, assuming that the latest release was 5.0.1, update the deployment files as follows:

      # in data-persistence-tf/main.tf
      source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/data-persistence"

      # in cumulus-tf/main.tf
      source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/cumulus"
    3. For both the data-persistence-tf and cumulus-tf modules:

      1. Add the necessary backend configuration (terraform.tf) and variables (terraform.tfvars)
        • You should use an entirely new deployment for this testing, so make sure to use values for key in terraform.tf and prefix in terraform.tfvars that don't collide with existing deployments
      2. Run terraform init
      3. Run terraform apply
    4. Checkout the master branch of the cumulus repo

    5. Run a full bootstrap of the code: npm run bootstrap

    6. Build the pre-release artifacts: ./bamboo/create-release-artifacts.sh

    7. For both the data-persistence-tf and cumulus-tf modules:

      1. Update the deployment to use the built release artifacts:

        # in data-persistence-tf/main.tf
        source = "[path]/cumulus/terraform-aws-cumulus.zip//tf-modules/data-persistence"

        # in cumulus-tf/main.tf
        source = "/Users/mboyd/development/cumulus/terraform-aws-cumulus.zip//tf-modules/cumulus"
      2. Review the CHANGELOG.md for any pre-deployment migration steps. If there are any, go through them and confirm that they are successful

      3. Run terraform init

      4. Run terraform apply

    8. Review the CHANGELOG.md for any post-deployment migration steps and confirm that they are successful

    9. Delete your test deployment by running terraform destroy in cumulus-tf and data-persistence-tf

    Updating Cumulus version and publishing to NPM

    1. Create a branch for the new release

    From Master

    Create a branch titled release-MAJOR.MINOR.x for the release.

    git checkout -b release-MAJOR.MINOR.x

    If creating a new major version release from master, say 5.0.0, then the branch would be named release-5.0.x. If creating a new minor version release from master, say 1.14.0, then the branch would be named release-1.14.x.

    Having a release branch for each major/minor version allows us to easily backport patches to that version.

    Push the release-MAJOR.MINOR.x branch to GitHub if it was created locally. (Commits should be even with master at this point.)

    If creating a patch release, you can check out the existing base branch.

    Then create the release branch (e.g. release-1.14.0) from the minor version base branch. For example, from the release-1.14.x branch:

    git checkout -b release-1.14.0

    Backporting

    When creating a backport, a minor version base branch should already exist on GitHub. Check out the existing minor version base branch then create a release branch from it. For example:

    # check out existing minor version base branch
    git checkout release-1.14.x
    # create new release branch for backport
    git checkout -b release-1.14.1

    2. Update the Cumulus version number

    When changes are ready to be released, the Cumulus version number must be updated.

    Lerna handles the process of deciding which version number should be used as long as the developer specifies whether the change is a major, minor, or patch change.

    To update Cumulus's version number run:

    npm run update

    Screenshot of terminal showing interactive prompt from Lerna for selecting the new release version

    Lerna will handle updating the packages and all of the dependent package version numbers. If a dependency has not been changed with the update, however, lerna will not update the version of the dependency.

    Note: Lerna will struggle to correctly update the versions on any non-standard/alpha versions (e.g. 1.17.0-alpha0). Please be sure to check any packages that are new or have been manually published since the previous release and any packages that list it as a dependency to ensure the listed versions are correct. It's useful to use the search feature of your code editor or grep to see if there are any references to outdated package versions.

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

    If there exists a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

    • There will be a placeholder change-me value that should be replaced with the to-be-released Cumulus Core version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

      Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of &quot;true&quot;

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

    git tag -a v1.x.x -m "Release 1.x.x"
    git push origin v1.x.x

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

    • Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
    • Important: Deselect Enable Branch -- if you do not do this, it will immediately fire off a build.

    • Do this immediately: on the Branch Details page, enable Change trigger and set the Trigger type to manual; this will prevent commits to the branch from triggering the build plan. You should have been redirected to the Branch Details tab after creating the plan; if not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.

    • Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: You should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint, audit and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

    12. Create a new Cumulus release on GitHub

    The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the Github release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
    • An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

    Just make sure to verify the appropriate .zip files are present on Github after the release process is complete.

    13. Merge base branch back to master

    Finally, you need to reproduce the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    IMPORTANT: Do not squash this merge. Doing so will make the "compare" view from step 4 show an incorrect diff, because the tag is linked to a specific commit on the base branch.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

    To delete a published tag to re-tag, follow these steps:

    git tag -d v1.x.x
    git push -d origin v1.x.x
    - + \ No newline at end of file diff --git a/docs/v9.0.0/docs-how-to/index.html b/docs/v9.0.0/docs-how-to/index.html index 4906c66c00e..e25076bbf31 100644 --- a/docs/v9.0.0/docs-how-to/index.html +++ b/docs/v9.0.0/docs-how-to/index.html @@ -5,13 +5,13 @@ Cumulus Documentation: How To's | Cumulus Documentation - +
    Version: v9.0.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

    Note: docs-build will build the documents into website/build.

    Cumulus Documentation

    Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

    We use the open-source static website generator Docusaurus to build html files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

    Note: To have the new page show up in a sidebar, the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in-depth explanation of sidebars here.

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walkthrough can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

    Search

    Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest (indexing and actual searching) we leave to DocSearch. Our builds expect environment variables for both of these values to exist: DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

    The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

    The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/external-contributions/index.html b/docs/v9.0.0/external-contributions/index.html index 17bfa1287d2..1df51f24d99 100644 --- a/docs/v9.0.0/external-contributions/index.html +++ b/docs/v9.0.0/external-contributions/index.html @@ -5,13 +5,13 @@ External Contributions | Cumulus Documentation - +
    Version: v9.0.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

    GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/faqs/index.html b/docs/v9.0.0/faqs/index.html index c36a948182c..6fbfaeb6c78 100644 --- a/docs/v9.0.0/faqs/index.html +++ b/docs/v9.0.0/faqs/index.html @@ -5,13 +5,13 @@ Frequently Asked Questions | Cumulus Documentation - +
    Version: v9.0.0

    Frequently Asked Questions

    Below are some commonly asked questions that you may encounter that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to setup Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who work within Cumulus and AWS to manage deployments and workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

    Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/ancillary_metadata/index.html b/docs/v9.0.0/features/ancillary_metadata/index.html index 50ede62d2f9..ed828bc8cf3 100644 --- a/docs/v9.0.0/features/ancillary_metadata/index.html +++ b/docs/v9.0.0/features/ancillary_metadata/index.html @@ -5,7 +5,7 @@ Ancillary Metadata Export | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.0.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

    Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/backup_and_restore/index.html b/docs/v9.0.0/features/backup_and_restore/index.html index 5a9c4a1a042..cfd0cd2f80b 100644 --- a/docs/v9.0.0/features/backup_and_restore/index.html +++ b/docs/v9.0.0/features/backup_and_restore/index.html @@ -5,7 +5,7 @@ Cumulus Backup and Restore | Cumulus Documentation - + @@ -71,7 +71,7 @@ utilize the new cluster/security groups and redeploy.

    DynamoDB

    Backup and Restore with AWS

    You can enable point-in-time recovery (PITR) as well as create an on-demand backup for your Amazon DynamoDB tables.

    PITR provides continuous backups of your DynamoDB table data. PITR can be enabled through your Terraform deployment, the AWS console, or the AWS API. When enabled, DynamoDB maintains continuous backups of your table up to the last 35 days. You can recover a copy of that table to a previous state at any point in time from the moment you enable PITR, up to a maximum of the 35 preceding days. PITR provides continuous backups until you explicitly disable it.

    On-demand backups allow you to create backups of DynamoDB table data and its settings. You can initiate an on-demand backup at any time with a single click from the AWS Management Console or a single API call. You can restore the backups to a new DynamoDB table in the same AWS Region at any time.

    PITR gives your DynamoDB tables continuous protection from accidental writes and deletes. With PITR, you do not have to worry about creating, maintaining, or scheduling backups. You enable PITR on your table and your backup is available for restore at any point in time from the moment you enable it, up to a maximum of the 35 preceding days. For example, imagine a test script writing accidentally to a production DynamoDB table. You could recover your table to any point in time within the last 35 days.

    On-demand backups help with long-term archival requirements for regulatory compliance. On-demand backups give you full-control of managing the lifecycle of your backups, from creating as many backups as you need to retaining these for as long as you need.

    Enabling PITR during deployment

    By default, the Cumulus data-persistence module enables PITR on the default tables listed in the module's variable defaults for enable_point_in_time_tables. At the time of writing, that list includes:

    • AsyncOperationsTable
    • CollectionsTable
    • ExecutionsTable
    • FilesTable
    • GranulesTable
    • PdrsTable
    • ProvidersTable
    • RulesTable

    If you wish to change this list, simply update your deployment's data_persistence module (here in the template-deploy repository) to pass the correct list of tables.
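
    A sketch of what that override might look like (the table list here is illustrative only; the remaining module configuration is elided):

    module "data_persistence" {
      # ... other data-persistence configuration ...

      enable_point_in_time_tables = [
        "CollectionsTable",
        "GranulesTable",
        "RulesTable",
      ]
    }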

    Restoring with PITR

    Restoring a full deployment

    If your deployment has been deleted, all of your tables with PITR enabled will have had backups created automatically. You can locate these backups in the AWS console on the DynamoDB Backups page or through the CLI by running:

    aws dynamodb list-backups --backup-type SYSTEM

    You can restore your tables to your AWS account using the following command:

    aws dynamodb restore-table-from-backup --target-table-name <prefix>-CollectionsTable --backup-arn <backup-arn>

    Where prefix matches the prefix from your data-persistence deployment. backup-arn can be found in the AWS console or by listing the backups using the command above.

    This will restore your tables to AWS. They will need to be linked to your Terraform deployment. After terraform init and before terraform apply, run the following command for each table:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <prefix>-CollectionsTable

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Terraform will now manage these tables as part of the Terraform state. Run terraform apply to generate the rest of the data-persistence deployment and then follow the instructions to deploy the cumulus deployment as normal.

    At this point the data will be in DynamoDB, but not in Elasticsearch, so nothing will be returned on the Operator dashboard or through Operator API calls. To get the data into Elasticsearch, run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the recovery index the data will be visible on the dashboard and available via the Operator API.

    Restoring an individual table

    A table can be restored to a previous state using PITR. This is easily achievable via the AWS Console by visiting the Backups tab for the table.

    A table can only be recovered to a new table name. Following the restoration of the table, the new table must be imported into Terraform.

    First, remove the old table from the Terraform state:

    terraform state rm module.data_persistence.aws_dynamodb_table.collections_table

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Then import the new table into the Terraform state:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <new-table-name>

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Your data-persistence and cumulus deployments should be redeployed so that your instance of Cumulus uses this new table. After the deployment, your Elasticsearch instance will be out of sync with your new table if there is any change in data. To resync your Elasticsearch with your database run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the new index the DynamoDB tables and Elasticsearch instance will be in sync and the correct data will be reflected on the dashboard.

    Backup and Restore with cumulus-api CLI

    The cumulus-api CLI also includes backup and restore commands. The CLI backup command downloads the content of any of your DynamoDB tables to .json files. You can also use these .json files to restore the records to another DynamoDB table.

    Backup with the CLI

    To back up a table with the CLI, install the @cumulus/api package using npm, making sure to install the same version as your Cumulus deployment:

    npm install -g @cumulus/api@version

    Then run:

    cumulus-api backup --table <table-name>

    The backup will be stored at backups/<table-name>.json

    Restore with the CLI

    To restore data from a JSON file, run the following command:

    cumulus-api restore backups/<table-name>.json --table <table-name>

    The restore can go to the in-use table and will update Elasticsearch. If an existing record exists in the table it will not be duplicated but will be updated with the record from the restore file.

    Data Backup and Restore

    Cumulus provides no core functionality to backup data stored in S3. Data disaster recovery is being developed in a separate effort here.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/data_in_dynamodb/index.html b/docs/v9.0.0/features/data_in_dynamodb/index.html index 2ba831ba03c..f8e6c62a0ce 100644 --- a/docs/v9.0.0/features/data_in_dynamodb/index.html +++ b/docs/v9.0.0/features/data_in_dynamodb/index.html @@ -5,13 +5,13 @@ Cumulus Metadata in DynamoDB | Cumulus Documentation - +
    Version: v9.0.0

    Cumulus Metadata in DynamoDB

    @cumulus/api uses a number of methods to preserve the metadata generated in a Cumulus instance.

    All configuration and system-generated metadata is stored in DynamoDB tables, with the exception of logs. System logs are stored in the AWS CloudWatch service.

    Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability. Amazon DynamoDB runs exclusively on solid-state drives (SSDs). SSDs help AWS achieve the design goals of predictable low-latency response times for storing and accessing data at any scale.

    DynamoDB Auto Scaling

    Cumulus deployed tables from the data-persistence module are set to on-demand mode.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/dead_letter_queues/index.html b/docs/v9.0.0/features/dead_letter_queues/index.html index d72267c923e..23adc01e9d9 100644 --- a/docs/v9.0.0/features/dead_letter_queues/index.html +++ b/docs/v9.0.0/features/dead_letter_queues/index.html @@ -5,13 +5,13 @@ Dead Letter Queues | Cumulus Documentation - +
    Version: v9.0.0

    Dead Letter Queues

    startSF SQS queue

    The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are set up with the following configuration:

    receive_wait_time_seconds  = 20
    message_retention_seconds  = 1209600
    visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are set up with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch based on DynamoDB events)
    • EmsIngestReport (Daily EMS ingest report generation Lambda)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

    To aid in manual troubleshooting, you can use the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID, which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/distribution-metrics/index.html b/docs/v9.0.0/features/distribution-metrics/index.html index e2c58dbbabe..cda1bdb9399 100644 --- a/docs/v9.0.0/features/distribution-metrics/index.html +++ b/docs/v9.0.0/features/distribution-metrics/index.html @@ -5,13 +5,13 @@ Cumulus Distribution Metrics | Cumulus Documentation - +
    Version: v9.0.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

    Work with the ESDIS metrics team to set up permissions and access to forward Cloudwatch Logs to a shared AWS:Logs:Destination as well as transferring your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    Once Cumulus has been configured to deliver Cloudwatch logs to the ESDIS Metrics team, you can create a Kibana index pattern associated with your Cumulus stack. The metrics team has worked out a convention with the Cumulus developers to ensure access to your stack's logs. The important piece is that the Kibana index pattern is created with the exact name of the prefix (stackName) with which Cumulus was deployed.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

    Kibana Index

    Before creating the Kibana index, verify that the Elasticsearch instance has been populated with at least one record.¹ Do this by visiting the Kibana endpoint, selecting Management, then Elasticsearch Index Management, and typing the stack's prefix into the search bar. When you see an index named <prefix>-cloudwatch-YYYY.MM.dd you are ready to continue. If you don't see at least one index for your stack, check to make sure you are delivering your logs to this Elasticsearch instance.

    Step 1: create the index by selecting Management, Kibana Index Patterns. Use an index pattern of <prefix>-* and continue to the Next step.

    Screenshot of Kibana console showing how to configure an index pattern to target logs from a Cumulus deployment

    Step 2: Set the Time Filter field name to @timestamp using the pulldown option. Very importantly, click Show advanced options to create a Custom index pattern ID that is your <prefix>. Then click Create index pattern. This convention allows the dashboard to know which index to use to find the distribution metrics for a particular stack.

    Screenshot of Kibana console showing how to configure settings for an index pattern to target logs from a Cumulus deployment


    1. The Kibana console will not let you create an index if it doesn't match at least one Elasticsearch index.
    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/ems_reporting/index.html b/docs/v9.0.0/features/ems_reporting/index.html index 848d959a285..65c9d77195b 100644 --- a/docs/v9.0.0/features/ems_reporting/index.html +++ b/docs/v9.0.0/features/ems_reporting/index.html @@ -5,14 +5,14 @@ EMS Reporting | Cumulus Documentation - +
    Version: v9.0.0

    EMS Reporting

    Cumulus reports usage statistics to the ESDIS Metrics System (EMS).

    Collection Configuration

    By default, a collection and its related records (Ingest, Distribution etc.) will be reported to EMS if the collection exists in both Cumulus and CMR. A collection can also be excluded from EMS reporting by setting the collection configuration parameter reportToEms to false. If the collection has already been reported to EMS, it can only be removed manually by the EMS team.

    Types of Reports

    Product Metadata

    Cumulus creates a nightly Product Metadata report. The Product Metadata report provides ancillary information about the products (collections) in Cumulus, and this information is required before EMS can process ingest and distribution reports.

    Ingest

    Cumulus creates three ingest related reports for EMS: Ingest, Archive and Archive Delete.

    The Ingest report contains records of all granules that have been ingested into Cumulus.

    The Archive report contains records of all granules that have been archived into Cumulus. It's similar to the Ingest report.

    The Archive Delete report lists granules that were reported to the EMS and now have been deleted from Cumulus.

    A scheduled Lambda task will run nightly that generates Ingest, Archive and Archive Delete reports.

    Distribution

    Cumulus reports all data distribution requests that pass through the distribution API to EMS. In order to track these requests, S3 server access logging must be enabled on all protected buckets.

    You must manually enable logging for each bucket before distribution logging will work, see S3 Server Access Logging.
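
    If you manage buckets with Terraform, one hedged illustration of enabling server access logging (assuming a recent AWS provider, v4 or later, where bucket logging is a standalone resource; bucket names are placeholders, and the S3 Server Access Logging documentation linked above remains authoritative):

    resource "aws_s3_bucket_logging" "protected_bucket_logging" {
      bucket        = "<your-protected-bucket>"   # placeholder
      target_bucket = "<your-access-logs-bucket>" # placeholder
      target_prefix = "<your-protected-bucket>/"  # placeholder
    }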

    A scheduled Lambda task will run nightly that collects distribution events and builds an EMS distribution report.

    Report Submission

    Information about requesting an EMS account can be found on the EMS website. Here are the basic steps to submit reports to EMS.

    1) Get a provider account on the EMS file server and obtain access to their UAT or OPS environment

    Provide IP addresses, the data provider name, and contact information (primary and secondary) to EMS, and EMS will set up an account and firewall rules to allow applications to send files to EMS. For Cumulus instances running on NGAP, the IP address should be the Elastic IP (IPv4 Public IP field) of the NGAP NAT Instance in EC2; that is the IP the EMS firewall sees for any instance in that account.

    2) Request updates on NGAP NACL

    For Cumulus instances running on NGAP, submit an NGAP service desk ticket and specify an "Exception / Enhancement" request for "Network / Whitelist" changes to the account. This will add the EMS host IP to the NACL (Network Access Control List) to allow outbound traffic from the NGAP Application VPCs to the EMS host.

    3) Send the public key to EMS; the Lambda function will use the corresponding private key when sending files to EMS via SFTP

    Upload the corresponding private key to S3, using system_bucket as the bucket name and {prefix}/crypto/ems-private.pem as the key; system_bucket and prefix are configured in your deployment's terraform.tfvars file. If a private key file name other than ems-private.pem is used, specify it via the ems_private_key configuration in terraform.tfvars.
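    For example, the upload might be done with the AWS CLI as sketched below; <system_bucket> and <prefix> are placeholders that should match the system_bucket and prefix values in your terraform.tfvars:

    # Upload the EMS private key to the location described above
    # (<system_bucket> and <prefix> are placeholders)
    aws s3 cp ems-private.pem s3://<system_bucket>/<prefix>/crypto/ems-private.pem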

    4) Create a data manifest file manually and send it to the EMS team, who will configure the data provider on their side. An example data manifest file can be found in Cumulus core's example.

    5) Create a data collection to send to EMS. The report will be automatically generated and submitted to EMS; this step will be removed once CUMULUS-1273 is completed.

    6) Configure the ems* configuration variables passed to the cumulus terraform module. Example configuration of the ems* variables can be found in Cumulus core's example.
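    A minimal terraform.tfvars sketch using only the variables mentioned in this document; the remaining ems* variables are deployment specific, so consult Cumulus core's example for the full set:

    # terraform.tfvars (sketch; values are placeholders)
    ems_submit_report = true
    ems_private_key   = "ems-private.pem"
    # Other ems_* variables (host, provider, credentials, etc.) are
    # deployment specific; see Cumulus core's example configuration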

    If ems_submit_report is not set to true in the configuration, the reports are still generated in s3://{buckets.internal.name}/{prefix}/ems/{filename} for Product Metadata and Ingest reports, and s3://{buckets.internal.name}/{prefix}/ems-distribution/reports/{filename} for Distribution reports, but they won't be submitted to EMS.

    Submitted reports will be saved to the sent folder.

    Report Status

    1. EMS processes the reports and generates error reports, which it sends to the provider's points of contact.
    2. The APEX EMS Reporting system allows users access to ingest, archive, distribution, and error metrics. Users with the 'power user' privilege can also view the Data Provider Status and the status of flat files.

    The operator can submit an IdMAX request in the NASA Access Management System (NAMS) to get access to the GSFC ESDIS Metrics System (EMS).

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/execution_payload_retention/index.html b/docs/v9.0.0/features/execution_payload_retention/index.html index e07822071a8..e7dd3e9b3c5 100644 --- a/docs/v9.0.0/features/execution_payload_retention/index.html +++ b/docs/v9.0.0/features/execution_payload_retention/index.html @@ -5,13 +5,13 @@ Execution Payload Retention | Cumulus Documentation - +
    Version: v9.0.0

    Execution Payload Retention

    In addition to CloudWatch logs and AWS StepFunction API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in DynamoDB and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) has been added that triggers a daily run of the provided cleanExecutions lambda. This lambda removes 'completed' and 'non-completed' payload records in the database that are older than the configured thresholds.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

    daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets when the cleanup Lambda runs, using a CloudWatch cron expression.

    Default value is "cron(0 4 * * ? *)".

    complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

    complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

    non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

    non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

    • complete_execution_payload_disable/non_complete_execution_payload_disable

    These flags (true/false) determine if the cleanup script's logic for 'complete' and 'non-complete' executions will run. Default value is false for both.
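    As a rough sketch, a deployment overriding these options in its instance of the cumulus module might look like the following (other module arguments omitted; values are examples only):

    module "cumulus" {
      # ... existing source and required arguments ...

      # Run the cleanup Lambda at 04:00 UTC daily (the default schedule)
      daily_execution_payload_cleanup_schedule_expression = "cron(0 4 * * ? *)"

      # Remove payloads from 'completed' executions older than 10 days
      complete_execution_payload_timeout_disable = false
      complete_execution_payload_timeout         = 10

      # Remove payloads from non-complete executions older than 30 days
      non_complete_execution_payload_timeout_disable = false
      non_complete_execution_payload_timeout         = 30
    }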

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/lambda_versioning/index.html b/docs/v9.0.0/features/lambda_versioning/index.html index 498631aa727..a11d85968aa 100644 --- a/docs/v9.0.0/features/lambda_versioning/index.html +++ b/docs/v9.0.0/features/lambda_versioning/index.html @@ -5,13 +5,13 @@ Lambda Versioning | Cumulus Documentation - +
    Version: v9.0.0

    Lambda Versioning

    Cumulus makes use of AWS's Lambda/Alias version objects to tag and retain references to recent copies of deployed workflow lambdas.

    All Cumulus deployed lambdas in lambdas.yml will have an alias/version resource created. Lambdas with source coming from S3 must be expressly configured to take advantage of versioning.

    A reference to the most current lambda version alias will replace the unversioned lambda resource ARN in all workflows for each task that is either built via Cumulus, or defined via the uniqueIdentifier configuration key for s3 sourced lambdas.

    A configurable number of previously deployed alias/version pairs will be retained to ensure that in-progress workflows are able to complete.

    This allows for workflows to automatically reference the specific version of a lambda function they were deployed with, prevents an updated deployment of an existing lambda from being utilized in an already in-progress workflow, and retains the executed version information in the AWS step function execution record and CloudWatch logs.

    Please note that care must be taken not to update lambda versions and redeploy so frequently that an in-progress workflow ends up referring to an aged-off version of a lambda; workflows that reference such a lambda may fail.

    Please note: this feature is not currently compatible with use of the layers key in workflow lambdas, as updates/reconfiguration of lambda layers will not result in a new version being created by kes. See CUMULUS-1197 for more information.

    (See AWS Lambda Function Versioning and Aliases for more on lambda versions/aliases.)

    Configuration

    This feature is enabled by default for all Cumulus built/deployed lambdas, as well as s3Source lambdas that are configured as described below. s3Source Lambdas that are not configured will continue to utilize an unqualified reference and will not utilize lambda versioning.

    s3Source Lambda Version Configuration

    Lambdas with s3Source defined currently require additional configuration to make use of this feature in the form of a 'uniqueIdentifier' key:

    SomeLambda:
      Handler: lambda_handler.handler
      timeout: 300
      s3Source:
        bucket: '{{some_bucket}}'
        key: path/some-lambda.zip
        uniqueIdentifier: '5dot2'
      runtime: python2.7

    That key, due to AWS constraints, must be letters ([a-zA-Z]) only.

    Changing Number of Retained Lambdas

    The default number of retained lambda versions is 1.

    This can be overridden by adding the following key to your configuration file:

    maxNumberOfRetainedLambdas: X

    where X is the number of previous versions you wish to retain.

    This feature allows a variable number of retained lambdas; however, due to CloudFormation limits and current implementation constraints, that number is fairly limited.

    The WorkflowLambdaVersions sub-template is constrained to 200 total resources, in addition to only being able to output 60 aliases back to the master template. As such, the limit on the template is:

    200 / (2 + 2 * RV) - 2, where RV = total number of retained versions.

    Given the available limits, the following are the practical limits on the number of lambdas that can be configured for a given number of retained versions:

    • 1: 48

    • 2: 31

    • 3: 23

    Disabling Lambda Versioning

    This feature is enabled by default in the deployment package template, but can be disabled by adding the following key to your app/config.yml:

    useWorkflowLambdaVersions: false

    Disabling this feature will result in Cumulus not creating alias/version lambda resource objects, the WorkflowLambdaVersions stack will not be created and the deployed workflow lambda references will be unqualified (always referring to the latest version).

    Disabling this feature after deploying a stack with it enabled will remove the WorkflowLambdaVersions stack, remove all Cumulus defined lambda Version/Alias pairs and reset all workflows to using an unqualified lambda reference. Workflows in progress with incomplete steps that have references to versioned lambdas will fail.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/logging-esdis-metrics/index.html b/docs/v9.0.0/features/logging-esdis-metrics/index.html index 7fe35256165..c08ef9b654b 100644 --- a/docs/v9.0.0/features/logging-esdis-metrics/index.html +++ b/docs/v9.0.0/features/logging-esdis-metrics/index.html @@ -5,13 +5,13 @@ Writing logs for ESDIS Metrics | Cumulus Documentation - +
    Version: v9.0.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
      "level": "info",
      "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
      "granules": "[\"granule-1\",\"granule-2\"]",
      "message": "hello world",
      "sender": "greetingFunction",
      "stackName": "myCumulus",
      "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

    If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.
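    A minimal usage sketch, assuming the package exports a Logger class that accepts a sender option and exposes level-named methods; consult the @cumulus/logger README for the authoritative API:

    // Sketch only; verify against the @cumulus/logger documentation
    const Logger = require('@cumulus/logger');

    const log = new Logger({ sender: 'greetingFunction' });

    // Emits a JSON log line containing level, sender, timestamp, and message
    log.info('hello world');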

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/replay-kinesis-messages/index.html b/docs/v9.0.0/features/replay-kinesis-messages/index.html index a20a73b7ce8..4c047a94299 100644 --- a/docs/v9.0.0/features/replay-kinesis-messages/index.html +++ b/docs/v9.0.0/features/replay-kinesis-messages/index.html @@ -5,7 +5,7 @@ How to replay Kinesis messages after an outage | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.0.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

    Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and returns an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

    NOTE: The endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and there is no documented level of accuracy for the approximation, so it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

    Field | Type | Required | Description
    type | string | required | Currently only accepts kinesis.
    kinesisStream | string | for type kinesis | Any valid kinesis stream name (not ARN)
    kinesisStreamCreationTimestamp | * | optional | Any input valid for a JS Date constructor. For reasons to use this field, see AWS documentation on StreamCreationTimestamp.
    endTimestamp | * | optional | Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
    startTimestamp | * | optional | Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
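    For example, a request to replay all messages that arrived during an outage window might look like the following (the endpoint, token, stream name, and timestamps are placeholders):

    curl --request POST https://example.com/replays \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "kinesis",
        "kinesisStream": "my-ingest-stream",
        "startTimestamp": "2023-01-01T00:00:00Z",
        "endTimestamp": "2023-01-02T00:00:00Z"
      }'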

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.
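    For example, the status could be retrieved like this (the operation ID and token are placeholders):

    # Retrieve the status of the replay operation
    curl https://example.com/asyncOperations/ReplaceWithAsyncOperationId \
      --header 'Authorization: Bearer ReplaceWithTheToken'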

    - + \ No newline at end of file diff --git a/docs/v9.0.0/features/reports/index.html b/docs/v9.0.0/features/reports/index.html index b7e62a13b86..9eb6605e76d 100644 --- a/docs/v9.0.0/features/reports/index.html +++ b/docs/v9.0.0/features/reports/index.html @@ -5,7 +5,7 @@ Reconciliation Reports | Cumulus Documentation - + @@ -16,7 +16,7 @@ Screenshot of the Dashboard Rconciliation Reports Overview page

    Viewing an inventory report will show a detailed list of collections, granules, and files. Screenshot of an Inventory Report page

    Viewing a granule not found report will show a list of granules missing data. Screenshot of a Granule Not Found Report page

    API

    The API also allows users to create and view reports. For more extensive API documentation, see the Cumulus API docs.

    Creating a Report via API

    Create a new inventory report with the following:

    curl --request POST https://example.com/reconciliationReports --header 'Authorization: Bearer ReplaceWithToken'

    Example response:

    {
      "message": "Report is being generated",
      "status": 202
    }

    Retrieving a Report via API

    Once a report has been generated, you can retrieve the full report.

    curl https://example.com/reconciliationReports/inventoryReport-20190305T153430508 --header 'Authorization: Bearer ReplaceWithTheToken'

    Example response:

    {
      "reportStartTime": "2019-03-05T15:34:30.508Z",
      "reportEndTime": "2019-03-05T15:34:37.243Z",
      "status": "SUCCESS",
      "error": null,
      "filesInCumulus": {
        "okCount": 40,
        "onlyInS3": [
          "s3://cumulus-test-sandbox-protected/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
          "s3://cumulus-test-sandbox-private/BROWSE.MYD13Q1.A2017297.h19v10.006.2017313221201.hdf"
        ],
        "onlyInDynamoDb": [
          {
            "uri": "s3://cumulus-test-sandbox-protected/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
            "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606"
          }
        ]
      },
      "collectionsInCumulusCmr": {
        "okCount": 1,
        "onlyInCumulus": [
          "L2_HR_PIXC___000"
        ],
        "onlyInCmr": [
          "MCD43A1___006",
          "MOD14A1___006"
        ]
      },
      "granulesInCumulusCmr": {
        "okCount": 3,
        "onlyInCumulus": [
          {
            "granuleId": "MOD09GQ.A3518809.ln_rVr.006.7962927138074",
            "collectionId": "MOD09GQ___006"
          },
          {
            "granuleId": "MOD09GQ.A8768252.HC4ddD.006.2077696236118",
            "collectionId": "MOD09GQ___006"
          }
        ],
        "onlyInCmr": [
          {
            "GranuleUR": "MOD09GQ.A0002421.oD4zvB.006.4281362831355",
            "ShortName": "MOD09GQ",
            "Version": "006"
          }
        ]
      },
      "filesInCumulusCmr": {
        "okCount": 11,
        "onlyInCumulus": [
          {
            "fileName": "MOD09GQ.A8722843.GTk5A3.006.4026909316904.jpeg",
            "uri": "s3://cumulus-test-sandbox-public/MOD09GQ___006/MOD/MOD09GQ.A8722843.GTk5A3.006.4026909316904.jpeg",
            "granuleId": "MOD09GQ.A8722843.GTk5A3.006.4026909316904"
          }
        ],
        "onlyInCmr": [
          {
            "URL": "https://cumulus-test-sandbox-public.s3.amazonaws.com/MOD09GQ___006/MOD/MOD09GQ.A8722843.GTk5A3.006.4026909316904_ndvi.jpg",
            "Type": "GET DATA",
            "GranuleUR": "MOD09GQ.A8722843.GTk5A3.006.4026909316904"
          }
        ]
      }
    }
    - + \ No newline at end of file diff --git a/docs/v9.0.0/getting-started/index.html b/docs/v9.0.0/getting-started/index.html index dd8bd732e79..36bb9286405 100644 --- a/docs/v9.0.0/getting-started/index.html +++ b/docs/v9.0.0/getting-started/index.html @@ -5,13 +5,13 @@ Getting Started | Cumulus Documentation - +
    Version: v9.0.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

    Cumulus is an open source set of components for creating cloud-based data ingest, archive, distribution, and management systems, designed for NASA's future Earth Science data streams.

    Who uses Cumulus

    Data integrators/developers and operators across projects, not limited to NASA, use Cumulus in their daily work.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: It's good to know how to manually deploy to a Cumulus sandbox environment.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

    - + \ No newline at end of file diff --git a/docs/v9.0.0/glossary/index.html b/docs/v9.0.0/glossary/index.html index 24e0aea135a..f4aa92b6eaf 100644 --- a/docs/v9.0.0/glossary/index.html +++ b/docs/v9.0.0/glossary/index.html @@ -5,14 +5,14 @@ Glossary | Cumulus Documentation - +
    Version: v9.0.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that give you access to AWS to act as an IAM user programmatically or from the command line. For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that allows logging and metrics collections on various cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues - these SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc to allow for processing of failed messages.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who setup deployment and workflow management for Cumulus. Sometimes referred to as an integrator. See integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    EMS

    ESDIS Metrics System

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/index.html b/docs/v9.0.0/index.html index a69284c8a99..f697775337c 100644 --- a/docs/v9.0.0/index.html +++ b/docs/v9.0.0/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v9.0.0

    Introduction

    This Cumulus project seeks to address the existing need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams via the development and implementation of Cumulus. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/integrator-guide/about-int-guide/index.html b/docs/v9.0.0/integrator-guide/about-int-guide/index.html index f7b866efee8..8bfcebd8d63 100644 --- a/docs/v9.0.0/integrator-guide/about-int-guide/index.html +++ b/docs/v9.0.0/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v9.0.0

    About Integrator Guide

    Purpose

    The Integrator Guide is to help supplement the Cumulus documentation and Data Cookbooks. This content is for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/v9.0.0/integrator-guide/int-common-use-cases/index.html b/docs/v9.0.0/integrator-guide/int-common-use-cases/index.html index ffe46216a17..29f913e2ce0 100644 --- a/docs/v9.0.0/integrator-guide/int-common-use-cases/index.html +++ b/docs/v9.0.0/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.0.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v9.0.0/integrator-guide/workflow-add-new-lambda/index.html index d42a4a5152d..50542d95db3 100644 --- a/docs/v9.0.0/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/v9.0.0/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v9.0.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use go to our Cumulus Tasks page.

    The following steps are to help you along as you write a new Lambda that integrates with a Cumulus workflow. This will aid you with the understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v9.0.0/integrator-guide/workflow-ts-failed-step/index.html index 4a338c2ac52..f920007cc6f 100644 --- a/docs/v9.0.0/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/v9.0.0/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: v9.0.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in Cloudwatch
    • Go to Cloudwatch
    • Locate lambda
    • Search Cloudwatch logs
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution

    - + \ No newline at end of file diff --git a/docs/v9.0.0/interfaces/index.html b/docs/v9.0.0/interfaces/index.html index 2963ea16dc2..f3a29e1fa7c 100644 --- a/docs/v9.0.0/interfaces/index.html +++ b/docs/v9.0.0/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: v9.0.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Note: This diagram is current as of v1.18.0.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/about-operator-docs/index.html b/docs/v9.0.0/operator-docs/about-operator-docs/index.html index 00748f731a2..980401569f3 100644 --- a/docs/v9.0.0/operator-docs/about-operator-docs/index.html +++ b/docs/v9.0.0/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v9.0.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/bulk-operations/index.html b/docs/v9.0.0/operator-docs/bulk-operations/index.html index 49750622551..7f9f18269f1 100644 --- a/docs/v9.0.0/operator-docs/bulk-operations/index.html +++ b/docs/v9.0.0/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v9.0.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible (a sketch of a completed payload appears after this list):

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
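    As a rough sketch, the completed bulk granules request payload might look like the following, with the Kibana query pasted inside the top-level query property; the index name, workflow name, and query contents below are placeholders:

    {
      "index": "my-granules-index",
      "workflowName": "MyBulkWorkflow",
      "query": {
        "query": {
          "match": {
            "collectionId": "MOD09GQ___006"
          }
        }
      }
    }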

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/cmr-operations/index.html b/docs/v9.0.0/operator-docs/cmr-operations/index.html index f3039108591..027116f40d1 100644 --- a/docs/v9.0.0/operator-docs/cmr-operations/index.html +++ b/docs/v9.0.0/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "updateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is available in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/create-rule-in-cumulus/index.html b/docs/v9.0.0/operator-docs/create-rule-in-cumulus/index.html index 9bd1175994f..6b60ce0b9b2 100644 --- a/docs/v9.0.0/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/v9.0.0/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v9.0.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/discovery-filtering/index.html b/docs/v9.0.0/operator-docs/discovery-filtering/index.html index 2936b858501..1ac10c9eefd 100644 --- a/docs/v9.0.0/operator-docs/discovery-filtering/index.html +++ b/docs/v9.0.0/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/granule-workflows/index.html b/docs/v9.0.0/operator-docs/granule-workflows/index.html index a97ef76efb8..0f2973f819f 100644 --- a/docs/v9.0.0/operator-docs/granule-workflows/index.html +++ b/docs/v9.0.0/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v9.0.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or a scheduled rule, will re-discover and reingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use the search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules or scheduled rules will re-discover and reingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/kinesis-stream-for-ingest/index.html b/docs/v9.0.0/operator-docs/kinesis-stream-for-ingest/index.html index 88835a1b445..784983fdcaf 100644 --- a/docs/v9.0.0/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/v9.0.0/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v9.0.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
    • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

    • Send a message that conforms to your schema, using Python or the command line (see the sketch after this list).
      • The streamName and Collection must match the kinesisArn+collection defined in the rule that you have created in Step 2.
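    For example, one way to publish a test message from the command line is with the AWS CLI, as sketched below; the stream name and message file are placeholders, and the --cli-binary-format flag is needed on AWS CLI v2 so the JSON payload is not treated as base64:

    # Publish a CNM message to the Kinesis stream
    aws kinesis put-record \
      --stream-name <streamName> \
      --partition-key 1 \
      --cli-binary-format raw-in-base64-out \
      --data file://cnm-message.json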
    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/locating-access-logs/index.html b/docs/v9.0.0/operator-docs/locating-access-logs/index.html index 8f1e4bde105..8c4ca9bce36 100644 --- a/docs/v9.0.0/operator-docs/locating-access-logs/index.html +++ b/docs/v9.0.0/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v9.0.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> with a key prefix of <stack>/ems-distribution/s3-server-access-logs/.
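    For example, the raw logs could be listed with the AWS CLI (the bucket and stack names are placeholders):

    aws s3 ls s3://<internal-bucket-name>/<stack>/ems-distribution/s3-server-access-logs/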

    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/naming-executions/index.html b/docs/v9.0.0/operator-docs/naming-executions/index.html index 3a85e71f655..e61405a9d21 100644 --- a/docs/v9.0.0/operator-docs/naming-executions/index.html +++ b/docs/v9.0.0/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to this:

    {
      "executionNamePrefix": "my-prefix"
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
      "QueueGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "FullMessage": true
            },
            "task_config": {
              "queueUrl": "${start_sf_queue_url}",
              "provider": "{$.meta.provider}",
              "internalBucket": "{$.meta.buckets.internal.name}",
              "stackName": "{$.meta.stack}",
              "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
              "executionNamePrefix": "{$.meta.executionNamePrefix}"
            }
          }
        },
        "Type": "Task",
        "Resource": "${queue_granules_task_arn}",
        "Retry": [
          {
            "ErrorEquals": [
              "Lambda.ServiceException",
              "Lambda.AWSLambdaException",
              "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 6,
            "BackoffRate": 2
          }
        ],
        "Catch": [
          {
            "ErrorEquals": [
              "States.ALL"
            ],
            "ResultPath": "$.exception",
            "Next": "WorkflowFailed"
          }
        ],
        "End": true
      }
    }
    - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/ops-common-use-cases/index.html b/docs/v9.0.0/operator-docs/ops-common-use-cases/index.html index 485267aae10..969fdb08012 100644 --- a/docs/v9.0.0/operator-docs/ops-common-use-cases/index.html +++ b/docs/v9.0.0/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.0.0/operator-docs/trigger-workflow/index.html b/docs/v9.0.0/operator-docs/trigger-workflow/index.html index 8a637e62f1f..2cd248aa9e7 100644 --- a/docs/v9.0.0/operator-docs/trigger-workflow/index.html +++ b/docs/v9.0.0/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v9.0.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

In the rule's additional metadata, you can specify a provider_path that tells the workflow where on the provider to discover the data.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
    private = {
    name = "discovery-bucket"
    type = "private"
    },
    protected = {
    name = "archive-protected"
    type = "protected"
    }
    public = {
    name = "archive-public"
    type = "public"
    }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types refer to the protected and public buckets mapped in the Cumulus deployment configuration shown above.

    {
    "name": "test-collection",
    "version": "001",
    "granuleId": "^GRANULE\\.A[\\d]{7}$",
    "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
    "reportToEms": false,
    "sampleFileName": "GRANULE.A2017025.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
    "sampleFileName": "GRANULE.A2017025.hdf"
    },
    {
    "bucket": "public",
    "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
    "sampleFileName": "GRANULE.A2017025.jpg"
    }
    ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule
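
In terms of the rule record, this corresponds to additional (meta) JSON along these lines; only the meta portion is shown, and the rest of the rule references the provider and collection created above:

"meta": {
  "provider_path": "test-data"
}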

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/tasks/index.html b/docs/v9.0.0/tasks/index.html index c2f5286cf4b..14dc7b4fe97 100644 --- a/docs/v9.0.0/tasks/index.html +++ b/docs/v9.0.0/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v9.0.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


@cumulus/update-granules-cmr-metadata-file-links

Update CMR metadata files with correct online access URLs and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/v9.0.0/team/index.html b/docs/v9.0.0/team/index.html index 6c75ce439f2..07c6675d647 100644 --- a/docs/v9.0.0/team/index.html +++ b/docs/v9.0.0/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.0.0/troubleshooting/index.html b/docs/v9.0.0/troubleshooting/index.html index 774e406ba64..26ea15824ae 100644 --- a/docs/v9.0.0/troubleshooting/index.html +++ b/docs/v9.0.0/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v9.0.0

    How to Troubleshoot and Fix Issues

While Cumulus is a complex system, it is designed to maintain the integrity and availability of the system and its data. Should you encounter errors or issues while using Cumulus, this section will help you troubleshoot and resolve them.

    Backup and Restore

Cumulus has built-in backup and restore functionality to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and does not cover full S3 archive data. Backup and restore is not enabled by default; it must be enabled and configured before you can take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

Workflows are state machines composed of tasks and services, and each component logs to CloudWatch. The CloudWatch logs for all steps in an execution are displayed in the Cumulus dashboard, or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called; if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before executions are scheduled.

More information on Kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.
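
For example, assuming the standard Lambda log group naming of /aws/lambda/<function-name>, the API log can be tailed with the AWS CLI (v2):

aws logs tail /aws/lambda/<prefix>-ApiEndpoints --follow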

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt the Lambda's environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend the non-scaling approach, as the number of lambdas we distribute is in the low teens and they are likely to be easier and faster to reconfigure one-by-one than by editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.
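
For Node.js Lambdas, that generally means the Cumulus Message Adapter JS client is listed as a dependency of the packaged artifact and bundled into the zip's node_modules; the version below is illustrative only:

"dependencies": {
  "@cumulus/cumulus-message-adapter-js": "^2.0.0"
}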

    - + \ No newline at end of file diff --git a/docs/v9.0.0/troubleshooting/reindex-elasticsearch/index.html b/docs/v9.0.0/troubleshooting/reindex-elasticsearch/index.html index d603a3e3d35..f6651576e1a 100644 --- a/docs/v9.0.0/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/v9.0.0/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then Reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is you do not have to turn off any ingest operations. This approach is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here.

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
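
As a rough sketch only (the endpoint path and authentication shown here are assumptions; see the Cumulus API documentation for the authoritative reference), the status can be checked with a call along these lines:

curl -X GET https://<cumulus-api-url>/elasticsearch/reindex-status \
  -H "Authorization: Bearer <access-token>"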

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
    "aggs": {
    "count_by_type": {
    "terms": {
    "field": "_type"
    }
    }
    },
    "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
    "conflicts": "proceed",
    "source": {
    "index": "cumulus-2020-11-3"
    },
    "dest": {
    "index": "cumulus-2021-3-4",
    "op_type": "create"
    }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/troubleshooting/rerunning-workflow-executions/index.html b/docs/v9.0.0/troubleshooting/rerunning-workflow-executions/index.html index 720f8f4ad9a..e8fc8b777fb 100644 --- a/docs/v9.0.0/troubleshooting/rerunning-workflow-executions/index.html +++ b/docs/v9.0.0/troubleshooting/rerunning-workflow-executions/index.html @@ -5,13 +5,13 @@ Re-running workflow executions | Cumulus Documentation - +
    Version: v9.0.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

Screenshot of the AWS console for a Step Function execution highlighting the "New execution" button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button
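
As a sketch of step 3, only cumulus_meta.execution_name changes; every other key from the default input is kept as-is. With a placeholder for the new execution ID, the edited portion would look like:

{
  "cumulus_meta": {
    "execution_name": "<new-execution-id-from-the-modal>"
  }
}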

    - + \ No newline at end of file diff --git a/docs/v9.0.0/troubleshooting/troubleshooting-deployment/index.html b/docs/v9.0.0/troubleshooting/troubleshooting-deployment/index.html index 80b38dad1f0..f5919d180cd 100644 --- a/docs/v9.0.0/troubleshooting/troubleshooting-deployment/index.html +++ b/docs/v9.0.0/troubleshooting/troubleshooting-deployment/index.html @@ -5,7 +5,7 @@ Troubleshooting Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
    domain_name = "es"
    instance_count = 2
    instance_type = "t2.small.elasticsearch"
    version = "5.3"
    volume_size = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

• Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'. This probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

• If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID and EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URLs to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    - + \ No newline at end of file diff --git a/docs/v9.0.0/upgrade-notes/migrate_tea_standalone/index.html b/docs/v9.0.0/upgrade-notes/migrate_tea_standalone/index.html index ea6c0e7e9d2..a42a658a1f7 100644 --- a/docs/v9.0.0/upgrade-notes/migrate_tea_standalone/index.html +++ b/docs/v9.0.0/upgrade-notes/migrate_tea_standalone/index.html @@ -5,13 +5,13 @@ Migrate TEA deployment to standalone module | Cumulus Documentation - +
    Version: v9.0.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait for a new release of Cumulus that incorporated that TEA release.

In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module into your deployment is provided, or you can refer to the Cumulus core example deployment code for the thin_egress_app module.

By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to TEA deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any CloudFront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate
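
If you need to confirm which state file versions exist in the bucket (for example, before restoring an earlier one), the bucket's versioning history can be inspected with a command like:

 aws s3api list-object-versions --bucket BUCKET --prefix KEY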

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

     aws dynamodb put-item \
    --table-name DYNAMO_LOCK_TABLE \
    --item '{
    "LockID": {"S": "BUCKET/KEY-md5"},
    "Digest": {"S": "some-digest-value"}
    }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/upgrade-notes/upgrade-rds/index.html b/docs/v9.0.0/upgrade-notes/upgrade-rds/index.html index 66207924722..cb3a9fcb35f 100644 --- a/docs/v9.0.0/upgrade-notes/upgrade-rds/index.html +++ b/docs/v9.0.0/upgrade-notes/upgrade-rds/index.html @@ -5,7 +5,7 @@ Upgrade to RDS release | Cumulus Documentation - + @@ -21,7 +21,7 @@ | cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 | | dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 | | dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    - + \ No newline at end of file diff --git a/docs/v9.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html b/docs/v9.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html index 28185bea590..fefd164a600 100644 --- a/docs/v9.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html +++ b/docs/v9.0.0/upgrade-notes/upgrade_tf_version_0.13.6/index.html @@ -5,13 +5,13 @@ Upgrade to TF version 0.13.6 | Cumulus Documentation - +
    Version: v9.0.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for only supporting one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required.

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
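
As a rough sketch of this step (tfenv usage is optional, and the directory names follow the data-persistence-tf/cumulus-tf layout referenced below; adjust them to your own deployment layout):

# Install and select the supported Terraform version
tfenv install 0.13.6
tfenv use 0.13.6
terraform --version

# Upgrade the deployment code in each module directory
cd data-persistence-tf && terraform 0.13upgrade && cd ..
cd cumulus-tf && terraform 0.13upgrade && cd ..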

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflow_tasks/discover_granules/index.html b/docs/v9.0.0/workflow_tasks/discover_granules/index.html index cce21149d6c..c6f859b0cfb 100644 --- a/docs/v9.0.0/workflow_tasks/discover_granules/index.html +++ b/docs/v9.0.0/workflow_tasks/discover_granules/index.html @@ -5,7 +5,7 @@ Discover Granules | Cumulus Documentation - + @@ -21,7 +21,7 @@ included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflow_tasks/files_to_granules/index.html b/docs/v9.0.0/workflow_tasks/files_to_granules/index.html index b1cc816d352..65338fcb533 100644 --- a/docs/v9.0.0/workflow_tasks/files_to_granules/index.html +++ b/docs/v9.0.0/workflow_tasks/files_to_granules/index.html @@ -5,13 +5,13 @@ Files To Granules | Cumulus Documentation - +
    Version: v9.0.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming config.inputGranules and the task input list of s3 URIs along with the rest of the configuration objects to take the list of incoming files and sort them into a list of granule objects.

Please note: Files passed in without metadata previously defined for config.inputGranules will be added with the following keys:

    • name
    • bucket
    • filename
    • fileStagingDir

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.
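
For example, a single staged file might be folded into a granule object along these lines (the bucket name and staging path are illustrative):

{
  "granuleId": "GRANULE.A2017025",
  "files": [
    {
      "name": "GRANULE.A2017025.hdf",
      "bucket": "my-internal-bucket",
      "filename": "s3://my-internal-bucket/file-staging/my-stack/GRANULE.A2017025.hdf",
      "fileStagingDir": "file-staging/my-stack"
    }
  ]
}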

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflow_tasks/move_granules/index.html b/docs/v9.0.0/workflow_tasks/move_granules/index.html index eb03fc1babb..bd05c760013 100644 --- a/docs/v9.0.0/workflow_tasks/move_granules/index.html +++ b/docs/v9.0.0/workflow_tasks/move_granules/index.html @@ -5,13 +5,13 @@ Move Granules | Cumulus Documentation - +
    Version: v9.0.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

    • Task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflow_tasks/parse_pdr/index.html b/docs/v9.0.0/workflow_tasks/parse_pdr/index.html index 0bde9f899bc..f81e6ac924c 100644 --- a/docs/v9.0.0/workflow_tasks/parse_pdr/index.html +++ b/docs/v9.0.0/workflow_tasks/parse_pdr/index.html @@ -5,13 +5,13 @@ Parse PDR | Cumulus Documentation - +
    Version: v9.0.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine the file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflows/cumulus-task-message-flow/index.html b/docs/v9.0.0/workflows/cumulus-task-message-flow/index.html index 98a6d9c2f99..4a12605c46b 100644 --- a/docs/v9.0.0/workflows/cumulus-task-message-flow/index.html +++ b/docs/v9.0.0/workflows/cumulus-task-message-flow/index.html @@ -5,14 +5,14 @@ Cumulus Tasks: Message Flow | Cumulus Documentation - +
    Version: v9.0.0

    Cumulus Tasks: Message Flow

Cumulus Tasks make up Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks accept a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter, which supplies additional information supporting message templating and metadata management for these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines what current task is being executed. Note this is different from what lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
    "input": {},
    "config": {}
    }
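
Using the simple example message above, and assuming the default behavior of passing the message payload through as the task input, the resolved event would look roughly like:

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
      "foo": "bar"
    }
  }
}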

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
    "task_config": {
    "bar": "baz",
    "cumulus_message": {
    "input": "{$.payload.input}",
    "outputs": [
    {
    "source": "{$.input.anykey}",
    "destination": "{$.payload.out}"
    }
    ]
    }
    },
    "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "input": {
    "anykey": "anyvalue"
    }
    }
    }

    Create Next Message Step 2: Store remote event

If the ReplaceConfig parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflows/developing-a-cumulus-workflow/index.html b/docs/v9.0.0/workflows/developing-a-cumulus-workflow/index.html index 94d7a1f5d07..9c620f74d97 100644 --- a/docs/v9.0.0/workflows/developing-a-cumulus-workflow/index.html +++ b/docs/v9.0.0/workflows/developing-a-cumulus-workflow/index.html @@ -5,13 +5,13 @@ Creating a Cumulus Workflow | Cumulus Documentation - +
    Version: v9.0.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
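
Purely as an illustration, a minimal, non-Cumulus-specific definition with a single Pass state could be dropped into state_machine_definition while you wire up real tasks:

{
  "Comment": "Minimal placeholder state machine",
  "StartAt": "HelloWorldPass",
  "States": {
    "HelloWorldPass": {
      "Type": "Pass",
      "Result": { "hello": "world" },
      "End": true
    }
  }
}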

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflows/developing-workflow-tasks/index.html b/docs/v9.0.0/workflows/developing-workflow-tasks/index.html index 9d4471c2ac5..8c81051bec0 100644 --- a/docs/v9.0.0/workflows/developing-workflow-tasks/index.html +++ b/docs/v9.0.0/workflows/developing-workflow-tasks/index.html @@ -5,13 +5,13 @@ Developing Workflow Tasks | Cumulus Documentation - +
    Version: v9.0.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a Docker image. Cumulus provides a Docker image (source) for Node 12.x+ Lambdas on Docker Hub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflows/docker/index.html b/docs/v9.0.0/workflows/docker/index.html index 706af182a82..b2d26f81b6c 100644 --- a/docs/v9.0.0/workflows/docker/index.html +++ b/docs/v9.0.0/workflows/docker/index.html @@ -5,7 +5,7 @@ Dockerizing Data Processing | Cumulus Documentation - + @@ -14,7 +14,7 @@ 2) validate the output (in this case just check for existence) 3) use 'ncatted' to update the resulting file to be CF-compliant 4) write out metadata generated for this file

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3, and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

Contained in the input directory are all possible sets of data files, while the output directory contains the expected results of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise, an error will be reported.

    - + \ No newline at end of file diff --git a/docs/v9.0.0/workflows/index.html b/docs/v9.0.0/workflows/index.html index 4342768a31e..2883122032d 100644 --- a/docs/v9.0.0/workflows/index.html +++ b/docs/v9.0.0/workflows/index.html @@ -5,13 +5,13 @@ Workflows | Cumulus Documentation - +
    Version: v9.0.0

    Workflows

Workflows are composed of one or more AWS Lambda Functions and ECS Activities that discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
• Not make assumptions about specifics, such as what goes into a granule.

    Scalable

The ingest architecture needs to scale both to handle ingesting hundreds of millions of granules and to interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and the details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.
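To make this concrete, here is a rough, hypothetical sketch of what a workflow definition looks like in the Amazon States Language used by Step Functions; the state names and Lambda ARNs below are placeholders, not taken from an actual Cumulus deployment:

{
  "Comment": "Hypothetical two-step ingest workflow",
  "StartAt": "SyncGranule",
  "States": {
    "SyncGranule": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:prefix-SyncGranule",
      "Next": "MoveGranules"
    },
    "MoveGranules": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:prefix-MoveGranules",
      "End": true
    }
  }
}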

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but in short they provide several benefits that are directly applicable to this architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
  • Allows composing AWS Lambdas and code running elsewhere. Code can be run in EC2 to interface with it, or even on premises if desired.
  • Step Functions allow specifying when steps run in parallel, or choices between steps based on data from the previous step.
• Flexibility
  • Step Functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
• Reliability and Error Handling
  • Step Functions allow configuring retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
  • Both meta and task_config, which are used for configuration at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used to extend the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these. However, we save a significant amount of development effort and gain all the features of Step Functions by accepting this trade-off. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that are implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation has a tight coupling with their standard. If they make backwards-incompatible changes in the future, we will have to deal with existing projects written against that standard.

We avoid having to develop our own standard and the code to process it. The design can support new features in AWS Step Functions without needing to change the Ingest library code. It is unlikely they will make a backwards-incompatible change at this point. One mitigation, if that were to happen, is to write data transformations from the old format to the new one.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions that address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

The design aims to satisfy the need for ingest to integrate different workflows for providers. It is flexible in its ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years, and all of these can easily be integrated into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a fixed number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
• CloudWatch Metrics can be used to monitor many of the events within Step Functions, and can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v9.0.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.
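Putting the points above together, the layer integration for a user-defined Lambda might look roughly like the following Terraform sketch. The function name, zip path, and variable wiring are illustrative assumptions; only the layers entry and the CUMULUS_MESSAGE_ADAPTER_DIR setting come from the guidance above:

resource "aws_lambda_function" "my_cma_task" {
  function_name = "${var.prefix}-MyCmaTask"
  filename      = "/path/to/zip/lambda.zip"
  handler       = "index.handler"
  role          = module.cumulus.lambda_processing_role_arn
  runtime       = "nodejs10.x"

  # Reference the deployed CMA layer version (placeholder variable)
  layers = [var.cumulus_message_adapter_lambda_layer_version_arn]

  environment {
    variables = {
      # Lambda layers are extracted to /opt at runtime
      CUMULUS_MESSAGE_ADAPTER_DIR = "/opt"
    }
  }
}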

    Manual Addition

You can include the CMA package in the Lambda code, in the cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 does not support this directly.

Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a Python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.
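For example, if the CMA was bundled in the cumulus-message-adapter sub-directory of the zip as described above, the corresponding Terraform environment block might look like this sketch (the path reflects where Lambda unpacks function code and is an assumption):

  environment {
    variables = {
      # Points the CMA client at the bundled copy inside the deployment package
      CUMULUS_MESSAGE_ADAPTER_DIR = "/var/task/cumulus-message-adapter"
    }
  }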

    CMA Input/Output

Input to the task application code is a JSON object with the following keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

Output from the task application code is placed in the payload key by default, but the task's configuration can also be used to return just a portion of the task output.
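In other words, the application code can expect to receive an object shaped roughly like the following (the values here are illustrative):

{
  "input": {
    "anykey": "anyvalue"
  },
  "config": {
    "bucket": "sample-internal-bucket"
  }
}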

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

Because of the potential size of a Cumulus message (mainly the payload field), a task can be configured to store a portion of its output on S3, with a message key (the Remote Message) that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (which defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize):

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 1,
          "Path": "$.payload",
          "TargetPath": "$.payload"
        }
      }
    }
  }
}

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

Note that you can optionally use a different TargetPath than Path; however, since the target is a JSON path, there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node; otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.

    Full Message

    Setting the following parameters for a lambda:

DiscoverGranules:
  Parameters:
    cma:
      event.$: '$'
      ReplaceConfig:
        FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": {
          "MaxSize": 0,
          "Path": "$",
          "TargetPath": "$"
        }
      }
    }
  }
}

    Cumulus Message example

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    task_config

This configuration key contains the configuration values used to define a task's inputs and outputs via URL paths. Important: these values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

{
  "FunctionName": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "other_cma_configuration": "<config object>",
        "task_config": "<task config>"
      }
    }
  }
}

    Example:

{
  "StepFunction": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "sfnEnd": true,
          "stack": "{$.meta.stack}",
          "bucket": "{$.meta.buckets.internal.name}",
          "stateMachine": "{$.cumulus_meta.state_machine}",
          "executionName": "{$.cumulus_meta.execution_name}",
          "cumulus_message": {
            "input": "{$}"
          }
        }
      }
    }
  }
}

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

{
  "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
  "input": {
    "Other Parameter": {},
    "cma": {
      "ConfigKey": {
        "config values": "some config values"
      },
      "event": {
        "cumulus_meta": {},
        "payload": {},
        "meta": {},
        "exception": {}
      }
    }
  }
}

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.

    3. Resolve URL templates in the task configuration

In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use a URL template as a value to achieve simplicity or to reference values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task which has the following configuration:

{
  "Parameters": {
    "cma": {
      "event.$": "$",
      "task_config": {
        "provider": "{$.meta.provider}",
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      }
    }
  }
}

and an incoming message that contains:

{
  "meta": {
    "foo": "bar",
    "provider": {
      "id": "FOO_DAAC",
      "anykey": "anyvalue"
    }
  }
}

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given that a task specifies cma.task_config.cumulus_message.input:

ExampleTask:
  Parameters:
    cma:
      event.$: '$'
      task_config:
        cumulus_message:
          input: '{$.payload.foo}'

    The task configuration in the message would be:

{
  "task_config": {
    "cumulus_message": {
      "input": "{$.payload.foo}"
    }
  },
  "payload": {
    "foo": {
      "anykey": "anyvalue"
    }
  }
}

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

{
  "input" : {
    "anykey": "anyvalue"
  },
  "config": {...}
}

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also add values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value at the "destination" URL in the outgoing Cumulus message. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

{
  "ExampleTask": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "cumulus_message": {
            "outputs": [
              {
                "source": "{$}",
                "destination": "{$.payload}"
              },
              {
                "source": "{$.output.anykey}",
                "destination": "{$.meta.baz}"
              }
            ]
          }
        }
      }
    }
  }
}

    The corresponding Cumulus Message would be:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    Given the response from the task is:

{
  "output": {
    "anykey": "boo"
  }
}

    The Cumulus Message Adapter would output the following Cumulus Message:

{
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please Note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output message (after the output configuration is applied) from a Cumulus task looks like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "ReplaceConfig": {
    "FullMessage": true
  },
  "task_config": {
    "cumulus_message": {
      "outputs": [
        {
          "source": "{$}",
          "destination": "{$.payload}"
        },
        {
          "source": "{$.output.anykey}",
          "destination": "{$.meta.baz}"
        }
      ]
    }
  },
  "meta": {
    "foo": "bar",
    "baz": "boo"
  },
  "payload": {
    "output": {
      "anykey": "boo"
    }
  }
}

    the resultant output would look like:

{
  "cumulus_meta": {
    "some_key": "some_value"
  },
  "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
  }
}

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
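As an illustration only (the field names below are hypothetical and not from a real task), a minimal schemas/input.json might look like:

{
  "title": "ExampleTaskInput",
  "description": "Input schema for a hypothetical task that operates on granules",
  "type": "object",
  "required": ["granules"],
  "properties": {
    "granules": {
      "type": "array",
      "items": { "type": "object" }
    }
  }
}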

    Version: v9.0.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting-started guide for building Lambdas in its developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the source), either as a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided the cumulus module if intending to integrate the lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.
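Based on that description, the deploy folder might look something like the sketch below (the jar names are placeholders):

deploy
└── test_task
    └── task
        ├── Task.class
        ├── TaskLogic.class
        └── lib
            ├── dependency-one.jar
            └── dependency-two.jar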

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.
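For example, extending the Node.js function above, the layer reference might look like the following sketch (the layer ARN is a placeholder for whatever CMA layer version you have deployed):

resource "aws_lambda_function" "myfunction" {
  # ... configuration as shown in the Node.js example above ...

  # Attach the deployed Cumulus Message Adapter layer (placeholder ARN)
  layers = ["arn:aws:lambda:us-east-1:111122223333:layer:Cumulus_Message_Adapter:1"]
}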

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    Version: v9.0.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    Version: v9.0.0

    Workflow Configuration How To's

    How to specify a bucket for granules

    Bucket configuration

Buckets configured in your deployment for the cumulus module's inputs will ultimately become part of the workflow configuration. The type property of a bucket depends on how that bucket will be used:

    • public indicates a completely public bucket.
    • internal type is for system use.
• protected buckets are for any information that should be behind Earthdata Login authentication.
    • private buckets are for private data.

    Consider the following buckets configuration variable for the cumulus module for all following examples:

buckets = {
  internal = {
    name = "sample-internal-bucket",
    type = "internal"
  },
  private = {
    name = "sample-private-bucket",
    type = "private"
  },
  protected = {
    name = "sample-protected-bucket",
    type = "protected"
  },
  public = {
    name = "sample-public-bucket",
    type = "public"
  },
  protected-2 = {
    name = "sample-protected-bucket-2",
    type = "protected"
  }
}

    Point to buckets in the workflow configuration

    Buckets specified in the buckets input variable to the cumulus module will be available in the meta object of the Cumulus message.

    To use the buckets specified in the configuration, you can do the following:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "provider_path": "{$.meta.provider_path}",
          "collection": "{$.meta.collection}",
          "buckets": "{$.meta.buckets}"
        }
      }
    }
  }
}

    Or, to map a specific bucket to a config value for a task:

{
  "MoveGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "bucket": "{$.meta.buckets.internal.name}",
          "buckets": "{$.meta.buckets}"
        }
      }
    }
  }
}

    Hardcode a bucket

    Bucket names can be hardcoded in your workflow configuration, for example:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "provider_path": "{$.meta.provider_path}",
          "collection": "{$.meta.collection}",
          "buckets": {
            "internal": "sample-internal-bucket",
            "protected": "sample-protected-bucket-2"
          }
        }
      }
    }
  }
}

Or you can use a combination of meta bucket references and hardcoded values:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "provider_path": "{$.meta.provider_path}",
          "collection": "{$.meta.collection}",
          "buckets": {
            "internal": "sample-internal-bucket",
            "private": "{$.meta.buckets.private.name}"
          }
        }
      }
    }
  }
}

    Using meta and hardcoding

    Bucket names can be configured using a mixture of hardcoded values and values from the meta. For example, to configure the bucket based on the collection name you could do something like:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "provider_path": "{$.meta.provider_path}",
          "collection": "{$.meta.collection}",
          "buckets": {
            "internal": "{$.meta.collection.name}-bucket"
          }
        }
      }
    }
  }
}

    How to specify a file location in a bucket

Granule files can be placed in folders and subfolders in buckets for better organization. This is done by setting a url_path at the base level of a collection configuration, which applies to all files. To affect the placement of only a single file, the url_path variable can be set on that specific file entry in the collection configuration. There are a number of different ways to populate url_path.

    Hardcoding file placement

    A file path can be added as the url_path in the collection configuration to specify the final location of the files. For example, take the following collection configuration

{
  "name": "MOD09GQ",
  "version": "006",
  "url_path": "example-path",
  "files": [
    {
      "bucket": "protected",
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "file-example-path"
    },
    {
      "bucket": "private",
      "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    }
  ]
}

The first file, MOD09GQ.A2017025.h21v00.006.2017034065104.hdf, has its own url_path, so the resulting file path might look like s3://sample-protected-bucket/file-example-path/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf. The second file, MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met, does not have its own url_path, so it will use the collection url_path and have a final file path of s3://sample-private-bucket/example-path/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met.

    Using a template for file placement

    Instead of hardcoding the placement, the url_path can be a template to be populated with metadata during the move-granules step. For example:

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}"

This url_path will be populated with the collection short name, "MOD09GQ". To take a subset of any given metadata, use the substring option.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.name, 0, 3)}"

This example will populate to "MOD09GQ/MOD".

Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. The cmrMetadata (the CMR granule XML) needs to have been generated and stored on S3; from there, any field can be retrieved and used in a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

<Granule>
  <Temporal>
    <RangeDateTime>
      <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
      <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
  </Temporal>
</Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

{
  "bucket": "sample-protected-bucket",
  "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
}

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    Version: v9.0.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.
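For reference, a rule is a small JSON record; a scheduled rule might look roughly like the sketch below (the names, schedule, and workflow are illustrative, and the authoritative field list is the rule schema referenced above):

{
  "name": "mod09gq_daily_ingest",
  "workflow": "IngestGranule",
  "provider": "sample-provider",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "rate(1 day)"
  },
  "state": "ENABLED"
}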

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

    When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event which will manage the trigger of the lambdas that trigger the workflows.

For a scheduled rule, the CloudWatch event is triggered on the given schedule and calls the schedule SF lambda directly.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

    Version: v9.9.0

    Contributing a Task

    We're tracking reusable Cumulus tasks in this list and, if you've got one you'd like to share with others, you can add it!

    Right now we're focused on tasks distributed via npm, but are open to including others. For now the script that pulls all the data for each package only supports npm.

    The tasks.md file is generated in the build process

    The tasks list in docs/tasks.md is generated from the list of task package names from the tasks folder.

    Do not edit the docs/tasks.md file directly.

    Version: v9.9.0

    Architecture

    Architecture

    Below, find a diagram with the components that comprise an instance of Cumulus.

    Architecture diagram of a Cumulus deployment

    This diagram details all of the major architectural components of a Cumulus deployment.

While the diagram can feel complex, it can be broken down into several major components:

    Data Distribution

End users can access data via Cumulus's distribution submodule, which includes ASF's Thin Egress App; this provides authenticated data egress, temporary S3 links, and other statistics features.

    End user exposure of Cumulus's holdings is expected to be provided by an external service.

    For NASA use, this is assumed to be CMR in this diagram.

    Data ingest

    Workflows

The core of the ingest and processing capabilities in Cumulus is built into the deployed AWS Step Function workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, utilizing built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.

    Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics information to a shared SNS topic, which could be subscribed to for near real time granule, execution, and PDR status. This could be used for metrics reporting using an external ELK stack, for example.

    Data persistence

    Cumulus entity state data is stored in a set of DynamoDB database tables, and is exported to an ElasticSearch instance for non-authoritative querying/state data for the API and other applications that require more complex queries.

    Data discovery

    Discovering data for ingest is handled via workflow step components using Cumulus provider and collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.

    Database

    Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency Cumulus provides data replication to an Elasticsearch instance. For legacy reasons, Cumulus is currently also deploying a DynamoDB datastore, and writes are replicated in parallel with the PostgreSQL database writes. The DynamoDB replicated tables and parallel writes will be removed in future releases.

    PostgreSQL Database Schema Diagram

    ERD of the Cumulus Database

    Maintenance

    System maintenance personnel have access to manage ingest and various portions of Cumulus via an AWS API gateway, as well as the operator dashboard.

    Deployment Structure

    Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.

    Cumulus

    The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the Data Persistence portion of this diagram.

    Data persistence

    The data persistence module provides the Data Persistence portion of the diagram.

    Other modules

Other modules are provided as artifacts on the release page for use by users configuring their own deployment, and contain extracted subcomponents of the cumulus module. For more on these components see the components documentation.

For more on the specific structure, examples of use, and how to deploy, please see the deployment docs as well as the cumulus-template-deploy repo.

    Version: v9.9.0

    Cloudwatch Retention

    Our lambdas dump logs to AWS CloudWatch. By default, these logs exist indefinitely. However, there are ways to specify a duration for log retention.

    aws-cli

In addition to getting your aws-cli set up, there are two values you'll need to acquire.

1. log-group-name: the name of the log group whose retention policy (retention time) you'd like to change. We'll use /aws/lambda/KinesisInboundLogger in our examples.
    2. retention-in-days: the number of days you'd like to retain the logs in the specified log group for. There is a list of possible values available in the aws logs documentation.

    For example, if we wanted to set log retention to 30 days on our KinesisInboundLogger lambda, we would write:

    aws logs put-retention-policy --log-group-name "/aws/lambda/KinesisInboundLogger" --retention-in-days 30

    Note: The aws-cli log command that we're using is explained in detail here.

    AWS Management Console

    Changing the log retention policy in the AWS Management Console is a fairly simple process:

    1. Navigate to the CloudWatch service in the AWS Management Console.
    2. Click on the Logs entry on the sidebar.
3. Find the Log Group whose retention policy you're interested in changing.
    4. Click on the value in the Expire Events After column.
    5. Enter/Select the number of days you'd like to retain logs in that log group for.

    Screenshot of AWS console showing how to configure the retention period for Cloudwatch logs

    Version: v9.9.0

    Collection Cost Tracking and Storage Best Practices

    Organizing your data is important for metrics you may want to collect. AWS S3 storage and cost metrics are calculated at the bucket level, so it is easy to get metrics by bucket. You can get storage metrics at the key prefix level, but that is done through the CLI, which can be very slow for large buckets. It is very difficult to estimate costs at the prefix level.

    Calculating Storage By Collection

    By bucket

    Usage by bucket can be obtained in your AWS Billing Dashboard via an S3 Usage Report. You can download your usage report for a period of time and review your storage and requests at the bucket level.

    Bucket metrics can also be found in the AWS CloudWatch Metrics Console (also see Using Amazon CloudWatch Metrics).

    Navigate to Storage Metrics and select the BucketName for all buckets you are interested in. The available metrics are BucketSizeInBytes and NumberOfObjects.

    In the Graphed metrics tab, you can select the type of statistic (i.e. average, minimum, maximum) and the period for the stats. At the top, it's useful to select from the dropdown to view the metrics as a number. You can also select the time period for which you want to see stats.

    Alternatively you can query CloudWatch using the CLI.

    This command will return the average number of bytes in the bucket test-bucket for 7/31/2019:

    aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2019-07-31T00:00:00 --end-time 2019-08-01T00:00:00 --period 86400 --statistics Average --region us-east-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=test-bucket Name=StorageType,Value=StandardStorage

    The result looks like:

{
  "Datapoints": [
    {
      "Timestamp": "2019-07-31T00:00:00Z",
      "Average": 150996467959.0,
      "Unit": "Bytes"
    }
  ],
  "Label": "BucketSizeBytes"
}

    By key prefix

    AWS does not offer storage and usage statistics at a key prefix level. Via the AWS CLI, you can get the total storage for a bucket or folder. The following command would get the storage for folder example-folder in bucket sample-bucket:

    aws s3 ls --summarize --human-readable --recursive s3://sample-bucket/example-folder | grep 'Total'

    Note that this can be a long-running operation for large buckets.

    Calculating Cost By Collection

    NASA NGAP Environment

    If using an NGAP account, the cost per bucket can be found in your CloudTamer console, in the Financials section of your account information. This is calculated on a monthly basis.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Outside of NGAP

You can enable S3 Cost Allocation Tags and tag your buckets. From there, you can view the cost breakdown in your AWS Billing Dashboard via the Cost Explorer. Cost Allocation Tagging is available at the bucket level.

    There is no easy way to get the cost by folder in the buckets. You could calculate an estimate using the storage per prefix vs. the storage of the bucket.

    Storage Configuration

    Cumulus allows for the configuration of many buckets for your files. Buckets are created and added to your deployment as part of the deployment process.

    In your Cumulus collection configuration, you specify where you want the files to be stored post-processing. This is done by matching a regular expression on the file with the configured bucket.

    Note that in the collection configuration, the bucket field is the key to the buckets variable in the deployment's .tfvars file.
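For example, for the collection-specific bucket keys used in the example below, the deployment's buckets variable might contain entries along these lines (a sketch; the actual bucket names are whatever you created for your deployment):

buckets = {
  "MOD09GQ-006-protected" = {
    name = "my-prefix-mod09gq-006-protected",
    type = "protected"
  },
  "MOD09GQ-006-private" = {
    name = "my-prefix-mod09gq-006-private",
    type = "private"
  },
  "MOD09GQ-006-public" = {
    name = "my-prefix-mod09gq-006-public",
    type = "public"
  }
}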

    Organizing By Bucket

    You can specify separate groups of buckets for each collection, which could look like the example below.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "MOD09GQ-006-private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "MOD09GQ-006-protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "MOD09GQ-006-public",
      "regex": "^.*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    Additional collections would go to different buckets.

    Organizing by Key Prefix

    Different collections can be organized into different folders in the same bucket, using the key prefix, which is specified as the url_path in the collection configuration. In this simplified collection configuration example, the url_path field is set at the top level so that all files go to a path prefixed with the collection name and version.

{
  "name": "MOD09GQ",
  "version": "006",
  "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
  "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}",
  "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
  "files": [
    {
      "bucket": "protected",
      "regex": "^.*\\.hdf$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
    },
    {
      "bucket": "private",
      "regex": "^.*\\.hdf\\.met$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met"
    },
    {
      "bucket": "protected",
      "regex": "^.*\\.cmr\\.xml$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml"
    },
    {
      "bucket": "public",
      "regex": "^.*\\.jpg$",
      "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg"
    }
  ]
}

    In this case, the path to all the files would be: MOD09GQ___006/<filename> in their respective buckets.

The url_path can be overridden directly in the file configuration. The example below produces the same result.

    {
    "name": "MOD09GQ",
    "version": "006",
    "granuleId": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "files": [
    {
    "bucket": "protected",
    "regex": "^.*\\.hdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "private",
    "regex": "^.*\\.hdf\\.met$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.met",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "protected-2",
    "regex": "^.*\\.cmr\\.xml$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    },
    {
    "bucket": "public",
    "regex": "^*\\.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_ndvi.jpg",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}"
    }
    ]
    }
    - + \ No newline at end of file diff --git a/docs/v9.9.0/configuration/data-management-types/index.html b/docs/v9.9.0/configuration/data-management-types/index.html index 8bc4ca5973d..661a557849c 100644 --- a/docs/v9.9.0/configuration/data-management-types/index.html +++ b/docs/v9.9.0/configuration/data-management-types/index.html @@ -5,13 +5,13 @@ Cumulus Data Management Types | Cumulus Documentation - +
    Version: v9.9.0

    Cumulus Data Management Types

    What Are The Cumulus Data Management Types

    • Collections: Collections are logical sets of data objects of the same data type and version. They provide contextual information used by Cumulus ingest.
    • Granules: Granules are the smallest aggregation of data that can be independently managed. They are always associated with a collection, which is a grouping of granules.
    • Providers: Providers generate and distribute input data that Cumulus obtains and sends to workflows.
    • Rules: Rules tell Cumulus how to associate providers and collections and when/how to start processing a workflow.
    • Workflows: Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.
    • Executions: Executions are records of a workflow.
    • Reconciliation Reports: Reports are a comparison of data sets to check to see if they are in agreement and to help Cumulus users detect conflicts.

    Interaction

    • Providers tell Cumulus where to get new data - i.e. S3, HTTPS
    • Collections tell Cumulus where to store the data files
    • Rules tell Cumulus when to trigger a workflow execution and tie providers and collections together

    Managing Data Management Types

    The following are created via the dashboard or API:

    • Providers
    • Collections
    • Rules
    • Reconciliation reports

    Granules are created by workflow executions and then can be managed via the dashboard or API.

    An execution record is created for each workflow execution triggered and can be viewed in the dashboard or data can be retrieved via the API.

    Workflows are created and managed via the Cumulus deployment.

    Configuration Fields

    Schemas

Looking at our API schema definitions can provide us with some insight into collections, providers, rules, and their attributes (and whether those are required or not). The schemas for the different concepts will be referenced throughout this document.

    The schemas are extremely useful for understanding which attributes are configurable and which of those are required. Cumulus uses these schemas for validation.

    Providers

    Please note:

• While connection configuration is defined here, items that are specific to a particular ingest setup (e.g. 'What target directory should we be pulling from?' or 'How is duplicate handling configured?') are generally defined in a Rule or Collection, not the Provider.
• There is some provider behavior which is controlled by task-specific configuration and not the provider definition. This configuration has to be set on a per-workflow basis. For example, see the httpListTimeout configuration on the discover-granules task.

    Provider Configuration

    The Provider configuration is defined by a JSON object that takes different configuration keys depending on the provider type. The following are definitions of typical configuration values relevant for the various providers:

    Configuration by provider type
    S3
    KeyTypeRequiredDescription
    idstringYesUnique identifier for the provider
    globalConnectionLimitintegerNoInteger specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocolstringYesThe protocol for this provider. Must be s3 for this provider type.
    hoststringYesS3 Bucket to pull data from
    http
    KeyTypeRequiredDescription
    idstringYesUnique identifier for the provider
    globalConnectionLimitintegerNoInteger specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocolstringYesThe protocol for this provider. Must be http for this provider type
    hoststringYesThe host to pull data from (e.g. nasa.gov)
    usernamestringNoConfigured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    passwordstringOnly if username is specifiedConfigured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    portintegerNoPort to connect to the provider on. Defaults to 80
    https
    KeyTypeRequiredDescription
    idstringYesUnique identifier for the provider
    globalConnectionLimitintegerNoInteger specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocolstringYesThe protocol for this provider. Must be https for this provider type
    hoststringYesThe host to pull data from (e.g. nasa.gov)
    usernamestringNoConfigured username for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    passwordstringOnly if username is specifiedConfigured password for basic authentication. Cumulus encrypts this using KMS and uses it in a Basic auth header if needed for authentication
    portintegerNoPort to connect to the provider on. Defaults to 443
    ftp
    KeyTypeRequiredDescription
    idstringYesUnique identifier for the provider
    globalConnectionLimitintegerNoInteger specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocolstringYesThe protocol for this provider. Must be ftp for this provider type
    hoststringYesThe ftp host to pull data from (e.g. nasa.gov)
    usernamestringNoUsername to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to anonymous if not defined
    passwordstringNoPassword to use to connect to the ftp server. Cumulus encrypts this using KMS. Defaults to password if not defined
    portintegerNoPort to connect to the provider on. Defaults to 21
    sftp
    KeyTypeRequiredDescription
    idstringYesUnique identifier for the provider
    globalConnectionLimitintegerNoInteger specifying the connection limit for the provider. This is the maximum number of connections Cumulus compatible ingest lambdas are expected to make to a provider. Defaults to unlimited
    protocolstringYesThe protocol for this provider. Must be sftp for this provider type
hoststringYesThe sftp host to pull data from (e.g. nasa.gov)
    usernamestringNoUsername to use to connect to the sftp server.
    passwordstringNoPassword to use to connect to the sftp server.
    portintegerNoPort to connect to the provider on. Defaults to 22
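Putting the S3 table above together, a minimal provider definition might look like the following (the id and host values are placeholders):

{
  "id": "MY_S3_PROVIDER",
  "protocol": "s3",
  "host": "my-provider-staging-bucket",
  "globalConnectionLimit": 10
}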

    Collections

Breakdown of [s3_MOD09GQ_006.json](https://github.com/nasa/cumulus/blob/master/example/data/collections/s3_MOD09GQ_006/s3_MOD09GQ_006.json)
    KeyValueRequiredDescription
    name"MOD09GQ"YesThe name attribute designates the name of the collection. This is the name under which the collection will be displayed on the dashboard
    version"006"YesA version tag for the collection
    granuleId"^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}$"YesThe regular expression used to validate the granule ID extracted from filenames according to the granuleIdExtraction
    granuleIdExtraction"(MOD09GQ\..*)(\.hdf|\.cmr|_ndvi\.jpg)"YesThe regular expression used to extract the granule ID from filenames. The first capturing group extracted from the filename by the regex will be used as the granule ID.
    sampleFileName"MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"YesAn example filename belonging to this collection
    files<JSON Object> of files defined hereYesDescribe the individual files that will exist for each granule in this collection (size, browse, meta, etc.)
    dataType"MOD09GQ"NoCan be specified, but this value will default to the collection_name if not
    duplicateHandling"replace"No("replace"|"version"|"skip") determines granule duplicate handling scheme
    ignoreFilesConfigForDiscoveryfalse (default)NoBy default, during discovery only files that match one of the regular expressions in this collection's files attribute (see above) are ingested. Setting this to true will ignore the files attribute during discovery, meaning that all files for a granule (i.e., all files with filenames matching granuleIdExtraction) will be ingested even when they don't match a regular expression in the files attribute at discovery time. (NOTE: this attribute does not appear in the example file, but is listed here for completeness.)
    process"modis"NoExample options for this are found in the ChooseProcess step definition in the IngestAndPublish workflow definition
    meta<JSON Object> of MetaData for the collectionNoMetaData for the collection. This metadata will be available to workflows for this collection via the Cumulus Message Adapter.
    url_path"{cmrMetadata.Granule.Collection.ShortName}/
    {substring(file.name, 0, 3)}"
    NoFilename without extension

    files-object

    KeyValueRequiredDescription
    regex"^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"YesRegular expression used to identify the file
sampleFileName"MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"YesFilename used to validate the provided regex
    type"data"NoValue to be assigned to the Granule File Type. CNM types are used by Cumulus CMR steps, non-CNM values will be treated as 'data' type. Currently only utilized in DiscoverGranules task
    bucket"internal"YesName of the bucket where the file will be stored
    url_path"${collectionShortName}/{substring(file.name, 0, 3)}"NoFolder used to save the granule in the bucket. Defaults to the collection url_path
    checksumFor"^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\.hdf$"NoIf this is a checksum file, set checksumFor to the regex of the target file.

    Rules

Rules are used to start processing workflows and the transformation process. Rules can be invoked manually, run on a schedule, or be triggered by Kinesis events, SNS messages, or SQS messages.

    Rule configuration
    KeyValueRequiredDescription
    name"L2_HR_PIXC_kinesisRule"YesName of the rule. This is the name under which the rule will be listed on the dashboard
    workflow"CNMExampleWorkflow"YesName of the workflow to be run. A list of available workflows can be found on the Workflows page
    provider"PODAAC_SWOT"NoConfigured provider's ID. This can be found on the Providers dashboard page
    collection<JSON Object> collection object shown belowYesName and version of the collection this rule will moderate. Relates to a collection configured and found in the Collections page
    payload<JSON Object or Array>NoThe payload to be passed to the workflow
    meta<JSON Object> of MetaData for the ruleNoMetaData for the rule. This metadata will be available to workflows for this rule via the Cumulus Message Adapter.
    rule<JSON Object> rule type and associated values - discussed belowYesObject defining the type and subsequent attributes of the rule
    state"ENABLED"No("ENABLED"|"DISABLED") whether or not the rule will be active. Defaults to "ENABLED".
    queueUrlhttps://sqs.us-east-1.amazonaws.com/1234567890/queue-nameNoURL for SQS queue that will be used to schedule workflows for this rule
    tags["kinesis", "podaac"]NoAn array of strings that can be used to simplify search

    collection-object

    KeyValueRequiredDescription
    name"L2_HR_PIXC"YesName of a collection defined/configured in the Collections dashboard page
    version"000"YesVersion number of a collection defined/configured in the Collections dashboard page

    meta-object

    KeyValueRequiredDescription
    retries3NoNumber of retries on errors, for sqs-type rule only. Defaults to 3.
    visibilityTimeout900NoVisibilityTimeout in seconds for the inflight messages, for sqs-type rule only. Defaults to the visibility timeout of the SQS queue when the rule is created.

    rule-object

    KeyValueRequiredDescription
    type"kinesis"Yes("onetime"|"scheduled"|"kinesis"|"sns"|"sqs") type of scheduling/workflow kick-off desired
    value<String> ObjectDependsDiscussion of valid values is below

    rule-value

The rule value entry depends on the rule type (a hedged scheduled-rule example follows this list):

    • If this is a onetime rule this can be left blank. Example
    • If this is a scheduled rule this field must hold a valid cron-type expression or rate expression.
    • If this is a kinesis rule, this must be a configured ${Kinesis_stream_ARN}. Example
    • If this is an sns rule, this must be an existing ${SNS_Topic_Arn}. Example
    • If this is an sqs rule, this must be an existing ${SQS_QueueUrl} that your account has permissions to access, and also you must configure a dead-letter queue for this SQS queue. Example
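As a hedged illustration of the scheduled case (the rule, workflow, provider, and collection names are placeholders), a rule using a rate expression might look like:

{
  "name": "MOD09GQ_daily_discovery",
  "workflow": "DiscoverGranules",
  "provider": "MY_S3_PROVIDER",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "rate(1 day)"
  },
  "state": "ENABLED"
}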

    sqs-type rule features

    • When an SQS rule is triggered, the SQS message remains on the queue.
    • The SQS message is not processed multiple times in parallel when visibility timeout is properly set. You should set the visibility timeout to the maximum expected length of the workflow with padding. Longer is better to avoid parallel processing.
    • The SQS message visibility timeout can be overridden by the rule.
    • Upon successful workflow execution, the SQS message is removed from the queue.
• Upon failed execution(s), the workflow is re-run 3 times by default, or the number of times configured in the rule's meta.retries.
    • Upon failed execution(s), the visibility timeout will be set to 5s to allow retries.
    • After configured number of failed retries, the SQS message is moved to the dead-letter queue configured for the SQS queue.

    Configuration Via Cumulus Dashboard

    Create A Provider

    • In the Cumulus dashboard, go to the Provider page.

    Screenshot of Create Provider form

    • Click on Add Provider.
    • Fill in the form and then submit it.

    Screenshot of Create Provider form

    Create A Collection

    • Go to the Collections page.

    Screenshot of the Collections page

    • Click on Add Collection.
    • Copy and paste or fill in the collection JSON object form.

    Screenshot of Add Collection form

    • Once you submit the form, you should be able to verify that your new collection is in the list.

    Create A Rule

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Rule Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule

    - + \ No newline at end of file diff --git a/docs/v9.9.0/configuration/lifecycle-policies/index.html b/docs/v9.9.0/configuration/lifecycle-policies/index.html index 17a3b2463b5..88254908492 100644 --- a/docs/v9.9.0/configuration/lifecycle-policies/index.html +++ b/docs/v9.9.0/configuration/lifecycle-policies/index.html @@ -5,13 +5,13 @@ Setting S3 Lifecycle Policies | Cumulus Documentation - +
    Version: v9.9.0

    Setting S3 Lifecycle Policies

    This document will outline, in brief, how to set data lifecycle policies so that you are more easily able to control data storage costs while keeping your data accessible. For more information on why you might want to do this, see the 'Additional Information' section at the end of the document.

    Requirements

    • The AWS CLI installed and configured (if you wish to run the CLI example). See AWS's guide to setting up the AWS CLI for more on this. Please ensure the AWS CLI is in your shell path.
• You will need an S3 bucket on AWS. You are strongly encouraged to use a bucket without voluminous amounts of data in it for experimenting/learning.
    • An AWS user with the appropriate roles to access the target bucket as well as modify bucket policies.

    Examples

    Walkthrough on setting time-based S3 Infrequent Access (S3IA) bucket policy

    This example will give step-by-step instructions on updating a bucket's lifecycle policy to move all objects in the bucket from the default storage to S3 Infrequent Access (S3IA) after a period of 90 days. Below are instructions for walking through configuration via the command line and the management console.

    Command Line

    Please ensure you have the AWS CLI installed and configured for access prior to attempting this example.

    Create policy

From any directory you choose, open an editor and add the following to a file named exampleRule.json:

    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    Set policy

    On the command line run the following command (with the bucket you're working with substituted in place of yourBucketNameHere).

    aws s3api put-bucket-lifecycle-configuration --bucket yourBucketNameHere --lifecycle-configuration file://exampleRule.json

    Verify policy has been set

    To obtain all of the existing policies for a bucket, run the following command (again substituting the correct bucket name):

     $ aws s3api get-bucket-lifecycle-configuration --bucket yourBucketNameHere
    {
    "Rules": [
    {
    "Status": "Enabled",
    "Filter": {
    "Prefix": ""
    },
    "Transitions": [
    {
    "Days": 90,
    "StorageClass": "STANDARD_IA"
    }
    ],
    "NoncurrentVersionTransitions": [
    {
    "NoncurrentDays": 90,
    "StorageClass": "STANDARD_IA"
    }
],
    "ID": "90DayS3IAExample"
    }
    ]
    }

    You have set a policy that transitions any version of an object in the bucket to S3IA after each object version has not been modified for 90 days.

    Management Console

    Create Policy

    To create the example policy on a bucket via the management console, go to the following URL (replacing 'yourBucketHere' with the bucket you intend to update):

    https://s3.console.aws.amazon.com/s3/buckets/yourBucketHere/?tab=overview

    You should see a screen similar to:

    Screenshot of AWS console for an S3 bucket

    Click the "Management" Tab, then lifecycle button and press + Add lifecycle rule:

    Screenshot of &quot;Management&quot; tab of AWS console for an S3 bucket

    Give the rule a name (e.g. '90DayRule'), leaving the filter blank:

    Screenshot of window for configuring the name and scope of a lifecycle rule on an S3 bucket in the AWS console

    Click next, and mark Current Version and Previous Versions.

Then for each, click + Add transition and select Transition to Standard-IA after for the Object creation field, and set 90 for the Days after creation/Days after objects become noncurrent field. Your screen should look similar to:

    Screenshot of window for configuring the storage class transitions of a lifecycle rule on an S3 bucket in the AWS console

    Click next, then next past the Configure expiration screen (we won't be setting this), and on the fourth page, click Save:

    Screenshot of window for reviewing the configuration of a lifecycle rule on an S3 bucket in the AWS console

    You should now see you have a rule configured for your bucket:

    Screenshot of lifecycle rule appearing in the &quot;Management&quot; tab of AWS console for an S3 bucket

    You have now set a policy that transitions any version of an object in the bucket to S3IA after each object has not been modified for 90 days.

    Additional Information

    This section lists information you may want prior to enacting lifecycle policies. It is not required content for working through the examples.

    Strategy Overview

    For a discussion of overall recommended strategy, please review the Methodology for Data Lifecycle Management on the EarthData wiki.

    AWS Documentation

The examples shown in this document are fairly basic cases. By using object tags, filters, and other configuration options you can enact far more complicated policies for various scenarios; a hedged example follows. For more reading on the topics presented on this page, see the AWS documentation on S3 object lifecycle management.
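For instance, a lifecycle configuration scoped to a single collection prefix (the prefix, transition period, and storage class are illustrative) might look like:

{
  "Rules": [
    {
      "ID": "MOD09GQ006GlacierExample",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "MOD09GQ___006/"
      },
      "Transitions": [
        {
          "Days": 365,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}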

    - + \ No newline at end of file diff --git a/docs/v9.9.0/configuration/monitoring-readme/index.html b/docs/v9.9.0/configuration/monitoring-readme/index.html index c16f78f33b4..ae513351c5e 100644 --- a/docs/v9.9.0/configuration/monitoring-readme/index.html +++ b/docs/v9.9.0/configuration/monitoring-readme/index.html @@ -5,14 +5,14 @@ Monitoring Best Practices | Cumulus Documentation - +
    Version: v9.9.0

    Monitoring Best Practices

    This document intends to provide a set of recommendations and best practices for monitoring the state of a deployed Cumulus and diagnosing any issues.

    Cumulus-provided resources and integrations for monitoring

Cumulus provides a number of resources that are useful for monitoring the system and its operation.

    Cumulus Dashboard

The primary tool for monitoring the Cumulus system is the Cumulus Dashboard. The dashboard is hosted on GitHub and includes instructions on how to deploy and link it into your core Cumulus deployment.

    The dashboard displays workflow executions, their status, inputs, outputs, and some diagnostic information such as logs. For further information on the dashboard, its usage, and the information it provides, see the documentation.

    Cumulus-provided AWS resources

    Cumulus sets up CloudWatch log groups for all Core-provided tasks.

    Monitoring Lambda Functions

    Logging for each Lambda Function is available in Lambda-specific CloudWatch log groups.

    Monitoring ECS services

    Each deployed cumulus_ecs_service module also includes a CloudWatch log group for the processes running on ECS.

    Monitoring workflows

    For advanced debugging, we also configure dead letter queues on critical system functions. These will allow you to monitor and debug invalid inputs to the functions we use to start workflows, which can be helpful if you find that you are not seeing workflows being started as expected. More information on these can be found in the dead letter queue documentation

    AWS recommendations

    AWS has a number of recommendations on system monitoring. Rather than reproduce those here and risk providing outdated guidance, we've documented the following links which will take you to available AWS docs on monitoring recommendations and best practices for the services used in Cumulus:

    Example: Setting up email notifications for CloudWatch logs

    Cumulus does not provide out-of-the-box support for email notifications at this time. However, setting up email notifications on AWS is fairly straightforward in that the operative components are an AWS SNS topic and a subscribed email address.

    In terms of Cumulus integration, forwarding CloudWatch logs requires creating a mechanism, most likely a Lambda Function subscribed to the log group that will receive, filter and forward these messages to the SNS topic.

    As a very simple example, we could create a function that filters CloudWatch logs created by the @cumulus/logger package and sends email notifications for error and fatal log levels, adapting the example linked above:

    const zlib = require('zlib');
    const aws = require('aws-sdk');
    const { promisify } = require('util');

    const gunzip = promisify(zlib.gunzip);
    const sns = new aws.SNS();

exports.handler = async (event) => {
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const decompressedData = await gunzip(payload);
  const logData = JSON.parse(decompressedData.toString('ascii'));
  return await Promise.all(logData.logEvents.map(async (logEvent) => {
    const logMessage = JSON.parse(logEvent.message);
    if (['error', 'fatal'].includes(logMessage.level)) {
      return sns.publish({
        TopicArn: process.env.EmailReportingTopicArn,
        Message: logEvent.message
      }).promise();
    }
    return Promise.resolve();
  }));
};

After creating the SNS topic, we can deploy this code as a Lambda function, following the setup steps from Amazon. Make sure to include your SNS topic ARN as an environment variable on the Lambda function by using the --environment option on aws lambda create-function, as in the hedged example below.
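A hedged example command (the function name, account ID, role, zip file, and topic ARN are all placeholders):

aws lambda create-function \
  --function-name cloudwatch-log-email-forwarder \
  --runtime nodejs16.x \
  --handler index.handler \
  --role arn:aws:iam::123456789012:role/your-lambda-execution-role \
  --zip-file fileb://function.zip \
  --environment "Variables={EmailReportingTopicArn=arn:aws:sns:us-east-1:123456789012:your-email-topic}"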

    You will need to create subscription filters for each log group you want to receive emails for. We recommend automating this as much as possible, and you could very well handle this via Terraform, such as using a module to deploy filters alongside log groups, or exporting the log group names to an all-in-one email notification module.
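A hedged Terraform sketch of one such subscription filter (the resource names are hypothetical, and it assumes the forwarding Lambda above has been deployed as aws_lambda_function.log_email_forwarder):

# Allow CloudWatch Logs to invoke the forwarding Lambda for this log group
resource "aws_lambda_permission" "allow_cloudwatch_logs" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.log_email_forwarder.function_name
  principal     = "logs.amazonaws.com"
  source_arn    = "${aws_cloudwatch_log_group.example_task.arn}:*"
}

# Forward every event in the log group; the Lambda itself filters by log level
resource "aws_cloudwatch_log_subscription_filter" "email_errors" {
  name            = "email-error-and-fatal-logs"
  log_group_name  = aws_cloudwatch_log_group.example_task.name
  filter_pattern  = ""
  destination_arn = aws_lambda_function.log_email_forwarder.arn
  depends_on      = [aws_lambda_permission.allow_cloudwatch_logs]
}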

    - + \ No newline at end of file diff --git a/docs/v9.9.0/configuration/server_access_logging/index.html b/docs/v9.9.0/configuration/server_access_logging/index.html index ae33f91806a..4909f41dc09 100644 --- a/docs/v9.9.0/configuration/server_access_logging/index.html +++ b/docs/v9.9.0/configuration/server_access_logging/index.html @@ -5,13 +5,13 @@ S3 Server Access Logging | Cumulus Documentation - +
    Version: v9.9.0

    S3 Server Access Logging

    Via AWS Console

    Enable server access logging for an S3 bucket

    Via AWS Command Line Interface

    1. Create a logging.json file with these contents, replacing <stack-internal-bucket> with your stack's internal bucket name, and <stack> with the name of your cumulus stack.

      {
      "LoggingEnabled": {
      "TargetBucket": "<stack-internal-bucket>",
      "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
      }
      }
    2. Add the logging policy to each of your protected and public buckets by calling this command on each bucket.

      aws s3api put-bucket-logging --bucket <protected/public-bucket-name> --bucket-logging-status file://logging.json
3. Verify the logging policy exists on your buckets (example output is shown below).

      aws s3api get-bucket-logging --bucket <protected/public-bucket-name>
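If the policy is in place, the command should return output similar to:

{
  "LoggingEnabled": {
    "TargetBucket": "<stack-internal-bucket>",
    "TargetPrefix": "<stack>/ems-distribution/s3-server-access-logs/"
  }
}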
    - + \ No newline at end of file diff --git a/docs/v9.9.0/configuration/task-configuration/index.html b/docs/v9.9.0/configuration/task-configuration/index.html index 9eb6598be8d..139affa47e2 100644 --- a/docs/v9.9.0/configuration/task-configuration/index.html +++ b/docs/v9.9.0/configuration/task-configuration/index.html @@ -5,13 +5,13 @@ Configuration of Tasks | Cumulus Documentation - +
    Version: v9.9.0

    Configuration of Tasks

    The cumulus module exposes values for configuration for some of the provided archive and ingest tasks. Currently the following are available as configurable variables:

    elasticsearch_client_config

Configuration parameters for the Elasticsearch client used by cumulus archive module tasks, in the form:

    <lambda_identifier>_es_scroll_duration = <duration>
    <lambda_identifier>_es_scroll_size = <size>
    type = map(string)

    Currently the following values are supported:

    • create_reconciliation_report_es_scroll_duration
    • create_reconciliation_report_es_scroll_size

    Example

    elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
    create_reconciliation_report_es_scroll_size = 2000
    }

    lambda_timeouts

    A configurable map of timeouts (in seconds) for cumulus ingest module task lambdas in the form:

    <lambda_identifier>_timeout: <timeout>
    type = map(string)

    Currently the following values are supported:

    • discover_granules_task_timeout
    • discover_pdrs_task_timeout
    • hyrax_metadata_update_tasks_timeout
    • lzards_backup_task_timeout
    • move_granules_task_timeout
    • parse_pdr_task_timeout
    • pdr_status_check_task_timeout
    • post_to_cmr_task_timeout
    • queue_granules_task_timeout
    • queue_pdrs_task_timeout
    • queue_workflow_task_timeout
    • sync_granule_task_timeout
    • update_granules_cmr_metadata_file_links_task_timeout

    Example

    lambda_timeouts = {
    discover_granules_task_timeout = 300
    }
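Both maps are set on the cumulus module block in your deployment; a hedged sketch (the module source and other required variables are elided):

module "cumulus" {
  source = "<cumulus release module source>"
  # ... other required variables for your deployment ...

  elasticsearch_client_config = {
    create_reconciliation_report_es_scroll_duration = "15m"
  }

  lambda_timeouts = {
    discover_granules_task_timeout = 300
  }
}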
    - + \ No newline at end of file diff --git a/docs/v9.9.0/data-cookbooks/about-cookbooks/index.html b/docs/v9.9.0/data-cookbooks/about-cookbooks/index.html index 85692251c3b..1c48da66297 100644 --- a/docs/v9.9.0/data-cookbooks/about-cookbooks/index.html +++ b/docs/v9.9.0/data-cookbooks/about-cookbooks/index.html @@ -5,13 +5,13 @@ About Cookbooks | Cumulus Documentation - +
    Version: v9.9.0

    About Cookbooks

    Introduction

The following data cookbooks are documents containing examples and explanations of workflows in the Cumulus framework. They should also serve to help unify an institution/user group on a set of terms.

    Setup

The data cookbooks assume you can configure providers, collections, and rules to run workflows. Visit Cumulus data management types for information on how to configure Cumulus data management types.

    Adding a page

    As shown in detail in the "Add a New Page and Sidebars" section in Cumulus Docs: How To's, you can add a new page to the data cookbook by creating a markdown (.md) file in the docs/data-cookbooks directory. The new page can then be linked to the sidebar by adding it to the Data-Cookbooks object in the website/sidebar.json file as data-cookbooks/${id}.

    More about workflows

    Workflow general information

    Input & Output

    Developing Workflow Tasks

    Workflow Configuration How-to's

    - + \ No newline at end of file diff --git a/docs/v9.9.0/data-cookbooks/browse-generation/index.html b/docs/v9.9.0/data-cookbooks/browse-generation/index.html index 87cca8c7342..f72c4372b02 100644 --- a/docs/v9.9.0/data-cookbooks/browse-generation/index.html +++ b/docs/v9.9.0/data-cookbooks/browse-generation/index.html @@ -5,7 +5,7 @@ Ingest Browse Generation | Cumulus Documentation - + @@ -15,7 +15,7 @@ provider keys with the previously entered values) Note that you need to set the "provider_path" to the path on your bucket (e.g. "/data") that you've staged your mock/test data.:

    {
    "name": "TestBrowseGeneration",
    "workflow": "DiscoverGranulesBrowseExample",
    "provider": "{{provider_from_previous_step}}",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "meta": {
    "provider_path": "{{path_to_data}}"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "updatedAt": 1553053438767
    }

    Run Workflows

Once you've configured the Collection and Provider and added a onetime rule, you're ready to trigger your rule and watch the ingest workflows run.

    Go to the Rules tab, click the rule you just created:

    Screenshot of the Rules overview page with a list of rules in the Cumulus dashboard

    Then click the gear in the upper right corner and click "Rerun":

    Screenshot of clicking the button to rerun a workflow rule from the rule edit page in the Cumulus dashboard

    Tab over to executions and you should see the DiscoverGranulesBrowseExample workflow run, succeed, and then moments later the CookbookBrowseExample should run and succeed.

    Screenshot of page listing executions in the Cumulus dashboard

    Results

    You can verify your data has ingested by clicking the successful workflow entry:

    Screenshot of individual entry from table listing executions in the Cumulus dashboard

    Select "Show Output" on the next page

    Screenshot of &quot;Show output&quot; button from individual execution page in the Cumulus dashboard

    and you should see in the payload from the workflow something similar to:

    "payload": {
    "process": "modis",
    "granules": [
    {
    "files": [
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "filepath": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-protected",
    "filename": "s3://cumulus-test-sandbox-protected/MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "filepath": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-private",
    "filename": "s3://cumulus-test-sandbox-private/MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "time": 1553027412000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 21708
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "filepath": "MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-protected",
    "filename": "s3://cumulus-test-sandbox-protected/MOD09GQ___006/2017/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "filepath": "MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-protected-2",
    "filename": "s3://cumulus-test-sandbox-protected-2/MOD09GQ___006/MOD/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}"
    }
    ],
    "cmrLink": "https://cmr.uat.earthdata.nasa.gov/search/granules.json?concept_id=G1222231611-CUMULUS",
    "cmrConceptId": "G1222231611-CUMULUS",
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "cmrMetadataFormat": "echo10",
    "dataType": "MOD09GQ",
    "version": "006",
    "published": true
    }
    ]
    }

You can verify the granules exist within your Cumulus instance (search using the Granules interface, check the S3 buckets, etc.) and validate that the above CMR entry is accurate.


    Build Processing Lambda

    This section discusses the construction of a custom processing lambda to replace the contrived example from this entry for a real dataset processing task.

    To ingest your own data using this example, you will need to construct your own lambda to replace the source in ProcessingStep that will generate browse imagery and provide or update a CMR metadata export file.

You will then need to add the lambda to your Cumulus deployment as an aws_lambda_function Terraform resource.

    The discussion below outlines requirements for this lambda.

    Inputs

    The incoming message to the task defined in the ProcessingStep as configured will have the following configuration values (accessible inside event.config courtesy of the message adapter):

    Configuration

    • event.config.bucket -- the name of the bucket configured in terraform.tfvars as your internal bucket.

    • event.config.collection -- The full collection object we will configure in the Configure Ingest section. You can view the expected collection schema in the docs here or in the source code on github. You need this as available input and output so you can update as needed.

    event.config.additionalUrls, generateFakeBrowse and event.config.cmrMetadataFormat from the example can be ignored as they're configuration flags for the provided example script.

    Payload

    The 'payload' from the previous task is accessible via event.input. The expected payload output schema from SyncGranules can be viewed here.

    In our example, the payload would look like the following. Note: The types are set per-file based on what we configured in our collection, and were initially added as part of the DiscoverGranules step in the DiscoverGranulesBrowseExample workflow.

     "payload": {
    "process": "modis",
    "granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027412000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}",
    "size": 21708
    }
    ]
    }
    ]
    }

    Generating Browse Imagery

The provided example script used in the example goes through all granules and adds a 'fake' .jpg browse file to the same staging location as the data staged by prior ingest tasks.

The processing lambda you construct will need to do the following (a minimal sketch follows this list):

    • Create a browse image file based on the input data, and stage it to a location accessible to both this task and the FilesToGranules and MoveGranules tasks in a S3 bucket.
    • Add the browse file to the input granule files, making sure to set the granule file's type to browse.
    • Update meta.input_granules with the updated granules list, as well as provide the files to be integrated by FilesToGranules as output from the task.
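Below is a minimal, hedged sketch of such a lambda. It is not the provided example lambda: generateBrowseImage is a hypothetical placeholder for your own imagery generation and staging logic, and CMR metadata handling is omitted.

'use strict';

const { runCumulusTask } = require('@cumulus/cumulus-message-adapter-js');

// Hypothetical helper: create browse imagery from the staged data file, upload
// it to an accessible S3 location, and return the s3:// URI of the browse file.
async function generateBrowseImage(dataFile) {
  return dataFile.filename.replace(/\.hdf$/, '.jpg');
}

async function processGranules(event) {
  const granules = await Promise.all(event.input.granules.map(async (granule) => {
    const dataFile = granule.files.find((f) => f.type === 'data');
    const browseUri = await generateBrowseImage(dataFile);
    // Add the browse file to the granule's file list with type set to 'browse'
    const browseFile = {
      name: browseUri.split('/').pop(),
      filename: browseUri,
      bucket: dataFile.bucket,
      fileStagingDir: dataFile.fileStagingDir,
      type: 'browse'
    };
    return { ...granule, files: [...granule.files, browseFile] };
  }));

  // "files" feeds FilesToGranules via the payload; "granules" is mapped to
  // meta.input_granules by the workflow's cumulus_message configuration.
  const files = granules.flatMap((g) => g.files.map((f) => f.filename));
  return { granules, files };
}

exports.handler = (event, context) => runCumulusTask(processGranules, event, context);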

    Generating/updating CMR metadata

    If you do not already have a CMR file in the granules list, you will need to generate one for valid export. This example's processing script generates and adds it to the FilesToGranules file list via the payload but it can be present in the InputGranules from the DiscoverGranules task as well if you'd prefer to pre-generate it.

The downstream tasks MoveGranules, UpdateGranulesCmrMetadataFileLinks, and PostToCmr all expect a valid CMR file to be available if you want to export to CMR.

    Expected Outputs for processing task/tasks

    In the above example, the critical portion of the output to FilesToGranules is the payload and meta.input_granules.

In the example provided, the processing task is set up to return an object with the keys "files" and "granules". In the cumulus_message configuration, files is mapped to the payload and granules to meta.input_granules (a hedged sketch of such a mapping follows the snippet below):

              "task_config": {
    "inputGranules": "{$.meta.input_granules}",
    "granuleIdExtraction": "{$.meta.collection.granuleIdExtraction}"
    }
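For context, the processing step's own task_config would include a cumulus_message block along these lines (a hedged sketch consistent with the mapping described above; keys other than cumulus_message are illustrative):

"task_config": {
  "collection": "{$.meta.collection}",
  "cumulus_message": {
    "outputs": [
      {
        "source": "{$.granules}",
        "destination": "{$.meta.input_granules}"
      },
      {
        "source": "{$.files}",
        "destination": "{$.payload}"
      }
    ]
  }
}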

    Their expected values from the example above may be useful in constructing a processing task:

    payload

The payload includes a full list of files to be 'moved' into the cumulus archive. The FilesToGranules task will take this list, merge it with the information from InputGranules, then pass that list to the MoveGranules task. The MoveGranules task will then move the files to their targets. The UpdateGranulesCmrMetadataFileLinks task will update the CMR metadata file, if it exists, with the updated granule locations and update the CMR file etags.

    In the provided example, a payload being passed to the FilesToGranules task should be expected to look like:

      "payload": [
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml"
    ]

This is the list of files that FilesToGranules will act upon to add/merge with the input_granules object.

    The pathing is generated from sync-granules, but in principle the files can be staged wherever you like so long as the processing/MoveGranules task's roles have access and the filename matches the collection configuration.

    input_granules

The FilesToGranules task utilizes the incoming payload to choose which files to move, but pulls all other metadata from meta.input_granules. As such, the output payload in the example would look like:

    "input_granules": [
    {
    "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606",
    "dataType": "MOD09GQ",
    "version": "006",
    "files": [
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 1908635
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "type": "metadata",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf.met",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027412000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    "size": 21708
    },
    {
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "type": "browse",
    "bucket": "cumulus-test-sandbox-internal",
    "filename": "s3://cumulus-test-sandbox-internal/file-staging/jk2/MOD09GQ___006/MOD09GQ.A2016358.h13v04.006.2016360104606.jpg",
    "fileStagingDir": "file-staging/jk2/MOD09GQ___006",
    "time": 1553027415000,
    "path": "data",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}___{cmrMetadata.Granule.Collection.VersionId}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{substring(file.name, 0, 3)}",
    "duplicate_found": true,
    }
    ]
    }
    ],
    - + \ No newline at end of file diff --git a/docs/v9.9.0/data-cookbooks/choice-states/index.html b/docs/v9.9.0/data-cookbooks/choice-states/index.html index 9ea98aa1f55..b118227f6e1 100644 --- a/docs/v9.9.0/data-cookbooks/choice-states/index.html +++ b/docs/v9.9.0/data-cookbooks/choice-states/index.html @@ -5,13 +5,13 @@ Choice States | Cumulus Documentation - +
    Version: v9.9.0

    Choice States

    Cumulus supports AWS Step Function Choice states. A Choice state enables branching logic in Cumulus workflows.

    Choice state definitions include a list of Choice Rules. Each Choice Rule defines a logical operation which compares an input value against a value using a comparison operator. For available comparison operators, review the AWS docs.

    If the comparison evaluates to true, the Next state is followed.

    Example

    In examples/cumulus-tf/parse_pdr_workflow.tf the ParsePdr workflow uses a Choice state, CheckAgainChoice, to terminate the workflow once meta.isPdrFinished: true is returned by the CheckStatus state.

    The CheckAgainChoice state definition requires an input object of the following structure:

    {
    "meta": {
    "isPdrFinished": false
    }
    }

    Given the above input to the CheckAgainChoice state, the workflow would transition to the PdrStatusReport state.

    "CheckAgainChoice": {
    "Type": "Choice",
    "Choices": [
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": false,
    "Next": "PdrStatusReport"
    },
    {
    "Variable": "$.meta.isPdrFinished",
    "BooleanEquals": true,
    "Next": "WorkflowSucceeded"
    }
    ],
    "Default": "WorkflowSucceeded"
    }

    Advanced: Loops in Cumulus Workflows

    Understanding the complete ParsePdr workflow is not necessary to understanding how Choice states work, but ParsePdr provides an example of how Choice states can be used to create a loop in a Cumulus workflow.

In the complete ParsePdr workflow definition, the state QueueGranules is followed by CheckStatus. From CheckStatus a loop starts: given CheckStatus returns meta.isPdrFinished: false, CheckStatus is followed by CheckAgainChoice, then PdrStatusReport, then WaitForSomeTime, which returns to CheckStatus. Once CheckStatus returns meta.isPdrFinished: true, CheckAgainChoice proceeds to WorkflowSucceeded.

    Execution graph of SIPS ParsePdr workflow in AWS Step Functions console

    Further documentation

    For complete details on Choice state configuration options, see the Choice state documentation.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/data-cookbooks/cnm-workflow/index.html b/docs/v9.9.0/data-cookbooks/cnm-workflow/index.html index be771675f60..70c96de04cd 100644 --- a/docs/v9.9.0/data-cookbooks/cnm-workflow/index.html +++ b/docs/v9.9.0/data-cookbooks/cnm-workflow/index.html @@ -5,7 +5,7 @@ CNM Workflow | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.9.0

    CNM Workflow

This entry documents how to set up a workflow that utilizes the built-in CNM/Kinesis functionality in Cumulus.

    Prior to working through this entry you should be familiar with the Cloud Notification Mechanism.

    Sections


    Prerequisites

    Cumulus

    This entry assumes you have a deployed instance of Cumulus (version >= 1.16.0). The entry assumes you are deploying Cumulus via the cumulus terraform module sourced from the release page.

    AWS CLI

    This entry assumes you have the AWS CLI installed and configured. If you do not, please take a moment to review the documentation - particularly the examples relevant to Kinesis - and install it now.

    Kinesis

This entry assumes you already have two Kinesis data streams created for use as CNM notification and response data streams.

If you do not have two streams set up, please take a moment to review the Kinesis documentation and set up two basic single-shard streams for this example:

    Using the "Create Data Stream" button on the Kinesis Dashboard, work through the dialogue.

    You should be able to quickly use the "Create Data Stream" button on the Kinesis Dashboard, and setup streams that are similar to the following example:

    Screenshot of AWS console page for creating a Kinesis stream

    Please bear in mind that your {{prefix}}-lambda-processing IAM role will need permissions to write to the response stream for this workflow to succeed if you create the Kinesis stream with a dashboard user. If you are using the cumulus top-level module for your deployment this should be set properly.

If not, the most straightforward approach is to attach the AmazonKinesisFullAccess policy for the stream resource to whatever role your Lambdas are using; however, your environment/security policies may require an approach specific to your deployment environment.

In operational environments, science data providers would typically be responsible for providing a Kinesis stream with the appropriate permissions.

    For more information on how this process works and how to develop a process that will add records to a stream, read the Kinesis documentation and the developer guide.

    Source Data

    This entry will run the SyncGranule task against a single target data file. To that end it will require a single data file to be present in an S3 bucket matching the Provider configured in the next section.
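For example (the local file, bucket name, and path are placeholders), you might stage a test file with:

aws s3 cp ./MOD09GQ.A2016358.h13v04.006.2016360104606.hdf s3://your-provider-bucket/data/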

    Collection and Provider

    Cumulus will need to be configured with a Collection and Provider entry of your choosing. The provider should match the location of the source data from the Ingest Source Data section.

    This can be done via the Cumulus Dashboard if installed or the API. It is strongly recommended to use the dashboard if possible.


    Configure the Workflow

    Provided the prerequisites have been fulfilled, you can begin adding the needed values to your Cumulus configuration to configure the example workflow.

    The following are steps that are required to set up your Cumulus instance to run the example workflow:

    Example CNM Workflow

    In this example, we're going to trigger a workflow by creating a Kinesis rule and sending a record to a Kinesis stream.

    The following workflow definition should be added to a new .tf workflow resource (e.g. cnm_workflow.tf) in your deployment directory. For the complete CNM workflow example, see examples/cumulus-tf/kinesis_trigger_test_workflow.tf.

    Add the following to the new terraform file in your deployment directory, updating the following:

    • Set the response-endpoint key in the CnmResponse task in the workflow JSON to match the name of the Kinesis response stream you configured in the prerequisites section
    • Update the source key to the workflow module to match the Cumulus release associated with your deployment.
    module "cnm_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-workflow.zip"

    prefix = var.prefix
    name = "CNMExampleWorkflow"
    workflow_config = module.cumulus.workflow_config
    system_bucket = var.system_bucket

state_machine_definition = <<JSON
{
    "Comment": "CNMExampleWorkflow",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "collection": "{$.meta.collection}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "CnmResponse"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "response-endpoint": "ADD YOUR RESPONSE STREAM NAME HERE",
    "region": "us-east-1",
    "type": "kinesis",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$.input.input}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 3
    }
    ],
    "End": true
    }
    }
    }
JSON
}

    Again, please make sure to modify the value response-endpoint to match the stream name (not ARN) for your Kinesis response stream.

    Lambda Configuration

    To execute this workflow, you're required to include several Lambda resources in your deployment. To do this, add the following task (Lambda) definitions to your deployment along with the workflow you created above:

    Please note: To utilize these tasks you need to ensure you have a compatible CMA layer. See the deployment instructions for more details on how to deploy a CMA layer.

    Below is a description of each of these tasks:

    CNMToCMA

    CNMToCMA is meant for the beginning of a workflow: it maps CNM granule information to a payload for downstream tasks. For other CNM workflows, you would need to ensure that downstream tasks in your workflow either understand the CNM message or include a translation task like this one.

    You can also manipulate the data sent to downstream tasks using task_config for various states in your workflow resource configuration. Read more about how to configure data on the Workflow Input & Output page.

    CnmResponse

    The CnmResponse Lambda generates a CNM response message and puts it on the response-endpoint Kinesis stream.

    You can read more about the expected schema of a CnmResponse record in the Cloud Notification Mechanism schema repository.

    Additional Tasks

    Lastly, this entry also makes use of the SyncGranule task from the cumulus module.

    Redeploy

    Once the above configuration changes have been made, redeploy your stack.

    Please refer to Update Cumulus resources in the deployment documentation if you are unfamiliar with redeployment.

    Rule Configuration

    Cumulus includes a messageConsumer Lambda function (message-consumer). Cumulus kinesis-type rules create the event source mappings between Kinesis streams and the messageConsumer Lambda. The messageConsumer Lambda consumes records from one or more Kinesis streams, as defined by enabled kinesis-type rules. When new records are pushed to one of these streams, the messageConsumer triggers workflows associated with the enabled kinesis-type rules.

    To add a rule via the dashboard (if you'd like to use the API, see the docs here), navigate to the Rules page and click Add a rule, then configure the new rule using the following template (substituting correct values for parameters denoted by ${}):

    {
    "collection": {
    "name": "L2_HR_PIXC",
    "version": "000"
    },
    "name": "L2_HR_PIXC_kinesisRule",
    "provider": "PODAAC_SWOT",
    "rule": {
    "type": "kinesis",
    "value": "arn:aws:kinesis:{{awsRegion}}:{{awsAccountId}}:stream/{{streamName}}"
    },
    "state": "ENABLED",
    "workflow": "CNMExampleWorkflow"
    }

    Please Note:

• The rule's value attribute must match the Amazon Resource Name (ARN) of the Kinesis data stream you've preconfigured. You should be able to obtain this ARN from the Kinesis Dashboard entry for the selected stream.
• The collection and provider should match the collection and provider you set up in the Prerequisites section.

Once you've clicked 'submit', a new rule should appear in the dashboard's Rule Overview.
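
If you prefer to create the rule via the Cumulus API instead of the dashboard, a minimal sketch using curl is shown below. CUMULUS_API_URL and TOKEN are hypothetical placeholders for your deployment's API URL and a valid access token, and rule.json is assumed to contain the rule definition shown above:

# Hypothetical example: POST the rule definition to the Cumulus API
curl -X POST "${CUMULUS_API_URL}/rules" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d @rule.json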


    Execute the Workflow

    Once Cumulus has been redeployed and a rule has been added, we're ready to trigger the workflow and watch it execute.

    How to Trigger the Workflow

    To trigger matching workflows, you will need to put a record on the Kinesis stream that the message-consumer Lambda will recognize as a matching event. Most importantly, it should include a collection name that matches a valid collection.

    For the purpose of this example, the easiest way to accomplish this is using the AWS CLI.

    Create Record JSON

Construct a JSON file containing an object that matches the values that have been previously set up. This JSON object should be a valid Cloud Notification Mechanism message.

    Please note: this example is somewhat contrived, as the downstream tasks don't care about most of these fields. A 'real' data ingest workflow would.

    The following values (denoted by ${} in the sample below) should be replaced to match values we've previously configured:

    • TEST_DATA_FILE_NAME: The filename of the test data that is available in the S3 (or other) provider we created earlier.
    • TEST_DATA_URI: The full S3 path to the test data (e.g. s3://bucket-name/path/granule)
    • COLLECTION: The collection name defined in the prerequisites for this product
    {
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "${TEST_DATA_FILE_NAME}",
    "checksum": "bogus_checksum_value",
    "uri": "${TEST_DATA_URI}",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "${TEST_DATA_FILE_NAME}",
    "dataVersion": "006"
    },
    "identifier ": "testIdentifier123456",
    "collection": "${COLLECTION}",
    "provider": "TestProvider",
    "version": "001",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Add Record to Kinesis Data Stream

    Using the JSON file you created, push it to the Kinesis notification stream:

    aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///path/to/file.json

    Please note: The above command uses the stream name, not the ARN.
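
If you are running AWS CLI v2, note that binary parameters such as --data are treated as base64 by default; the following sketch (assuming CLI v2 behavior) passes the raw JSON file explicitly:

# AWS CLI v2 only: send the raw JSON rather than base64-encoded data
aws kinesis put-record \
  --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE \
  --partition-key 1 \
  --cli-binary-format raw-in-base64-out \
  --data file:///path/to/file.json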

    The command should return output similar to:

    {
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "42356659532578640215890215117033555573986830588739321858"
    }

    This command will put a record containing the JSON from the --data flag onto the Kinesis data stream. The messageConsumer Lambda will consume the record and construct a valid CMA payload to trigger workflows. For this example, the record will trigger the CNMExampleWorkflow workflow as defined by the rule previously configured.

    You can view the current running executions on the Executions dashboard page which presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information.

    Verify Workflow Execution

As detailed above, once the record is added to the Kinesis data stream, the messageConsumer Lambda will trigger the CNMExampleWorkflow.

    TranslateMessage

TranslateMessage (which corresponds to the CNMToCMA Lambda) takes the CNM object payload and adds a granules object to the CMA payload that is consistent with other Cumulus ingest tasks. It also adds a meta.cnm key (as well as the payload) to store the original message.

    For more on the Message Adapter, please see the Message Flow documentation.

    An example of what is happening in the CNMToCMA Lambda is as follows:

    Example Input Payload:

    "payload": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some_bucket/cumulus-test-data/pdrs/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198"
    }

    Example Output Payload:

      "payload": {
    "cnm": {
    "identifier ": "testIdentifier123456",
    "product": {
    "files": [
    {
    "checksumType": "md5",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "checksum": "bogus_checksum_value",
    "uri": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "type": "data",
    "size": 12345678
    }
    ],
    "name": "TestGranuleUR",
    "dataVersion": "006"
    },
    "version": "123456",
    "collection": "MOD09GQ",
    "provider": "TestProvider",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552"
    },
    "output": {
    "granules": [
    {
    "granuleId": "TestGranuleUR",
    "files": [
    {
    "path": "some-bucket/data",
    "url_path": "s3://some-bucket/cumulus-test-data/data/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "bucket": "some-bucket",
    "name": "MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
    "size": 12345678
    }
    ]
    }
    ]
    }
    }

SyncGranule

    This Lambda will take the files listed in the payload and move them to s3://{deployment-private-bucket}/file-staging/{deployment-name}/{COLLECTION}/{file_name}.

    CnmResponse

    Assuming a successful execution of the workflow, this task will recover the meta.cnm key from the CMA output, and add a "SUCCESS" record to the notification Kinesis stream.

    If a prior step in the workflow has failed, this will add a "FAILURE" record to the stream instead.

    The data written to the response-endpoint should adhere to the Response Message Fields schema.

    Example CNM Success Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "SUCCESS"
    }
    }

    Example CNM Error Response:

    {
    "provider": "PODAAC_SWOT",
    "collection": "SWOT_Prod_l2:1",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier": "1234-abcd-efg0-9876",
    "response": {
    "status": "FAILURE",
    "errorCode": "PROCESSING_ERROR",
    "errorMessage": "File [cumulus-dev-a4d38f59-5e57-590c-a2be-58640db02d91/prod_20170926T11:30:36/production_file.nc] did not match gve checksum value."
    }
    }

    Note the CnmResponse state defined in the .tf workflow definition above configures $.exception to be passed to the CnmResponse Lambda keyed under config.WorkflowException. This is required for the CnmResponse code to deliver a failure response.

    To test the failure scenario, send a record missing the product.name key.
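
For example, one way to produce such a record (assuming jq is available and /path/to/file.json is the test record created earlier) is to strip the key and resend it:

# Remove product.name from the test record and push the result to the notification stream
jq 'del(.product.name)' /path/to/file.json > /tmp/bad-record.json
aws kinesis put-record --stream-name YOUR_KINESIS_NOTIFICATION_STREAM_NAME_HERE --partition-key 1 --data file:///tmp/bad-record.json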


    Verify results

    Check for successful execution on the dashboard

    Following the successful execution of this workflow, you should expect to see the workflow complete successfully on the dashboard:

    Screenshot of a successful CNM workflow appearing on the executions page of the Cumulus dashboard

    Check the test granule has been delivered to S3 staging

    The test granule identified in the Kinesis record should be moved to the deployment's private staging area.

    Check for Kinesis records

    A SUCCESS notification should be present on the response-endpoint Kinesis stream.

You should be able to validate that the notification and response streams have the expected records with the following steps (the AWS CLI Kinesis Basic Stream Operations documentation is useful to review before proceeding):

    Get a shard iterator (substituting your stream name as appropriate):

    aws kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type LATEST \
    --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME

which should return output similar to:

    {
    "ShardIterator": "VeryLongString=="
    }
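
Alternatively, you can capture the iterator directly into a shell variable (a sketch assuming a POSIX shell):

# Store the shard iterator for use in the get-records call below
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
  --shard-id shardId-000000000000 \
  --shard-iterator-type LATEST \
  --stream-name NOTIFICATION_OR_RESPONSE_STREAM_NAME \
  --query 'ShardIterator' --output text)
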
• Re-trigger the workflow by re-running the put-record command from the Execute the Workflow section above.
    • As the workflow completes, use the output from the get-shard-iterator command to request data from the stream:
    aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE

    This should result in output similar to:

    {
    "Records": [
    {
    "SequenceNumber": "49586720336541656798369548102057798835250389930873978882",
    "ApproximateArrivalTimestamp": 1532664689.128,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjI4LjkxOSJ9",
    "PartitionKey": "1"
    },
    {
    "SequenceNumber": "49586720336541656798369548102059007761070005796999266306",
    "ApproximateArrivalTimestamp": 1532664707.149,
    "Data": "eyJpZGVudGlmaWVyICI6InRlc3RJZGVudGlmaWVyMTIzNDU2IiwidmVyc2lvbiI6IjAwNiIsImNvbGxlY3Rpb24iOiJNT0QwOUdRIiwicHJvdmlkZXIiOiJUZXN0UHJvdmlkZXIiLCJwcm9kdWN0U2l6ZSI6MTkwODYzNS4wLCJyZXNwb25zZSI6eyJzdGF0dXMiOiJTVUNDRVNTIn0sInByb2Nlc3NDb21wbGV0ZVRpbWUiOiIyMDE4LTA3LTI3VDA0OjExOjQ2Ljk1OCJ9",
    "PartitionKey": "1"
    }
    ],
    "NextShardIterator": "AAAAAAAAAAFo9SkF8RzVYIEmIsTN+1PYuyRRdlj4Gmy3dBzsLEBxLo4OU+2Xj1AFYr8DVBodtAiXbs3KD7tGkOFsilD9R5tA+5w9SkGJZ+DRRXWWCywh+yDPVE0KtzeI0andAXDh9yTvs7fLfHH6R4MN9Gutb82k3lD8ugFUCeBVo0xwJULVqFZEFh3KXWruo6KOG79cz2EF7vFApx+skanQPveIMz/80V72KQvb6XNmg6WBhdjqAA==",
    "MillisBehindLatest": 0
    }

Note that the Data values are base64-encoded and must be decoded to be human readable. There are many options for building a Kinesis consumer, such as the KCL.
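
For a quick look at a single record from the command line, you can decode the Data field directly (a sketch assuming jq and a base64 utility are available):

# Decode the first record returned by get-records
aws kinesis get-records --shard-iterator SHARD_ITERATOR_VALUE \
  | jq -r '.Records[0].Data' \
  | base64 --decode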

For purposes of validating the workflow, it may be simpler to locate the workflow execution in the Step Functions Management Console and assert that the expected output is similar to the examples below.

    Successful CNM Response Object Example:

    {
    "cnmResponse": {
    "provider": "TestProvider",
    "collection": "MOD09GQ",
    "version": "123456",
    "processCompleteTime": "2017-09-30T03:45:29.791198",
    "submissionTime": "2017-09-30T03:42:29.791198",
    "receivedTime": "2017-09-30T03:42:31.634552",
    "identifier ": "testIdentifier123456",
    "response": {
    "status": "SUCCESS"
    }
    }
    }

    Kinesis Record Error Handling

    messageConsumer

    The default Kinesis stream processing in the Cumulus system is configured for record error tolerance.

    When the messageConsumer fails to process a record, the failure is captured and the record is published to the kinesisFallback SNS Topic. The kinesisFallback SNS topic broadcasts the record and a subscribed copy of the messageConsumer Lambda named kinesisFallback consumes these failures.

At this point, the normal Lambda asynchronous invocation retry behavior will attempt to process the record 3 more times. After this, if the record cannot be successfully processed, it is written to a dead letter queue. Cumulus' dead letter queue is an SQS queue named kinesisFailure. Operators can use this queue to inspect failed records.

This system ensures that when the messageConsumer fails to process a record and trigger a workflow, the record is retried 3 times. This retry behavior improves system reliability in case of any external service failure outside of Cumulus' control.

The Kinesis error handling system - the kinesisFallback SNS topic, messageConsumer Lambda, and kinesisFailure SQS queue - comes with the API package and does not need to be configured by the operator.

To examine records that could not be processed at any step, look at the dead letter queue {{prefix}}-kinesisFailure in the Simple Queue Service (SQS) console. Select your queue, and under the Queue Actions tab, choose View/Delete Messages. Start polling for messages and you will see records that failed to process through the messageConsumer.

Note: these are only failures that occurred while processing records from Kinesis streams. Workflow failures are handled differently.

    Kinesis Stream logging

    Notification Stream messages

    Cumulus includes two Lambdas (KinesisInboundEventLogger and KinesisOutboundEventLogger) that utilize the same code to take a Kinesis record event as input, deserialize the data field and output the modified event to the logs.

    When a kinesis rule is created, in addition to the messageConsumer event mapping, an event mapping is created to trigger KinesisInboundEventLogger to record a log of the inbound record, to allow for analysis in case of unexpected failure.

    Response Stream messages

    Cumulus also supports this feature for all outbound messages. To take advantage of this feature, you will need to set an event mapping on the KinesisOutboundEventLogger Lambda that targets your response-endpoint. You can do this in the Lambda management page for KinesisOutboundEventLogger. Add a Kinesis trigger, and configure it to target the cnmResponseStream for your workflow:

    Screenshot of the AWS console showing configuration for Kinesis stream trigger on KinesisOutboundEventLogger Lambda

    Once this is done, all records sent to the response-endpoint will also be logged in CloudWatch. For more on configuring Lambdas to trigger on Kinesis events, please see creating an event source mapping.
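
If you prefer the AWS CLI to the console, a sketch of creating the same event source mapping is shown below. The function name and stream ARN are hypothetical; substitute the deployed name of your KinesisOutboundEventLogger Lambda and the ARN of your response stream:

# Map the outbound event logger Lambda to the response stream
aws lambda create-event-source-mapping \
  --function-name <prefix>-KinesisOutboundEventLogger \
  --event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/<prefix>-cnmResponseStream \
  --starting-position LATEST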

Error Handling in Workflows

... Service Exception. See this documentation on configuring your workflow to handle transient Lambda errors.

    Example state machine definition:

    {
    "Comment": "Tests Workflow from Kinesis Stream",
    "StartAt": "TranslateMessage",
    "States": {
    "TranslateMessage": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.cnm}",
    "destination": "{$.meta.cnm}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_to_cma_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "SyncGranule"
    },
    "SyncGranule": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "Path": "$.payload",
    "TargetPath": "$.payload"
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "buckets": "{$.meta.buckets}",
    "collection": "{$.meta.collection}",
    "downloadBucket": "{$.meta.buckets.private.name}",
    "stack": "{$.meta.stack}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$.granules}",
    "destination": "{$.meta.input_granules}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.sync_granule_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "MaxAttempts": 3
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "CnmResponseFail"
    }
    ],
    "Next": "CnmResponse"
    },
    "CnmResponse": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowSucceeded"
    },
    "CnmResponseFail": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "OriginalCNM": "{$.meta.cnm}",
    "CNMResponseStream": "{$.meta.cnmResponseStream}",
    "region": "us-east-1",
    "WorkflowException": "{$.exception}",
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.meta.cnmResponse}"
    },
    {
    "source": "{$}",
    "destination": "{$.payload}"
    }
    ]
    }
    }
    }
    },
    "Type": "Task",
    "Resource": "${aws_lambda_function.cnm_response_task.arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": ["States.ALL"],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WorkflowFailed"
    },
    "WorkflowSucceeded": {
    "Type": "Succeed"
    },
    "WorkflowFailed": {
    "Type": "Fail",
    "Cause": "Workflow failed"
    }
    }
    }

    The above results in a workflow which is visualized in the diagram below:

    Screenshot of a visualization of an AWS Step Function workflow definition with branching logic for failures

    Summary

    Error handling should (mostly) be the domain of workflow configuration.

    Version: v9.9.0

    HelloWorld Workflow

    Example task meant to be a sanity check/introduction to the Cumulus workflows.

    Pre-Deployment Configuration

    Workflow Configuration

    A workflow definition can be found in the template repository hello_world_workflow module.

    {
    "Comment": "Returns Hello World",
    "StartAt": "HelloWorld",
    "States": {
    "HelloWorld": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${module.cumulus.hello_world_task.task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    }

    Workflow error-handling can be configured as discussed in the Error-Handling cookbook.

    Task Configuration

The HelloWorld task is provided for you as part of the cumulus Terraform module; no configuration is needed.

If you want to manually deploy your own version of this Lambda for testing, you can copy the Lambda resource definition from the Cumulus source code at cumulus/tf-modules/ingest/hello-world-task.tf. The Lambda source code itself is located at cumulus/tasks/hello-world.

    Execution

    We will focus on using the Cumulus dashboard to schedule the execution of a HelloWorld workflow.

    Our goal here is to create a rule through the Cumulus dashboard that will define the scheduling and execution of our HelloWorld workflow. Let's navigate to the Rules page and click Add a rule.

    {
    "collection": { # collection values can be configured and found on the Collections page
    "name": "${collection_name}",
    "version": "${collection_version}"
    },
    "name": "helloworld_rule",
    "provider": "${provider}", # found on the Providers page
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "workflow": "HelloWorldWorkflow" # This can be found on the Workflows page
    }

Screenshot of AWS Step Function execution graph for the HelloWorld workflow

Executed workflow as seen in AWS Console

    Output/Results

    The Executions page presents a list of all executions, their status (running, failed, or completed), to which workflow the execution belongs, along with other information. The rule defined in the previous section should start an execution of its own accord, and the status of that execution can be tracked here.

    To get some deeper information on the execution, click on the value in the Name column of your execution of interest. This should bring up a visual representation of the workflow similar to that shown above, execution details, and a list of events.

    Summary

    Setting up the HelloWorld workflow on the Cumulus dashboard is the tip of the iceberg, so to speak. The task and step-function need to be configured before Cumulus deployment. A compatible collection and provider must be configured and applied to the rule. Finally, workflow execution status can be viewed via the workflows tab on the dashboard.

    Version: v9.9.0

    Ingest Notification in Workflows

    On deployment, an SQS queue and three SNS topics are created and used for handling notification messages related to the workflow.

    The sfEventSqsToDbRecords Lambda function reads from the sfEventSqsToDbRecordsInputQueue queue and updates DynamoDB. The DynamoDB events for the ExecutionsTable, GranulesTable and PdrsTable are streamed on DynamoDBStreams, which are read by the publishExecutions, publishGranules and publishPdrs Lambda functions, respectively.

    These Lambda functions publish to the three SNS topics both when the workflow starts and when it reaches a terminal state (completion or failure). The following describes how many message(s) each topic receives both on workflow start and workflow completion/failure:

    • reportExecutions - Receives 1 message per workflow execution
    • reportGranules - Receives 1 message per granule in a workflow execution
    • reportPdrs - Receives 1 message per PDR

    Diagram of architecture for reporting workflow ingest notifications from AWS Step Functions

    The ingest notification reporting SQS queue is populated via a Cloudwatch rule for any Step Function execution state transitions. The sfEventSqsToDbRecords Lambda consumes this queue. The queue and Lambda are included in the cumulus module and the Cloudwatch rule in the workflow module and are included by default in a Cumulus deployment.

    Sending SQS messages to report status

    Publishing granule/PDR reports directly to the SQS queue

If you have a non-Cumulus workflow or process ingesting data and would like to update the status of your granules or PDRs, you can publish directly to the reporting SQS queue. Publishing messages to this queue will result in those messages being stored as granule/PDR records in the Cumulus database and the status of those granules/PDRs being visible on the Cumulus dashboard. The queue does have expectations about the message format: it expects a Cumulus Message nested within a CloudWatch Step Function event object.

Posting directly to the queue will require knowing the queue URL. Assuming that you are using the cumulus module for your deployment, you can get the queue URL by adding the following outputs to outputs.tf for your Terraform deployment, as in our example deployment:

    output "stepfunction_event_reporter_queue_url" {
    value = module.cumulus.stepfunction_event_reporter_queue_url
    }

    output "report_executions_sns_topic_arn" {
    value = module.cumulus.report_executions_sns_topic_arn
    }
    output "report_granules_sns_topic_arn" {
value = module.cumulus.report_granules_sns_topic_arn
    }
    output "report_pdrs_sns_topic_arn" {
    value = module.cumulus.report_pdrs_sns_topic_arn
    }

Then, when you run terraform apply, you should see the topic ARNs printed to your console:

    Outputs:
    ...
    stepfunction_event_reporter_queue_url = https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue
    report_executions_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-executions-topic
report_granules_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-granules-topic
    report_pdrs_sns_topic_arn = arn:aws:sns:us-east-1:xxxxxxxxx:<prefix>-report-pdrs-topic

Once you have the queue URL, you can use the AWS SDK for your language of choice to send messages to the queue. The expected format of these messages is that of a CloudWatch Step Function event containing a Cumulus message. For SUCCEEDED events, the Cumulus message is expected to be in detail.output. For all other event statuses, a Cumulus message is expected in detail.input. The Cumulus message populating these fields MUST be a JSON string, not an object. Messages that do not conform to the schemas will fail to be created as records.
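
A minimal sketch of sending such a message with the AWS CLI is shown below. The queue URL is the stepfunction_event_reporter_queue_url output from above, and cumulus-message.json is a hypothetical file containing a full Cumulus message for your granule/PDR. Because this example reports a SUCCEEDED status, the message is embedded as a JSON string in detail.output (use detail.input for any other status):

# Wrap the Cumulus message in a Step Function event envelope and send it to the reporting queue
QUEUE_URL="https://sqs.us-east-1.amazonaws.com/xxxxxxxxx/<prefix>-sfEventSqsToDbRecordsInputQueue"
CUMULUS_MESSAGE="$(cat cumulus-message.json)"
aws sqs send-message \
  --queue-url "${QUEUE_URL}" \
  --message-body "$(jq -n --arg msg "${CUMULUS_MESSAGE}" '{detail: {status: "SUCCEEDED", output: $msg}}')"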

    If you are not seeing records persist to the database or show up in the Cumulus dashboard, you can investigate the Cloudwatch logs of the SQS consumer Lambda:

    • /aws/lambda/<prefix>-sfEventSqsToDbRecords

    In a workflow

    As described above, ingest notifications will automatically be published to the SNS topics on workflow start and completion/failure, so you should not include a workflow step to publish the initial or final status of your workflows.

    However, if you want to report your ingest status at any point during a workflow execution, you can add a workflow step using the SfSqsReport Lambda. In the following example from cumulus-tf/parse_pdr_workflow.tf, the ParsePdr workflow is configured to use the SfSqsReport Lambda, primarily to update the PDR ingestion status.

    Note: ${sf_sqs_report_task_arn} is an interpolated value referring to a Terraform resource. See the example deployment code for the ParsePdr workflow.

      "PdrStatusReport": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    },
    "ResultPath": null,
    "Type": "Task",
    "Resource": "${sf_sqs_report_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "Next": "WaitForSomeTime"
    },

    Subscribing additional listeners to SNS topics

    Additional listeners to SNS topics can be configured in a .tf file for your Cumulus deployment. Shown below is configuration that subscribes an additional Lambda function (test_lambda) to receive messages from the report_executions SNS topic. To subscribe to the report_granules or report_pdrs SNS topics instead, simply replace report_executions in the code block below with either of those values.

    resource "aws_lambda_function" "test_lambda" {
    function_name = "${var.prefix}-testLambda"
    filename = "./testLambda.zip"
    source_code_hash = filebase64sha256("./testLambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"
    }

    resource "aws_sns_topic_subscription" "test_lambda" {
    topic_arn = module.cumulus.report_executions_sns_topic_arn
    protocol = "lambda"
    endpoint = aws_lambda_function.test_lambda.arn
    }

    resource "aws_lambda_permission" "test_lambda" {
    action = "lambda:InvokeFunction"
    function_name = aws_lambda_function.test_lambda.arn
    principal = "sns.amazonaws.com"
    source_arn = module.cumulus.report_executions_sns_topic_arn
    }

    SNS message format

    Subscribers to the SNS topics can expect to find the published message in the SNS event at Records[0].Sns.Message. The message will be a JSON stringified version of the ingest notification record for an execution or a PDR. For granules, the message will be a JSON stringified object with ingest notification record in the record property and the event type as the event property.

    The ingest notification record of the execution, granule, or PDR should conform to the data model schema for the given record type.
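
For example, given an SNS event saved to a file (sns-event.json, a hypothetical name), you can extract and pretty-print the notification record with jq:

# Pull the stringified record out of the SNS envelope and re-parse it as JSON
jq -r '.Records[0].Sns.Message' sns-event.json | jq .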

    Summary

    Workflows can be configured to send SQS messages at any point using the sf-sqs-report task.

    Additional listeners can be easily configured to trigger when messages are sent to the SNS topics.

    Version: v9.9.0

    Queue PostToCmr

In this document, we walk through handling CMR errors in workflows by queueing PostToCmr. We assume that the user already has an ingest workflow set up.

    Overview

    The general concept is that the last task of the ingest workflow will be QueueWorkflow, which queues the publish workflow. The publish workflow contains the PostToCmr task and if a CMR error occurs during PostToCmr, the publish workflow will add itself back onto the queue so that it can be executed when CMR is back online. This is achieved by leveraging the QueueWorkflow task again in the publish workflow. The following diagram demonstrates this queueing process.

    Diagram of workflow queueing

    Ingest Workflow

    The last step should be the QueuePublishWorkflow step. It should be configured with a queueUrl and workflow. In this case, the queueUrl is a throttled queue. Any queueUrl can be specified here which is useful if you would like to use a lower priority queue. The workflow is the unprefixed workflow name that you would like to queue (e.g. PublishWorkflow).

      "QueuePublishWorkflowStep": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "workflow": "{$.meta.workflow}",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

    Publish Workflow

    Configure the Catch section of your PostToCmr task to proceed to QueueWorkflow if a CMRInternalError is caught. Any other error will cause the workflow to fail.

      "Catch": [
    {
    "ErrorEquals": [
    "CMRInternalError"
    ],
    "Next": "RequeueWorkflow"
    },
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],

    Then, configure the QueueWorkflow task similarly to its configuration in the ingest workflow. This time, pass the current publish workflow to the task config. This allows for the publish workflow to be requeued when there is a CMR error.

    {
    "RequeueWorkflow": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "buckets": "{$.meta.buckets}",
    "distribution_endpoint": "{$.meta.distribution_endpoint}",
    "workflow": "PublishGranuleQueue",
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_workflow_task_arn}",
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "Next": "WorkflowFailed",
    "ResultPath": "$.exception"
    }
    ],
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "End": true
    }
    }
    Version: v9.9.0

    Run Step Function Tasks in AWS Lambda or Docker

    Overview

    AWS Step Function Tasks can run tasks on AWS Lambda or on AWS Elastic Container Service (ECS) as a Docker container.

Lambda provides a serverless architecture and is the best option for minimizing cost and server management. ECS provides the fullest extent of AWS EC2 resources, offering the flexibility to execute arbitrary code on any AWS EC2 instance type.

    When to use Lambda

    You should use AWS Lambda whenever all of the following are true:

• The task runs on one of the supported Lambda Runtimes. At the time of this writing, supported runtimes include versions of Python, Java, Ruby, Node.js, Go, and .NET.
    • The lambda package is less than 50 MB in size, zipped.
    • The task consumes less than each of the following resources:
      • 3008 MB memory allocation
      • 512 MB disk storage (must be written to /tmp)
      • 15 minutes of execution time

    See this page for a complete and up-to-date list of AWS Lambda limits.

If your task exceeds any of these limits or requires an unsupported runtime, creating a Docker image that can be run on ECS is the way to go. Cumulus supports running any Lambda package (and its configured layers) as a Docker container with cumulus-ecs-task.

    Step Function Activities and cumulus-ecs-task

    Step Function Activities enable a state machine task to "publish" an activity task which can be picked up by any activity worker. Activity workers can run pretty much anywhere, but Cumulus workflows support the cumulus-ecs-task activity worker. The cumulus-ecs-task worker runs as a Docker container on the Cumulus ECS cluster.

    The cumulus-ecs-task container takes an AWS Lambda Amazon Resource Name (ARN) as an argument (see --lambdaArn in the example below). This ARN argument is defined at deployment time. The cumulus-ecs-task worker polls for new Step Function Activity Tasks. When a Step Function executes, the worker (container) picks up the activity task and runs the code contained in the lambda package defined on deployment.

    Example: Replacing AWS Lambda with a Docker container run on ECS

    This example will use an already-defined workflow from the cumulus module that includes the QueueGranules task in its configuration.

    The following example is an excerpt from the Discover Granules workflow containing the step definition for the QueueGranules step:

    Note: ${ingest_granule_workflow_name} and ${queue_granules_task_arn} are interpolated values that refer to Terraform resources. See the example deployment code for the Discover Granules workflow.

      "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "queueUrl": "{$.meta.queues.startSF}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },

Suppose you have discovered that this task can no longer run in AWS Lambda. You can instead run it on the Cumulus ECS cluster by adding the following resources to your Terraform deployment (either in a new .tf file or by updating an existing one):

    • A aws_sfn_activity resource:
    resource "aws_sfn_activity" "queue_granules" {
    name = "${var.prefix}-QueueGranules"
    }
• An instance of the cumulus_ecs_service module (found on the Cumulus releases page) configured to provide the QueueGranules task:

    module "queue_granules_service" {
    source = "https://github.com/nasa/cumulus/releases/download/{version}/terraform-aws-cumulus-ecs-service.zip"

    prefix = var.prefix
    name = "QueueGranules"

    cluster_arn = module.cumulus.ecs_cluster_arn
    desired_count = 1
    image = "cumuluss/cumulus-ecs-task:1.7.0"

    cpu = 400
    memory_reservation = 700

    environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
    }
    command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn
    ]
    alarms = {
    TaskCountHigh = {
    comparison_operator = "GreaterThanThreshold"
    evaluation_periods = 1
    metric_name = "MemoryUtilization"
    statistic = "SampleCount"
    threshold = 1
    }
    }
    }

    Please note: If you have updated the code for the Lambda specified by --lambdaArn, you will have to manually restart the tasks in your ECS service before invocation of the Step Function activity will use the updated Lambda code.
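
One way to do this is to force a new deployment of the ECS service (a sketch assuming hypothetical cluster and service names; substitute the values from your deployment):

# Restart the service's tasks so they pick up the updated Lambda code
aws ecs update-service \
  --cluster <prefix>-CumulusECSCluster \
  --service <queue-granules-service-name> \
  --force-new-deployment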

• An updated Discover Granules workflow to utilize the new resource (the Resource key in the QueueGranules step has been updated to:

"Resource": "${aws_sfn_activity.queue_granules.id}")

If you then run this workflow in place of the DiscoverGranules workflow, the QueueGranules step will run as an ECS task instead of a Lambda function.

    Final note

    Step Function Activities and AWS Lambda are not the only ways to run tasks in an AWS Step Function. Learn more about other service integrations, including direct ECS integration via the AWS Service Integrations page.

Science Investigator-led Processing Systems (SIPS)

... we're just going to create a onetime throw-away rule that will be easy to test with. This rule will kick off the DiscoverAndQueuePdrs workflow, which is the beginning of a Cumulus SIPS workflow:

    Screenshot of a Cumulus rule configuration

Note: A list of configured workflows exists under "Workflows" in the navigation bar of the Cumulus dashboard. Additionally, you can find a list of executions and their respective statuses in the "Executions" tab in the navigation bar.

    DiscoverAndQueuePdrs Workflow

    This workflow will discover PDRs and queue them to be processed. Duplicate PDRs will be dealt with according to the configured duplicate handling setting in the collection. The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. DiscoverPdrs - source
    2. QueuePdrs - source

    Screenshot of execution graph for discover and queue PDRs workflow in the AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the discover_and_queue_pdrs_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    ParsePdr Workflow

    The ParsePdr workflow will parse a PDR, queue the specified granules (duplicates are handled according to the duplicate handling setting) and periodically check the status of those queued granules. This workflow will not succeed until all the granules included in the PDR are successfully ingested. If one of those fails, the ParsePdr workflow will fail. NOTE that ParsePdr may spin up multiple IngestGranule workflows in parallel, depending on the granules included in the PDR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. ParsePdr - source
    2. QueueGranules - source
    3. CheckStatus - source

    Screenshot of execution graph for SIPS Parse PDR workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the parse_pdr_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    IngestGranule Workflow

    The IngestGranule workflow processes and ingests a granule and posts the granule metadata to CMR.

    The lambdas below are included in the cumulus terraform module for use in your workflows:

    1. SyncGranule - source.
    2. CmrStep - source

    Additionally this workflow requires a processing step you must provide. The ProcessingStep step in the workflow picture below is an example of a custom processing step.

    Note: Using the CmrStep is not required and can be left out of the processing trajectory if desired (for example, in testing situations).

    Screenshot of execution graph for SIPS IngestGranule workflow in AWS Step Functions console

    An example workflow module configuration can be viewed in the Cumulus source for the ingest_and_publish_granule_workflow.

Please note: To use this example workflow module as a template for a new workflow in your deployment, the source key for the workflow module would need to point to a release of the cumulus-workflow (terraform-aws-cumulus-workflow.zip) module on our release page, as all of the provided Cumulus workflows are internally self-referential.

    Summary

    In this cookbook we went over setting up a collection, rule, and provider for a SIPS workflow. Once we had the setup completed, we looked over the Cumulus workflows that participate in parsing PDRs, ingesting and processing granules, and updating CMR.

    Version: v9.9.0

    Throttling queued executions

In this entry, we will walk through how to create an SQS queue for scheduling executions, which will be used to limit those executions to a maximum concurrency, and we will see how to configure our Cumulus workflows/rules to use this queue.

    We will also review the architecture of this feature and highlight some implementation notes.

    Limiting the number of executions that can be running from a given queue is useful for controlling the cloud resource usage of workflows that may be lower priority, such as granule reingestion or reprocessing campaigns. It could also be useful for preventing workflows from exceeding known resource limits, such as a maximum number of open connections to a data provider.

    Implementing the queue

    Create and deploy the queue

    Add a new queue

    In a .tf file for your Cumulus deployment, add a new SQS queue:

    resource "aws_sqs_queue" "background_job_queue" {
    name = "${var.prefix}-backgroundJobQueue"
    receive_wait_time_seconds = 20
    visibility_timeout_seconds = 60
    }

    Set maximum executions for the queue

    Define the throttled_queues variable for the cumulus module in your Cumulus deployment to specify the maximum concurrent executions for the queue.

    module "cumulus" {
    # ... other variables

    throttled_queues = [{
    url = aws_sqs_queue.background_job_queue.id,
    execution_limit = 5
    }]
    }

    Setup consumer for the queue

    Add the sqs2sfThrottle Lambda as the consumer for the queue and add a Cloudwatch event rule/target to read from the queue on a scheduled basis.

    Please note: You must use the sqs2sfThrottle Lambda as the consumer for any queue with a queue execution limit or else the execution throttling will not work correctly. Additionally, please allow at least 60 seconds after creation before using the queue while associated infrastructure and triggers are set up and made ready.

    aws_sqs_queue.background_job_queue.id refers to the queue resource defined above.

    resource "aws_cloudwatch_event_rule" "background_job_queue_watcher" {
    schedule_expression = "rate(1 minute)"
    }

    resource "aws_cloudwatch_event_target" "background_job_queue_watcher" {
    rule = aws_cloudwatch_event_rule.background_job_queue_watcher.name
    arn = module.cumulus.sqs2sfThrottle_lambda_function_arn
    input = jsonencode({
    messageLimit = 500
    queueUrl = aws_sqs_queue.background_job_queue.id
    timeLimit = 60
    })
    }

    resource "aws_lambda_permission" "background_job_queue_watcher" {
    action = "lambda:InvokeFunction"
    function_name = module.cumulus.sqs2sfThrottle_lambda_function_arn
    principal = "events.amazonaws.com"
    source_arn = aws_cloudwatch_event_rule.background_job_queue_watcher.arn
    }

    Re-deploy your Cumulus application

Follow the instructions to re-deploy your Cumulus application. After you have re-deployed, your workflow template will be updated to include information about the queue (the output below is partial output from an expected workflow template):

    {
    "cumulus_meta": {
    "queueExecutionLimits": {
    "<backgroundJobQueue_SQS_URL>": 5
    }
    }
    }

    Integrate your queue with workflows and/or rules

    Integrate queue with queuing steps in workflows

    For any workflows using QueueGranules or QueuePdrs that you want to use your new queue, update the Cumulus configuration of those steps in your workflows.

    As seen in this partial configuration for a QueueGranules step, update the queueUrl to reference the new throttled queue:

    Note: ${ingest_granule_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverGranules workflow.

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}"
    }
    }
    }
    }
    }

    Similarly, for a QueuePdrs step:

    Note: ${parse_pdr_workflow_name} is an interpolated value referring to a Terraform resource. See the example deployment code for the DiscoverPdrs workflow.

    {
    "QueuePdrs": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${aws_sqs_queue.background_job_queue.id}",
    "provider": "{$.meta.provider}",
    "collection": "{$.meta.collection}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "parsePdrWorkflow": "${parse_pdr_workflow_name}"
    }
    }
    }
    }
    }

    After making these changes, re-deploy your Cumulus application for the execution throttling to take effect on workflow executions queued by these workflows.

    Create/update a rule to use your new queue

    Create or update a rule definition to include a queueUrl property that refers to your new queue:

    {
    "name": "s3_provider_rule",
    "workflow": "DiscoverAndQueuePdrs",
    "provider": "s3_provider",
    "collection": {
    "name": "MOD09GQ",
    "version": "006"
    },
    "rule": {
    "type": "onetime"
    },
    "state": "ENABLED",
    "queueUrl": "<backgroundJobQueue_SQS_URL>" // configure rule to use your queue URL
    }

    After creating/updating the rule, any subsequent invocations of the rule should respect the maximum number of executions when starting workflows from the queue.
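
To observe throttling in action, you can watch the queue depth while workflows are queued (a sketch assuming the AWS CLI and the queue URL from your deployment):

# Check how many messages are waiting in the throttled queue
aws sqs get-queue-attributes \
  --queue-url <backgroundJobQueue_SQS_URL> \
  --attribute-names ApproximateNumberOfMessages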

    Architecture

    Architecture diagram showing how executions started from a queue are throttled to a maximum concurrent limit

    Execution throttling based on the queue works by manually keeping a count (semaphore) of how many executions are running for the queue at a time. The key operation that prevents the number of executions from exceeding the maximum for the queue is that before starting new executions, the sqs2sfThrottle Lambda attempts to increment the semaphore and responds as follows:

    • If the increment operation is successful, then the count was not at the maximum and an execution is started
    • If the increment operation fails, then the count was already at the maximum so no execution is started

    Final notes

    Limiting the number of concurrent executions for work scheduled via a queue has several consequences worth noting:

    • The number of executions that are running for a given queue will be limited to the maximum for that queue regardless of which workflow(s) are started.
    • If you use the same queue to schedule executions across multiple workflows/rules, then the limit on the total number of executions running concurrently will be applied to all of the executions scheduled across all of those workflows/rules.
    • If you are scheduling the same workflow both via a queue with a maxExecutions value and a queue without a maxExecutions value, only the executions scheduled via the queue with the maxExecutions value will be limited to the maximum.
Tracking Ancillary Files

... The UMM-G column reflects the RelatedURL's Type derived from the CNM type, whereas the ECHO10 column shows how the CNM type affects the destination element.

CNM Type  | UMM-G RelatedUrl.Type                                           | ECHO10 Location
ancillary | 'VIEW RELATED INFORMATION'                                      | OnlineResource
data      | 'GET DATA' (HTTPS URL) or 'GET DATA VIA DIRECT ACCESS' (S3 URI) | OnlineAccessURL
browse    | 'GET RELATED VISUALIZATION'                                     | AssociatedBrowseImage
linkage   | 'EXTENDED METADATA'                                             | OnlineResource
metadata  | 'EXTENDED METADATA'                                             | OnlineResource
qa        | 'EXTENDED METADATA'                                             | OnlineResource

    Common Use Cases

    This section briefly documents some common use cases and the recommended configuration for the file. The examples shown here are for the DiscoverGranules use case, which allows configuration at the Cumulus dashboard level. The other two cases covered in the ancillary metadata documentation require configuration at the provider notification level (either CNM message or PDR) and are not covered here.

    Configuring browse imagery:

    {
    "bucket": "public",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_[\\d]{1}.jpg$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_1.jpg",
    "type": "browse"
    }

    Configuring a documentation entry:

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_README.pdf$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_README.pdf",
    "type": "metadata"
    }

    Configuring other associated files (use types metadata or qa as appropriate):

    {
    "bucket": "protected",
    "regex": "^MOD09GQ\\.A[\\d]{7}\\.[\\S]{6}\\.006\\.[\\d]{13}\\_QA.txt$",
    "sampleFileName": "MOD09GQ.A2017025.h21v00.006.2017034065104_QA.txt",
    "type": "qa"
    }
    Version: v9.9.0

    API Gateway Logging

    Enabling API Gateway logging

    In order to enable distribution API Access and execution logging, configure the TEA deployment by setting log_api_gateway_to_cloudwatch on the thin_egress_app module:

    log_api_gateway_to_cloudwatch = true

    This enables the distribution API to send its logs to the default CloudWatch location: API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>
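
Once logging is enabled and requests have been made, you can tail the log group from the command line (a sketch assuming AWS CLI v2, which provides aws logs tail; substitute your REST API ID and stage):

# Follow distribution API access/execution logs as they arrive
aws logs tail "API-Gateway-Execution-Logs_<RESTAPI_ID>/<STAGE>" --follow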

    Configure Permissions for API Gateway Logging to CloudWatch

    Instructions for enabling account level logging from API Gateway to CloudWatch

    This is a one time operation that must be performed on each AWS account to allow API Gateway to push logs to CloudWatch.

    Create a policy document

    The AmazonAPIGatewayPushToCloudWatchLogs managed policy, with an ARN of arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs, has all the required permissions to enable API Gateway logging to CloudWatch. To grant these permissions to your account, first create an IAM role with apigateway.amazonaws.com as its trusted entity.

    Save this snippet as apigateway-policy.json.

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Service": "apigateway.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

    Create an account role to act as ApiGateway and write to CloudWatchLogs

    NASA users in NGAP: be sure to use your account's permission boundary.

    aws iam create-role \
    --role-name ApiGatewayToCloudWatchLogs \
    [--permissions-boundary <permissionBoundaryArn>] \
    --assume-role-policy-document file://apigateway-policy.json

    Note the ARN of the returned role for the last step.

    Attach correct permissions to role

    Next attach the AmazonAPIGatewayPushToCloudWatchLogs policy to the IAM role.

    aws iam attach-role-policy \
    --role-name ApiGatewayToCloudWatchLogs \
    --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"

    Update Account API Gateway settings with correct permissions

    Finally, set the IAM role ARN on the cloudWatchRoleArn property on your API Gateway Account settings.

    aws apigateway update-account \
    --patch-operations op='replace',path='/cloudwatchRoleArn',value='<ApiGatewayToCloudWatchLogs ARN>'
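    To confirm that the account setting took effect, one option is to read the account configuration back and check the returned cloudwatchRoleArn value:

    aws apigateway get-account --query cloudwatchRoleArn --output text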

    Configure API Gateway CloudWatch Logs Delivery

    See Configure Cloudwatch Logs Delivery

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/cloudwatch-logs-delivery/index.html b/docs/v9.9.0/deployment/cloudwatch-logs-delivery/index.html index 8fb425da81c..e76a792b3de 100644 --- a/docs/v9.9.0/deployment/cloudwatch-logs-delivery/index.html +++ b/docs/v9.9.0/deployment/cloudwatch-logs-delivery/index.html @@ -5,13 +5,13 @@ Configure Cloudwatch Logs Delivery | Cumulus Documentation - +
    Version: v9.9.0

    Configure Cloudwatch Logs Delivery

    As an optional configuration step, it is possible to deliver CloudWatch logs to a cross-account shared AWS::Logs::Destination. An operator does this by configuring the cumulus module for your deployment as shown below. The value of the log_destination_arn variable is the ARN of a writeable log destination.

    The value can be either an AWS::Logs::Destination or a Kinesis Stream ARN to which your account can write.

    log_destination_arn           = arn:aws:[kinesis|logs]:us-east-1:123456789012:[streamName|destination:logDestinationName]

    Logs Sent

    By default, the following logs will be sent to the destination when one is given.

    • Ingest logs
    • Async Operation logs
    • Thin Egress App API Gateway logs (if configured)

    Additional Logs

    If additional logs are needed, you can configure additional_log_groups_to_elk with the CloudWatch log groups you want to send to the destination. additional_log_groups_to_elk is a map whose keys are descriptive names and whose values are the CloudWatch log group names.

    additional_log_groups_to_elk = {
    "HelloWorldTask" = "/aws/lambda/cumulus-example-HelloWorld"
    "MyCustomTask" = "my-custom-task-log-group"
    }
    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/components/index.html b/docs/v9.9.0/deployment/components/index.html index eca04aa6158..d05c1891b8a 100644 --- a/docs/v9.9.0/deployment/components/index.html +++ b/docs/v9.9.0/deployment/components/index.html @@ -5,7 +5,7 @@ Component-based Cumulus Deployment | Cumulus Documentation - + @@ -39,7 +39,7 @@ Terraform at the same time.

    With remote state, Terraform writes the state data to a remote data store, which can then be shared between all members of a team.

    The recommended approach for handling remote state with Cumulus is to use the S3 backend. This backend stores state in S3 and uses a DynamoDB table for locking.
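    As a rough sketch only (the bucket and table names below are placeholders, not values from this documentation), a terraform.tf using the S3 backend might look like the following, written here as a shell heredoc:

    cat > terraform.tf <<'EOF'
    terraform {
      backend "s3" {
        region         = "us-east-1"
        bucket         = "PREFIX-tf-state"   # placeholder state bucket
        key            = "cumulus/terraform.tfstate"
        dynamodb_table = "PREFIX-tf-locks"   # placeholder lock table
      }
    }
    EOF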

    See the deployment documentation for a walkthrough of creating resources for your remote state using an S3 backend.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/create_bucket/index.html b/docs/v9.9.0/deployment/create_bucket/index.html index 9410c1d767c..4bfeb68a8c8 100644 --- a/docs/v9.9.0/deployment/create_bucket/index.html +++ b/docs/v9.9.0/deployment/create_bucket/index.html @@ -5,13 +5,13 @@ Creating an S3 Bucket | Cumulus Documentation - +
    Version: v9.9.0

    Creating an S3 Bucket

    Buckets can be created on the command line with AWS CLI or via the web interface on the AWS console.

    When creating a protected bucket (a bucket containing data which will be served through the distribution API), make sure to enable S3 server access logging. See S3 Server Access Logging for more details.

    Command line

    Using the AWS command line tool create-bucket s3api subcommand:

    $ aws s3api create-bucket \
    --bucket foobar-internal \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2
    {
    "Location": "/foobar-internal"
    }

    Note: The region and create-bucket-configuration arguments are only necessary if you are creating a bucket outside of the us-east-1 region.

    Please note security settings and other bucket options can be set via the options listed in the s3api documentation.

    Repeat the above step for each bucket to be created.
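    If you have several buckets to create, a small shell loop can repeat the command; the bucket names below are placeholders and should follow your own naming scheme:

    for bucket in foobar-internal foobar-private foobar-protected foobar-public; do
      aws s3api create-bucket \
        --bucket "$bucket" \
        --region us-west-2 \
        --create-bucket-configuration LocationConstraint=us-west-2
    done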

    Web interface

    See: AWS "Creating a Bucket" documentation

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/cumulus_distribution/index.html b/docs/v9.9.0/deployment/cumulus_distribution/index.html index 2160facf5fd..b1c5c87577c 100644 --- a/docs/v9.9.0/deployment/cumulus_distribution/index.html +++ b/docs/v9.9.0/deployment/cumulus_distribution/index.html @@ -5,14 +5,14 @@ Using the Cumulus Distribution API | Cumulus Documentation - +
    Version: v9.9.0

    Using the Cumulus Distribution API

    The Cumulus Distribution API is a set of endpoints that can be used to enable AWS Cognito authentication when downloading data from S3.

    Configuring a Cumulus Distribution deployment

    The Cumulus Distribution API is included in the main Cumulus repo. It is available as part of the terraform-aws-cumulus.zip archive in the latest release.

    These steps assume you're using the Cumulus Deployment Template but can also be used for custom deployments.

    To configure a deployment to use Cumulus Distribution:

    1. Remove or comment the "Thin Egress App Settings" in the Cumulus Template Deploy and enable the Cumulus Distribution settings.
    2. Delete or comment the contents of thin_egress_app.tf and the corresponding Thin Egress App outputs in outputs.tf. These are not necessary for a Cumulus Distribution deployment.
    3. Uncomment the Cumulus Distribution outputs in outputs.tf.
    4. Rename cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.example to cumulus-template-deploy/cumulus-tf/cumulus_distribution.tf.

    Cognito Application and User Credentials

    The major prerequisite for using the Cumulus Distribution API is to set up Cognito. If operating within NGAP, this should already be done for you. If operating outside of NGAP, you must set up Cognito yourself, which is beyond the scope of this documentation.

    Given that Cognito is set up, in order to be able to download granule files via the Cumulus Distribution API, you must obtain Cognito user credentials, because any attempt to download such files (that will be, or have been, published to the CMR via your Cumulus deployment) will result in a prompt for you to supply Cognito user credentials. To obtain your own user credentials, talk to your product owner or scrum master for additional information. They should either know how to create the credentials, know who can create them for the team, or be the liaison to the Cognito team.

    Further, whoever helps to obtain your Cognito user credentials should also be able to supply you with the values for the following new variables that you must add to your cumulus-tf/terraform.tfvars file:

    • csdap_host_url: The URL of the Cognito service to which your Cumulus deployment will make Cognito API calls during a distribution (download) event
    • csdap_client_id: The client ID for the Cumulus application registered within the Cognito service
    • csdap_client_password: The client password for the Cumulus application registered within the Cognito service

    Although you might have to wait a bit for your Cognito user credentials, the remaining instructions do not depend upon having them, so you may continue with these instructions while waiting for your credentials.

    Cumulus Distribution URL

    Your Cumulus Distribution URL is used by Cumulus to generate download URLs as part of the granule metadata generated and published to the CMR. For example, a granule download URL will be of the form <distribution url>/<protected bucket>/<key> (or <distribution url>/path/to/file, if using a custom bucket map, as explained further below).

    By default, the value of your distribution URL is the URL of your private Cumulus Distribution API Gateway (the API Gateway named <prefix>-distribution, once you deploy the Cumulus Distribution module). Therefore, by default, the generated download URLs are private, and thus inaccessible directly, but there are 2 ways to address this issue (both of which are detailed below): (a) use tunneling (typically in development) or (b) put a CloudFront URL in front of your API Gateway (typically in production, and perhaps UAT and/or SIT).

    In either case, you must first know the default URL (i.e., the URL for the private Cumulus Distribution API Gateway). In order to obtain this default URL, you must first deploy your cumulus-tf module with the new Cumulus Distribution module, and once your initial deployment is complete, one of the Terraform outputs will be cumulus_distribution_api_uri, which is the URL for the private API Gateway.

    You may override this default URL by adding a cumulus_distribution_url variable to your cumulus-tf/terraform.tfvars file, and setting it to one of the following values (both of which are explained below):

    1. The default URL, but with a port added to it, in order to allow you to configure tunneling (typically only in development)
    2. A CloudFront URL placed in front of your Cumulus Distribution API Gateway (typically only for Production, but perhaps also for a UAT or SIT environment)

    The following subsections explain these approaches, in turn.

    Using your Cumulus Distribution API Gateway URL as your distribution URL

    Since your Cumulus Distribution API Gateway URL is private, the only way you can use it to confirm that your integration with Cognito is working is by using tunneling (again, generally for development), as described here. Here is an outline of the required steps, with details provided further below:

    1. Create/import a key pair into your AWS EC2 service (if you haven't already done so)
    2. Add a reference to the name of the key pair to your Terraform variables (we'll set the key_name Terraform variable)
    3. Choose an open local port on your machine (we'll use 9000 in the following details)
    4. Add a reference to the value of your cumulus_distribution_api_uri (mentioned earlier), including your chosen port (we'll set the cumulus_distribution_url Terraform variable)
    5. Redeploy Cumulus
    6. Add an entry to your /etc/hosts file
    7. Add a redirect URI to Cognito, via the Cognito API
    8. Install the Session Manager Plugin for the AWS CLI (if you haven't already done so; assuming you have already installed the AWS CLI)
    9. Add a sample file to S3 to test downloading via Cognito

    To create or import an existing key pair, you can use the AWS CLI (see aws ec2 import-key-pair), or the AWS Console (see Amazon EC2 key pairs and Linux instances).

    Once your key pair is added to AWS, add the following to your cumulus-tf/terraform.tfvars file:

    key_name = "<name>"
    cumulus_distribution_url = "https://<id>.execute-api.<region>.amazonaws.com:<port>/dev/"

    where:

    • <name> is the name of the key pair you just added to AWS
    • <id> and <region> are the corresponding parts from your cumulus_distribution_api_uri output variable
    • <port> is your open local port of choice (9000 is typically a good choice)

    Once you save your variable changes, redeploy your cumulus-tf module.

    While your deployment runs, add the following entry to your /etc/hosts file, replacing <hostname> with the host name of the cumulus_distribution_url Terraform variable you just added above:

    127.0.0.1 <hostname>

    Next, you'll need to use the Cognito API to add the value of your cumulus_distribution_url Terraform variable as a Cognito redirect URI. To do so, use your favorite tool (e.g., curl, wget, Postman, etc.) to make a BasicAuth request to the Cognito API, using the following details:

    • method: POST
    • base URL: the value of your csdap_host_url Terraform variable
    • path: /authclient/updateRedirectUri
    • username: the value of your csdap_client_id Terraform variable
    • password: the value of your csdap_client_password Terraform variable
    • headers: Content-Type='application/x-www-form-urlencoded'
    • body: redirect_uri=<cumulus_distribution_url>/login

    where <cumulus_distribution_url> is the value of your cumulus_distribution_url Terraform variable. Note the /login path at the end of the redirect_uri value.
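    As an illustration only, a curl invocation matching the details above might look like this (all angle-bracketed values come from your own Terraform variables):

    curl -X POST \
      -u "<csdap_client_id>:<csdap_client_password>" \
      -H "Content-Type: application/x-www-form-urlencoded" \
      --data-urlencode "redirect_uri=<cumulus_distribution_url>/login" \
      "<csdap_host_url>/authclient/updateRedirectUri"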

    For reference, see the Cognito Authentication Service API.

    Next, install the Session Manager Plugin for the AWS CLI. If running on macOS, and you use Homebrew, you can install it simply as follows:

    brew install --cask session-manager-plugin --no-quarantine

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.
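    For example (the bucket name and key below are placeholders, and any small file will do):

    echo "cumulus distribution test" > sample.txt
    aws s3 cp sample.txt s3://<protected-bucket>/test/sample.txt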

    At this point, you should be ready to open a tunnel and attempt to download your sample file via your browser, summarized as follows:

    1. Determine your ec2 instance ID
    2. Connect to the NASA VPN
    3. Start an AWS SSM session
    4. Open an ssh tunnel
    5. Use a browser to navigate to your file

    To determine your ec2 instance ID for your Cumulus deployment, run the following command, where <profile> is the name of the appropriate AWS profile to use, and <prefix> is the value of your prefix Terraform variable:

    aws --profile <profile> ec2 describe-instances --filters Name=tag:Deployment,Values=<prefix> Name=instance-state-name,Values=running --query "Reservations[0].Instances[].InstanceId" --output text

    IMPORTANT: Before proceeding with the remaining steps, make sure you're connected to the NASA VPN.

    Use the value output from the command above in place of <id> in the following command, which will start an SSM session:

    aws ssm start-session --target <id> --document-name AWS-StartPortForwardingSession --parameters portNumber=22,localPortNumber=6000

    If successful, you should see output similar to the following:

    Starting session with SessionId: NGAPShApplicationDeveloper-***
    Port 6000 opened for sessionId NGAPShApplicationDeveloper-***.
    Waiting for connections...

    Open another terminal window, and open a tunnel with port forwarding, using your chosen port from above (e.g., 9000):

    ssh -4 -p 6000 -N -L <port>:<api-gateway-host>:443 ec2-user@127.0.0.1

    where:

    • <port> is the open local port you chose earlier (e.g., 9000)
    • <api-gateway-host> is the hostname of your private API Gateway (i.e., the host portion of the URL you used as the value of your cumulus_distribution_url Terraform variable above)

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3 above.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    Once you're finished testing, clean up as follows:

    1. Kill your ssh tunnel (Ctrl-C)
    2. Kill your AWS SSM session (Ctrl-C)
    3. If you like, disconnect from the NASA VPN

    While this is a relatively lengthy process, things are much easier when using CloudFront, such as in Production (OPS), SIT, or UAT, as explained next.

    Using a CloudFront URL as your distribution URL

    In Production (OPS), and perhaps in other environments, such as UAT and SIT, you'll need to provide a publicly accessible URL for users to use for downloading (distributing) granule files.

    This is generally done by placing a CloudFront URL in front of your private Cumulus Distribution API Gateway. In order to create such a CloudFront URL, contact the person who helped you obtain your Cognito credentials, and request a CloudFront URL with the following details:

    • The private, backing URL, which is the value of your cumulus_distribution_api_uri Terraform output value
    • A request to add the AWS account's VPC to the whitelist

    Once this request is completed, and you obtain the new CloudFront URL, override your default distribution URL with the CloudFront URL by adding the following to your cumulus-tf/terraform.tfvars file:

    cumulus_distribution_url = <cloudfront_url>

    In addition, add a Cognito redirect URI, as detailed in the previous section. Note that in this case, the value you'll use for redirect_uri is <cloudfront_url>/login since the value of your cumulus_distribution_url is now your CloudFront URL.

    At this point, it is assumed that you have added the appropriate values for this environment for the variables described at the top (csdap_host_url, csdap_client_id, and csdap_client_password).

    Redeploy Cumulus with your new/updated Terraform variables.

    As your final setup step, add a sample file to one of the protected buckets listed in your buckets Terraform variable in your cumulus-tf/terraform.tfvars file. The key for the S3 object doesn't matter, nor does it matter what file you use. All that matters is that the file is an S3 object in one of your protected buckets, because Cognito is triggered when attempting to download from one of those buckets.

    Finally, use your chosen browser to navigate to <cumulus_distribution_url>/<bucket>/<key>, where <bucket> and <key> reference the sample file you added to S3.

    If all goes well, you should be prompted for your Cognito username and password. If you have obtained your Cognito user credentials, enter them, followed by entering a code generated by the authenticator application you registered at the time you completed your Cognito registration process. Once your credentials and auth code are correctly supplied, after a few moments, the download process will begin.

    S3 Bucket Mapping

    An S3 Bucket map allows users to abstract bucket names. If the bucket names change at any point, only the bucket map would need to be updated instead of every S3 link.

    The Cumulus Distribution API uses a bucket_map.yaml or bucket_map.yaml.tmpl file to determine which buckets to serve. See the examples.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Note: Cumulus only supports a one-to-one mapping of bucket -> Cumulus Distribution path for 'distribution' buckets. Also, the bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.
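    To inspect the generated default bucket map, one option is to stream it from S3 with the AWS CLI, substituting your deployment's system bucket name:

    aws s3 cp s3://<system_bucket>/distribution_bucket_map.json -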

    Switching from the Thin Egress App to Cumulus Distribution

    If you have previously deployed the Thin Egress App (TEA) as your distribution app, you can switch to Cumulus Distribution by following the steps above.

    Note, however, that the cumulus_distribution module will generate a bucket map cache and overwrite any existing bucket map caches created by TEA.

    There will also be downtime while your API gateway is updated.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/index.html b/docs/v9.9.0/deployment/index.html index dfea9228731..ef623a05917 100644 --- a/docs/v9.9.0/deployment/index.html +++ b/docs/v9.9.0/deployment/index.html @@ -5,7 +5,7 @@ How to Deploy Cumulus | Cumulus Documentation - + @@ -21,7 +21,7 @@ for deployment's EC2 instances and allows you to connect to them via SSH/SSM.

    Consider the sizing of your Cumulus instance when configuring your variables.

    Choose a distribution API

    Cumulus can be configured to use either the Thin Egress App (TEA) or the Cumulus Distribution API. The default selection is the Thin Egress App if you're using the Deployment Template.

    IMPORTANT! If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    Configure the Thin Egress App

    The Thin Egress App can be used for Cumulus distribution and is the default selection. It allows authentication using Earthdata Login. Follow the steps in the documentation to configure distribution in your cumulus-tf deployment.

    Configure the Cumulus Distribution API (optional)

    If you would prefer to use the Cumulus Distribution API, which supports AWS Cognito authentication, follow these steps to configure distribution in your cumulus-tf deployment.

    Initialize Terraform

    Follow the above instructions to initialize Terraform using terraform init [1].

    Deploy

    Run terraform apply to deploy the resources. Type yes when prompted to confirm that you want to create the resources. Assuming the operation is successful, you should see output like this:

    Apply complete! Resources: 292 added, 0 changed, 0 destroyed.

    Outputs:

    archive_api_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/token
    archive_api_uri = https://abc123.execute-api.us-east-1.amazonaws.com/dev/
    distribution_redirect_uri = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/login
    distribution_url = https://abc123.execute-api.us-east-1.amazonaws.com/DEV/

    Note: Be sure to copy the redirect URLs, as you will use them to update your Earthdata application.

    Update Earthdata Application

    You will need to add two redirect URLs to your EarthData login application.

    1. Login to URS.
    2. Under My Applications -> Application Administration -> use the edit icon of your application.
    3. Under Manage -> redirect URIs, add the Archive API url returned from the stack deployment
      • e.g. archive_api_redirect_uri = https://<czbbkscuy6>.execute-api.us-east-1.amazonaws.com/dev/token.
    4. Also add the Distribution url
      • e.g. distribution_redirect_uri = https://<kido2r7kji>.execute-api.us-east-1.amazonaws.com/dev/login [2].
    5. You may delete the placeholder url you used to create the application.

    If you've lost track of the needed redirect URIs, they can be located in the API Gateway console. Once there, select <prefix>-archive and/or <prefix>-thin-egress-app-EgressGateway, then Dashboard, and use the base URL at the top of the page next to the text Invoke this API at:. Make sure to append /token for the archive URL and /login for the Thin Egress App URL.
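    If you prefer the CLI over the console, one way to look up the REST API ID behind those URLs (assuming the default <prefix>-archive API name) is:

    aws apigateway get-rest-apis \
      --query "items[?name=='<prefix>-archive'].id" \
      --output text

    The invoke URL is then https://<id>.execute-api.<region>.amazonaws.com/<stage>/, to which you append /token (archive) or /login (distribution) as described above.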


    Deploy Cumulus dashboard

    Dashboard Requirements

    Please note that the requirements are similar to the Cumulus stack deployment requirements. The installation instructions below include a step that will install/use the required node version referenced in the .nvmrc file in the dashboard repository.

    Prepare AWS

    Create S3 bucket for dashboard:

    • Create it, e.g. <prefix>-dashboard. Use the command line or console as you did when preparing AWS configuration.
    • Configure the bucket to host a website:
      • AWS S3 console: Select <prefix>-dashboard bucket then, "Properties" -> "Static Website Hosting", point to index.html
      • CLI: aws s3 website s3://<prefix>-dashboard --index-document index.html
    • The bucket's url will be http://<prefix>-dashboard.s3-website-<region>.amazonaws.com or you can find it on the AWS console via "Properties" -> "Static website hosting" -> "Endpoint"
    • Ensure the bucket's access permissions allow your deployment user access to write to the bucket

    Install dashboard

    To install the dashboard, clone the Cumulus dashboard repository into the root deploy directory and install dependencies with npm install:

      git clone https://github.com/nasa/cumulus-dashboard
    cd cumulus-dashboard
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Dashboard versioning

    By default, the master branch will be used for dashboard deployments. The master branch of the dashboard repo contains the most recent stable release of the dashboard.

    If you want to test unreleased changes to the dashboard, use the develop branch.

    Each release/version of the dashboard will have a tag in the dashboard repo. Release/version numbers will use semantic versioning (major/minor/patch).

    To checkout and install a specific version of the dashboard:

      git fetch --tags
    git checkout <version-number> # e.g. v1.2.0
    nvm use
    npm install

    If you do not have the correct version of node installed, replace nvm use with nvm install $(cat .nvmrc) in the above example.

    Building the dashboard

    Note: These environment variables are available during the build: APIROOT, DAAC_NAME, STAGE, HIDE_PDR. Any of these can be set on the command line to override the values contained in config.js when running the build below.

    To configure your dashboard for deployment, set the APIROOT environment variable to your app's API root [3].

    Build the dashboard from the dashboard repository root directory, cumulus-dashboard:

      APIROOT=<your_api_root> npm run build

    Dashboard deployment

    Deploy dashboard to s3 bucket from the cumulus-dashboard directory:

    Using AWS CLI:

      aws s3 sync dist s3://<prefix>-dashboard --acl public-read

    From the S3 Console:

    • Open the <prefix>-dashboard bucket, click 'upload'. Add the contents of the 'dist' subdirectory to the upload. Then select 'Next'. On the permissions window allow the public to view. Select 'Upload'.

    You should be able to visit the dashboard website at http://<prefix>-dashboard.s3-website-<region>.amazonaws.com, or find the URL via <prefix>-dashboard -> "Properties" -> "Static website hosting" -> "Endpoint", and log in with a user that you configured for access in the Configure and Deploy the Cumulus Stack step.


    Cumulus Instance Sizing

    The default sizing in the Cumulus deployment for Elasticsearch instances, EC2 instances, and Autoscaling Groups is small and designed for testing and cost savings. The default settings are likely not suitable for production workloads. Sizing is highly individual and dependent on expected load and archive size.

    Please be cognizant of costs as any change in size will affect your AWS bill. AWS provides a pricing calculator for estimating costs.

    Elasticsearch

    The mappings file contains all of the data types that will be indexed into Elasticsearch. Elasticsearch sizing is tied to your archive size, including your collections, granules, and workflow executions that will be stored.

    AWS provides documentation on calculating and configuring for sizing.

    In addition to size, you'll want to consider the number of nodes, which determines how the system reacts in the event of a failure.

    Configuration can be done in the data persistence module in elasticsearch_config and the cumulus module in es_index_shards.

    If you make changes to your Elasticsearch configuration you will need to reindex for those changes to take effect.

    EC2 instances and autoscaling groups

    EC2 instances are used for long-running operations (e.g. generating a reconciliation report) and long-running workflow tasks. Configuration for your ECS cluster is achieved via Cumulus deployment variables.

    When configuring your ECS cluster consider:

    • The EC2 instance type and EBS volume size needed to accommodate your workloads. Configured as ecs_cluster_instance_type and ecs_cluster_instance_docker_volume_size.
    • The minimum and desired number of instances on hand to accommodate your workloads. Configured as ecs_cluster_min_size and ecs_cluster_desired_size.
    • The maximum number of instances you will need and are willing to pay for to accommodate your heaviest workloads. Configured as ecs_cluster_max_size.
    • Your autoscaling parameters: ecs_cluster_scale_in_adjustment_percent, ecs_cluster_scale_out_adjustment_percent, ecs_cluster_scale_in_threshold_percent, and ecs_cluster_scale_out_threshold_percent.
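    As an illustration only (the values below are placeholders, not recommendations), these variables end up in cumulus-tf/terraform.tfvars along these lines:

    cat >> cumulus-tf/terraform.tfvars <<'EOF'
    # Illustrative ECS cluster sizing values; tune for your own workloads
    ecs_cluster_instance_type               = "t3.medium"
    ecs_cluster_instance_docker_volume_size = 50
    ecs_cluster_min_size                    = 1
    ecs_cluster_desired_size                = 2
    ecs_cluster_max_size                    = 4
    EOF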

    Footnotes


    1. Run terraform init if:

      • This is the first time deploying the module
      • You have added any additional child modules, including Cumulus components
      • You have updated the source for any of the child modules

    2. To add another redirect URI to your application: on the Earthdata home page, select "My Applications", scroll down to "Application Administration", use the edit icon for your application, then go to Manage -> Redirect URIs.

    3. The API root can be found a number of ways. The easiest is to note it in the output of the app deployment step, but you can also find it from the AWS console -> Amazon API Gateway -> APIs -> <prefix>-archive -> Dashboard, reading the URL at the top after "Invoke this API at".

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/postgres_database_deployment/index.html b/docs/v9.9.0/deployment/postgres_database_deployment/index.html index 2cb9e0e611f..6943d602a11 100644 --- a/docs/v9.9.0/deployment/postgres_database_deployment/index.html +++ b/docs/v9.9.0/deployment/postgres_database_deployment/index.html @@ -5,7 +5,7 @@ PostgreSQL Database Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ cumulus-rds-tf that will deploy an AWS RDS Aurora Serverless PostgreSQL 10.2 compatible database cluster, and optionally provision a single deployment database with credentialed secrets for use with Cumulus.

    We have provided an example terraform deployment using this module in the Cumulus template-deploy repository on github.

    Use of this example involves:

    • Creating/configuring a Terraform module directory
    • Using Terraform to deploy resources to AWS

    Requirements

    Configuration/installation of this module requires the following:

    • Terraform
    • git
    • A VPC configured for use with Cumulus Core. This should match the subnets you provide when Deploying Cumulus to allow Core's lambdas to properly access the database.
    • At least two subnets across multiple AZs. These should match the subnets you provide as configuration when Deploying Cumulus, and should be within the same VPC.

    Needed Git Repositories

    Assumptions

    OS/Environment

    The instructions in this module require Linux/MacOS. While deployment via Windows is possible, it is unsupported.

    Terraform

    This document assumes knowledge of Terraform. If you are not comfortable working with Terraform, the following links should bring you up to speed:

    For Cumulus specific instructions on installation of Terraform, refer to the main Cumulus Installation Documentation

    Aurora/RDS

    This document also assumes some basic familiarity with PostgreSQL databases, and Amazon Aurora/RDS. If you're unfamiliar consider perusing the AWS docs, and the Aurora Serverless V1 docs.

    Prepare deployment repository

    If you are already working with an existing repository that has a configured rds-cluster-tf deployment for the version of Cumulus you intend to deploy or update, or just need to configure this module for your repository, skip to Prepare AWS configuration.

    Clone the cumulus-template-deploy repo and name appropriately for your organization:

      git clone https://github.com/nasa/cumulus-template-deploy <repository-name>

    We will return to configuring this repo and using it for deployment below.

    Optional: Create a new repository

    Create a new repository on Github so that you can add your workflows and other modules to source control:

      git remote set-url origin https://github.com/<org>/<repository-name>
    git push origin master

    You can then add/commit changes as needed.

    Note: If you are pushing your deployment code to a git repo, make sure to add terraform.tf and terraform.tfvars to .gitignore, as these files will contain sensitive data related to your AWS account.
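    A minimal sketch of that .gitignore addition:

    printf '%s\n' terraform.tf terraform.tfvars >> .gitignore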


    Prepare AWS configuration

    To deploy this module, make sure that you have completed the following steps from the Cumulus deployment instructions, adapted in similar fashion for this module:

    --

    Configure and deploy the module

    When configuring this module, please keep in mind that, unlike the Cumulus deployment, this module should be deployed once to create the database cluster, and redeployed thereafter only to change that configuration, perform upgrades, etc. This module does not need to be re-deployed for each Core update.

    These steps should be executed in the rds-cluster-tf directory of the template deploy repo that you previously cloned. Run the following to copy the example files:

    cd rds-cluster-tf/
    cp terraform.tf.example terraform.tf
    cp terraform.tfvars.example terraform.tfvars

    In terraform.tf, configure the remote state settings by substituting the appropriate values for:

    • bucket
    • dynamodb_table
    • PREFIX (whatever prefix you've chosen for your deployment)

    Fill in the appropriate values in terraform.tfvars. See the rds-cluster-tf module variable definitions for more detail on all of the configuration options. A few notable configuration options are documented in the next section.

    Configuration Options

    • deletion_protection -- defaults to true. Set it to false if you want to be able to delete your cluster with a terraform destroy without manually updating the cluster.
    • db_admin_username -- cluster database administration username. Defaults to postgres.
    • db_admin_password -- required variable that specifies the admin user password for the cluster. To randomize this on each deployment, consider using a random_string resource as input.
    • region -- defaults to us-east-1.
    • subnets -- requires at least 2 across different AZs. For use with Cumulus, these AZs should match the values you configure for your lambda_subnet_ids.
    • max_capacity -- the max ACUs the cluster is allowed to use. Carefully consider cost/performance concerns when setting this value.
    • min_capacity -- the minimum ACUs the cluster will scale to
    • provision_user_database -- Optional flag to allow module to provision a user database in addition to creating the cluster. Described in the next section.
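    Purely as an illustration (all values are placeholders, and your deployment may require additional variables), the filled-in portion of terraform.tfvars might resemble the following, written here as a shell heredoc:

    cat > terraform.tfvars <<'EOF'
    # Placeholder values only -- start from terraform.tfvars.example for the full set
    prefix              = "my-prefix"
    region              = "us-east-1"
    subnets             = ["subnet-aaaaaaaa", "subnet-bbbbbbbb"]  # at least 2, across AZs
    db_admin_username   = "postgres"
    db_admin_password   = "change-me"
    deletion_protection = true
    max_capacity        = 4
    min_capacity        = 2
    EOF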

    Provision user and user database

    If you wish for the module to provision a PostgreSQL database on your new cluster and provide a secret for access in the module output, in addition to managing the cluster itself, the following configuration keys are required:

    • provision_user_database -- must be set to true, this configures the module to deploy a lambda that will create the user database, and update the provided configuration on deploy.
    • permissions_boundary_arn -- the permissions boundary to use when creating the access roles the provisioning lambda will need. In most use cases this should be the same one used for the Cumulus Core deployment.
    • rds_user_password -- the value to set the user password to
    • prefix -- this value will be used to set a unique identifier for the ProvisionDatabase lambda, as well as to name the provisioned user/database.

    Once configured, the module will deploy the lambda, and run it on each provision, creating the configured database if it does not exist, updating the user password if that value has been changed, and updating the output user database secret.

    Setting provision_user_database to false after provisioning will not result in removal of the configured database, as the lambda is non-destructive as configured in this module.

    Please Note: This functionality is limited in that it will only provision a single database/user and configure a basic database, and should not be used in scenarios where more complex configuration is required.

    Initialize Terraform

    Run terraform init

    You should see output like:

    * provider.aws: version = "~> 2.32"

    Terraform has been successfully initialized!

    Deploy

    Run terraform apply to deploy the resources.

    If re-applying this module, variables (e.g. engine_version, snapshot_identifier) that force a recreation of the database cluster may result in data loss if deletion protection is disabled. Examine the changeset carefully for resources that will be re-created/destroyed before applying.

    Review the changeset, and assuming it looks correct, type yes when prompted to confirm that you want to create all of the resources.

    Assuming the operation is successful, you should see output similar to the following (this example omits the creation of a user database/lambdas/security groups):

    terraform apply

    An execution plan has been generated and is shown below.
    Resource actions are indicated with the following symbols:
    + create

    Terraform will perform the following actions:

    # module.rds_cluster.aws_db_subnet_group.default will be created
    + resource "aws_db_subnet_group" "default" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + subnet_ids = [
    + "subnet-xxxxxxxxx",
    + "subnet-xxxxxxxxx",
    ]
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    }

    # module.rds_cluster.aws_rds_cluster.cumulus will be created
    + resource "aws_rds_cluster" "cumulus" {
    + apply_immediately = true
    + arn = (known after apply)
    + availability_zones = (known after apply)
    + backup_retention_period = 1
    + cluster_identifier = "xxxxxxxxx"
    + cluster_identifier_prefix = (known after apply)
    + cluster_members = (known after apply)
    + cluster_resource_id = (known after apply)
    + copy_tags_to_snapshot = false
    + database_name = "xxxxxxxxx"
    + db_cluster_parameter_group_name = (known after apply)
    + db_subnet_group_name = (known after apply)
    + deletion_protection = true
    + enable_http_endpoint = true
    + endpoint = (known after apply)
    + engine = "aurora-postgresql"
    + engine_mode = "serverless"
    + engine_version = "10.12"
    + final_snapshot_identifier = "xxxxxxxxx"
    + hosted_zone_id = (known after apply)
    + id = (known after apply)
    + kms_key_id = (known after apply)
    + master_password = (sensitive value)
    + master_username = "xxxxxxxxx"
    + port = (known after apply)
    + preferred_backup_window = "07:00-09:00"
    + preferred_maintenance_window = (known after apply)
    + reader_endpoint = (known after apply)
    + skip_final_snapshot = false
    + storage_encrypted = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_security_group_ids = (known after apply)

    + scaling_configuration {
    + auto_pause = true
    + max_capacity = 4
    + min_capacity = 2
    + seconds_until_auto_pause = 300
    + timeout_action = "RollbackCapacityChange"
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret.rds_login will be created
    + resource "aws_secretsmanager_secret" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + name = (known after apply)
    + name_prefix = "xxxxxxxxx"
    + policy = (known after apply)
    + recovery_window_in_days = 30
    + rotation_enabled = (known after apply)
    + rotation_lambda_arn = (known after apply)
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }

    + rotation_rules {
    + automatically_after_days = (known after apply)
    }
    }

    # module.rds_cluster.aws_secretsmanager_secret_version.rds_login will be created
    + resource "aws_secretsmanager_secret_version" "rds_login" {
    + arn = (known after apply)
    + id = (known after apply)
    + secret_id = (known after apply)
    + secret_string = (sensitive value)
    + version_id = (known after apply)
    + version_stages = (known after apply)
    }

    # module.rds_cluster.aws_security_group.rds_cluster_access will be created
    + resource "aws_security_group" "rds_cluster_access" {
    + arn = (known after apply)
    + description = "Managed by Terraform"
    + egress = (known after apply)
    + id = (known after apply)
    + ingress = (known after apply)
    + name = (known after apply)
    + name_prefix = "cumulus_rds_cluster_access_ingress"
    + owner_id = (known after apply)
    + revoke_rules_on_delete = false
    + tags = {
    + "Deployment" = "xxxxxxxxx"
    }
    + vpc_id = "vpc-xxxxxxxxx"
    }

    # module.rds_cluster.aws_security_group_rule.rds_security_group_allow_PostgreSQL will be created
    + resource "aws_security_group_rule" "rds_security_group_allow_postgres" {
    + from_port = 5432
    + id = (known after apply)
    + protocol = "tcp"
    + security_group_id = (known after apply)
    + self = true
    + source_security_group_id = (known after apply)
    + to_port = 5432
    + type = "ingress"
    }

    Plan: 6 to add, 0 to change, 0 to destroy.

    Do you want to perform these actions?
    Terraform will perform the actions described above.
    Only 'yes' will be accepted to approve.

    Enter a value: yes

    module.rds_cluster.aws_db_subnet_group.default: Creating...
    module.rds_cluster.aws_security_group.rds_cluster_access: Creating...
    module.rds_cluster.aws_secretsmanager_secret.rds_login: Creating...

    Then, after the resources are created:

    Apply complete! Resources: X added, 0 changed, 0 destroyed.
    Releasing state lock. This may take a few moments...

    Outputs:

    admin_db_login_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxxxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmdR
    admin_db_login_secret_version = xxxxxxxxx
    rds_endpoint = xxxxxxxxx.us-east-1.rds.amazonaws.com
    security_group_id = xxxxxxxxx
    user_credentials_secret_arn = arn:aws:secretsmanager:us-east-1:xxxxx:secret:xxxxxxxxxx20210407182709367700000002-dpmpXA

    Note the output values for admin_db_login_secret_arn (and optionally user_credentials_secret_arn) as these provide the AWS Secrets Manager secret required to access the database as the administrative user and, optionally, the user database credentials Cumulus requires as well.

    The content of each of these secrets is of the form:

    {
    "database": "postgres",
    "dbClusterIdentifier": "clusterName",
    "engine": "postgres",
    "host": "xxx",
    "password": "defaultPassword",
    "port": 5432,
    "username": "xxx"
    }
    • database -- the PostgreSQL database used by the configured user
    • dbClusterIdentifier -- the value set by the cluster_identifier variable in the terraform module
    • engine -- the Aurora/RDS database engine
    • host -- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.com
    • password -- the database password
    • username -- the account username
    • port -- The database connection port, should always be 5432

    Next Steps

    The database cluster has been created/updated! From here you can continue to add additional user accounts, databases and other database configuration.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/share-s3-access-logs/index.html b/docs/v9.9.0/deployment/share-s3-access-logs/index.html index cafd1576b1d..01148960c12 100644 --- a/docs/v9.9.0/deployment/share-s3-access-logs/index.html +++ b/docs/v9.9.0/deployment/share-s3-access-logs/index.html @@ -5,14 +5,14 @@ Share S3 Access Logs | Cumulus Documentation - +
    Version: v9.9.0

    Share S3 Access Logs

    It is possible through Cumulus to share S3 access logs across multiple S3 packages using the S3 replicator package.

    S3 Replicator

    The S3 Replicator is a node package that contains a simple lambda function, associated permissions, and the Terraform instructions to replicate create-object events from one S3 bucket to another.

    First ensure that you have enabled S3 Server Access Logging.

    Next configure your config.tfvars as described in the s3-replicator/README.md to correspond to your deployment. The source_bucket and source_prefix are determined by how you enabled the S3 Server Access Logging.

    In order to deploy the s3-replicator with cumulus you will need to add the module to your terraform main.tf definition. e.g.

    module "s3-replicator" {
    source = "<path to s3-replicator.zip>"
    prefix = var.prefix
    vpc_id = var.vpc_id
    subnet_ids = var.subnet_ids
    permissions_boundary = var.permissions_boundary_arn
    source_bucket = var.s3_replicator_config.source_bucket
    source_prefix = var.s3_replicator_config.source_prefix
    target_bucket = var.s3_replicator_config.target_bucket
    target_prefix = var.s3_replicator_config.target_prefix
    }

    The terraform source package can be found on the Cumulus github release page under the asset tab terraform-aws-cumulus-s3-replicator.zip.

    ESDIS Metrics

    In the NGAP environment, the ESDIS Metrics team has set up an ELK stack to process logs from Cumulus instances. To use this system, you must deliver any S3 Server Access logs that Cumulus creates.

    Configure the S3 replicator as described above using the target_bucket and target_prefix provided by the metrics team.

    The metrics team has taken care of setting up Logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/terraform-best-practices/index.html b/docs/v9.9.0/deployment/terraform-best-practices/index.html index 50222600468..658b92d0508 100644 --- a/docs/v9.9.0/deployment/terraform-best-practices/index.html +++ b/docs/v9.9.0/deployment/terraform-best-practices/index.html @@ -5,7 +5,7 @@ Terraform Best Practices | Cumulus Documentation - + @@ -88,7 +88,7 @@ AWS CLI command, replacing PREFIX with your deployment prefix name:

    aws resourcegroupstaggingapi get-resources \
    --query "ResourceTagMappingList[].ResourceARN" \
    --tag-filters Key=Deployment,Values=PREFIX

    Ideally, the output should be an empty list, but if it is not, then you may need to manually delete the listed resources.

    Configuring the Cumulus deployment: link
    Restoring a previous version: link

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/thin_egress_app/index.html b/docs/v9.9.0/deployment/thin_egress_app/index.html index 4b8b0008192..b9afe0786a0 100644 --- a/docs/v9.9.0/deployment/thin_egress_app/index.html +++ b/docs/v9.9.0/deployment/thin_egress_app/index.html @@ -5,7 +5,7 @@ Using the Thin Egress App for Cumulus distribution | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.9.0

    Using the Thin Egress App for Cumulus distribution

    The Thin Egress App (TEA) is an app running in Lambda that allows retrieving data from S3 using temporary links and provides URS integration.

    Configuring a TEA deployment

    TEA is deployed using Terraform modules. Refer to these instructions for guidance on how to integrate new components with your deployment.

    The cumulus-template-deploy repository cumulus-tf/main.tf contains a thin_egress_app for distribution.

    The TEA module provides these instructions for adding it to your deployment; the following sections describe how to configure the thin_egress_app module in your Cumulus deployment.

    Create a secret for signing Thin Egress App JWTs

    The Thin Egress App uses JWTs internally to authenticate requests and requires a secret stored in AWS Secrets Manager containing SSH keys that are used to sign the JWTs.

    See the Thin Egress App documentation on how to create this secret with the correct values. It will be used later to set the thin_egress_jwt_secret_name variable when deploying the Cumulus module.

    bucket_map.yaml

    The Thin Egress App uses a bucket_map.yaml file to determine which buckets to serve. Documentation of the file format is available here.

    The default Cumulus module generates a file at s3://${system_bucket}/distribution_bucket_map.json.

    The configuration file is a simple json mapping of the form:

    {
    "daac-public-data-bucket": "/path/to/this/kind/of/data"
    }

    Please note: Cumulus only supports a one-to-one mapping of bucket->TEA path for 'distribution' buckets.

    Optionally configure a custom bucket map

    A simple config would look something like this:

    bucket_map.yaml
    MAP:
    my-protected: my-protected
    my-public: my-public

    PUBLIC_BUCKETS:
    - my-public

    Please note: your custom bucket map must include mappings for all of the protected and public buckets specified in the buckets variable in cumulus-tf/terraform.tfvars, otherwise Cumulus may not be able to determine the correct distribution URL for ingested files and you may encounter errors.

    Optionally configure shared variables

    The cumulus module deploys certain components that interact with TEA. As a result, the cumulus module requires that if you are specifying a value for the stage_name variable to the TEA module, you must use the same value for the tea_api_gateway_stage variable to the cumulus module.

    One way to keep these variable values in sync across the modules is to use Terraform local values to define values to use for the variables for both modules. This approach is shown in the Cumulus core example deployment code.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/deployment/upgrade-readme/index.html b/docs/v9.9.0/deployment/upgrade-readme/index.html index e12611257db..17e6aca0022 100644 --- a/docs/v9.9.0/deployment/upgrade-readme/index.html +++ b/docs/v9.9.0/deployment/upgrade-readme/index.html @@ -5,7 +5,7 @@ Upgrading Cumulus | Cumulus Documentation - + @@ -15,7 +15,7 @@ deployment functions correctly. Please refer to some recommended smoke tests given above, and consider additional tests appropriate for your particular deployment and environment.

    Update Cumulus Dashboard

    If there are breaking (or otherwise significant) changes to the Cumulus API, you should also upgrade your Cumulus Dashboard deployment to use the version of the Cumulus API matching the version of Cumulus to which you are migrating.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/development/forked-pr/index.html b/docs/v9.9.0/development/forked-pr/index.html index 0223b532353..0a9751881f9 100644 --- a/docs/v9.9.0/development/forked-pr/index.html +++ b/docs/v9.9.0/development/forked-pr/index.html @@ -5,13 +5,13 @@ Issuing PR From Forked Repos | Cumulus Documentation - +
    Version: v9.9.0

    Issuing PR From Forked Repos

    Fork the Repo

    • Fork the Cumulus repo
    • Create a new branch from the branch you'd like to contribute to
    • If an issue doesn't already exist, submit one (see above)

    Create a Pull Request

    Reviewing PRs from Forked Repos

    Upon submission of a pull request, the Cumulus development team will review the code.

    Once the code passes an initial review, the team will run the CI tests against the proposed update.

    The request will then either be merged, declined, or an adjustment to the code will be requested via the issue opened with the original PR request.

    PRs from forked repos cannot be merged directly to master. Cumulus reviewers must follow these steps before completing the review process:

    1. Create a new branch:

        git checkout -b from-<name-of-the-branch> master
    2. Push the new branch to GitHub

    3. Change the destination of the forked PR to the new branch that was just pushed

      Screenshot of Github interface showing how to change the base branch of a pull request

    4. After code review and approval, merge the forked PR to the new branch.

    5. Create a PR for the new branch to master.

    6. If the CI tests pass, merge the new branch to master and close the issue. If the CI tests do not pass, request an amended PR from the original author, or resolve failures as appropriate.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/development/integration-tests/index.html b/docs/v9.9.0/development/integration-tests/index.html index 99931fca255..1031204665c 100644 --- a/docs/v9.9.0/development/integration-tests/index.html +++ b/docs/v9.9.0/development/integration-tests/index.html @@ -5,7 +5,7 @@ Integration Tests | Cumulus Documentation - + @@ -19,7 +19,7 @@ in the commit message.

    If you create a new stack and want to be able to run integration tests against it in CI, you will need to add it to bamboo/select-stack.js.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/development/quality-and-coverage/index.html b/docs/v9.9.0/development/quality-and-coverage/index.html index 34f0fa2c965..0eb7ca82da0 100644 --- a/docs/v9.9.0/development/quality-and-coverage/index.html +++ b/docs/v9.9.0/development/quality-and-coverage/index.html @@ -5,7 +5,7 @@ Code Coverage and Quality | Cumulus Documentation - + @@ -23,7 +23,7 @@ here.

    To run linting on the markdown files, run npm run lint-md.

    Audit

    This project uses audit-ci to run a security audit on the package dependency tree. This must pass prior to merge. The configured rules for audit-ci can be found here.

    To execute an audit, run npm run audit.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/development/release/index.html b/docs/v9.9.0/development/release/index.html index 58c0dfb284c..d6658c6c485 100644 --- a/docs/v9.9.0/development/release/index.html +++ b/docs/v9.9.0/development/release/index.html @@ -5,7 +5,7 @@ Versioning and Releases | Cumulus Documentation - + @@ -13,7 +13,7 @@
    Version: v9.9.0

    Versioning and Releases

    Versioning

    We use a global versioning approach, meaning version numbers in Cumulus are consistent across all packages and tasks, and semantic versioning to track major, minor, and patch versions (e.g. 1.0.0). We use Lerna to manage our versioning. Any change will force Lerna to increment the version of all packages.

    Read more about semantic versioning here.

    Pre-release testing

    Note: This is only necessary when preparing a release for a new major version of Cumulus (e.g. preparing to go from 6.x.x to 7.0.0)

    Before releasing a new major version of Cumulus, we should test the deployment upgrade path from the latest release of Cumulus to the upcoming release.

    It is preferable to use the cumulus-template-deploy repo for testing the deployment, since that repo is the officially recommended deployment configuration for end users.

    You should create an entirely new deployment for this testing to replicate the end user upgrade path. Using an existing test or CI deployment would not be useful because that deployment may already have been deployed with the latest changes and not match the upgrade path for end users.

    Pre-release testing steps:

    1. Checkout the cumulus-template-deploy repo

    2. Update the deployment code to use the latest release artifacts if it wasn't done already. For example, assuming that the latest release was 5.0.1, update the deployment files as follows:

      # in data-persistence-tf/main.tf
      source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/data-persistence"

      # in cumulus-tf/main.tf
      source = "https://github.com/nasa/cumulus/releases/download/v5.0.1/terraform-aws-cumulus.zip//tf-modules/cumulus"
    3. For both the data-persistence-tf and cumulus-tf modules:

      1. Add the necessary backend configuration (terraform.tf) and variables (terraform.tfvars)
        • You should use an entirely new deployment for this testing, so make sure to use values for key in terraform.tf and prefix in terraform.tfvars that don't collide with existing deployments
      2. Run terraform init
      3. Run terraform apply
    4. Checkout the master branch of the cumulus repo

    5. Run a full bootstrap of the code: npm run bootstrap

    6. Build the pre-release artifacts: ./bamboo/create-release-artifacts.sh

    7. For both the data-persistence-tf and cumulus-tf modules:

      1. Update the deployment to use the built release artifacts:

        # in data-persistence-tf/main.tf
        source = "[path]/cumulus/terraform-aws-cumulus.zip//tf-modules/data-persistence"

        # in cumulus-tf/main.tf
        source = "/Users/mboyd/development/cumulus/terraform-aws-cumulus.zip//tf-modules/cumulus"
  2. Review the CHANGELOG.md for any pre-deployment migration steps. If there are any, go through them and confirm that they are successful

      3. Run terraform init

      4. Run terraform apply

    8. Review the CHANGELOG.md for any post-deployment migration steps and confirm that they are successful

    9. Delete your test deployment by running terraform destroy in cumulus-tf and data-persistence-tf

    Updating Cumulus version and publishing to NPM

    1. Create a branch for the new release

    From Master

    Create a branch titled release-MAJOR.MINOR.x for the release (use a literal x for the patch version).

        git checkout -b release-MAJOR.MINOR.x

    e.g.:
    git checkout -b release-9.1.x

    If creating a new major version release from master, say 5.0.0, then the branch would be named release-5.0.x. If creating a new minor version release from master, say 1.14.0 then the branch would be named release-1.14.x.

    Having a release branch for each major/minor version allows us to easily backport patches to that version.

    Push the release-MAJOR.MINOR.x branch to GitHub if it was created locally. (Commits should be even with master at this point.)

    If creating a patch release, you can check out the existing base branch.

    Then create the release branch (e.g. release-1.14.0) from the minor version base branch. For example, from the release-1.14.x branch:

    git checkout -b release-1.14.0

    Backporting

    When creating a backport, a minor version base branch should already exist on GitHub. Check out the existing minor version base branch then create a release branch from it. For example:

    # check out existing minor version base branch
    git checkout release-1.14.x
    # pull to ensure you have the latest changes
    git pull origin release-1.14.x
    # create new release branch for backport
    git checkout -b release-1.14.1
    # cherry pick the commits (or single squashed commit of changes) relevant to the backport
    git cherry-pick [replace-with-commit-SHA]
    # push up the changes to the release branch
    git push

    2. Update the Cumulus version number

    When changes are ready to be released, the Cumulus version number must be updated.

    Lerna handles the process of deciding which version number should be used as long as the developer specifies whether the change is a major, minor, or patch change.

    To update Cumulus's version number run:

    npm run update

    Screenshot of terminal showing interactive prompt from Lerna for selecting the new release version

    Lerna will handle updating the packages and all of the dependent package version numbers. If a dependency has not been changed with the update, however, lerna will not update the version of the dependency.

Note: Lerna will struggle to correctly update the versions on any non-standard/alpha versions (e.g. 1.17.0-alpha0). Please be sure to check any packages that are new or have been manually published since the previous release and any packages that list them as a dependency to ensure the listed versions are correct. It's useful to use the search feature of your code editor or grep to see if there are any references to outdated package versions.

    3. Check Cumulus Dashboard PRs for Version Bump

    There may be unreleased changes in the Cumulus Dashboard project that rely on this unreleased Cumulus Core version.

If there exists a PR in the cumulus-dashboard repo with a name containing "Version Bump for Next Cumulus API Release":

• There will be a placeholder change-me value that should be replaced with the to-be-released Cumulus Core version.
    • Mark that PR as ready to be reviewed.

    4. Update CHANGELOG.md

    Update the CHANGELOG.md. Put a header under the Unreleased section with the new version number and the date.

    Add a link reference for the github "compare" view at the bottom of the CHANGELOG.md, following the existing pattern. This link reference should create a link in the CHANGELOG's release header to changes in the corresponding release.
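For illustration only, the new entry header and its link reference might look like the following (version numbers and dates here are examples; follow the exact format already used in the file):

## [v9.1.0] 2021-06-15

[v9.1.0]: https://github.com/nasa/cumulus/compare/v9.0.0...v9.1.0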

    5. Update DATA_MODEL_CHANGELOG.md

    Similar to #4, make sure the DATA_MODEL_CHANGELOG is updated if there are data model changes in the release, and the link reference at the end of the document is updated as appropriate.

    6. Update CONTRIBUTORS.md

    ./bin/update-contributors.sh
    git add CONTRIBUTORS.md

    Commit and push these changes, if any.

    7. Update Cumulus package API documentation

    Update auto-generated API documentation for any Cumulus packages that have it:

    npm run docs-build-packages

    Commit and push these changes, if any.

    8. Cut new version of Cumulus Documentation

    If this is a backport, do not create a new version of the documentation. For various reasons, we do not merge backports back to master, other than changelog notes. Documentation changes for backports will not be published to our documentation website.

    cd website
    npm run version ${release_version}
    git add .

    Where ${release_version} corresponds to the version tag v1.2.3, for example.

    Commit and push these changes.

    9. Create a pull request against the minor version branch

    1. Push the release branch (e.g. release-1.2.3) to GitHub.

    2. Create a PR against the minor version base branch (e.g. release-1.2.x).

    3. Configure Bamboo to run automated tests against this PR by finding the branch plan for the release branch (release-1.2.3) and setting only these variables:

      • GIT_PR: true
      • SKIP_AUDIT: true

      IMPORTANT: Do NOT set the PUBLISH_FLAG variable to true for this branch plan. The actual publishing of the release will be handled by a separate, manually triggered branch plan.

  Screenshot of Bamboo CI interface showing the configuration of the GIT_PR branch variable to have a value of "true"

    4. Verify that the Bamboo build for the PR succeeds and then merge to the minor version base branch (release-1.2.x).

      • It is safe to do a squash merge in this instance, but not required
    5. You may delete your release branch (release-1.2.3) after merging to the base branch.

    10. Create a git tag for the release

    Check out the minor version base branch now that your changes are merged in and do a git pull.

    Ensure you are on the latest commit.

    Create and push a new git tag:

    git tag -a vMAJOR.MINOR.PATCH -m "Release MAJOR.MINOR.PATCH"
    git push origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -a v9.1.0 -m "Release 9.1.0"
    git push origin v9.1.0

    11. Publishing the release

    Publishing of new releases is handled by a custom Bamboo branch plan and is manually triggered.

    The reasons for using a separate branch plan to handle releases instead of the branch plan for the minor version (e.g. release-1.2.x) are:

    • The Bamboo build for the minor version release branch is triggered automatically on any commits to that branch, whereas we want to manually control when the release is published.
    • We want to verify that integration tests have passed on the Bamboo build for the minor version release branch before we manually trigger the release, so that we can be sure that our code is safe to release.

    If this is a new minor version branch, then you will need to create a new Bamboo branch plan for publishing the release following the instructions below:

    Creating a Bamboo branch plan for the release

    • In the Cumulus Core project (https://ci.earthdata.nasa.gov/browse/CUM-CBA), click Actions -> Configure Plan in the top right.

    • Next to Plan branch click the rightmost button that displays Create Plan Branch upon hover.

    • Click Create plan branch manually.

• Add the values in that list. Choose a display name that makes it very clear this is a deployment branch plan. Release (minor version branch name) seems to work well (e.g. Release (1.2.x)).

      • Make sure you enter the correct branch name (e.g. release-1.2.x).
• Important: Deselect Enable Branch. If you do not do this, it will immediately fire off a build.

• Do this immediately: on the Branch Details page, enable Change trigger and set the Trigger type to manual. This will prevent commits to the branch from triggering the build plan. (You should have been redirected to the Branch Details tab after creating the plan. If not, navigate to the branch from the list where you clicked Create Plan Branch in the previous step.)

• Go to the Variables tab. Ensure that you are on your branch plan and not the master plan: you should not see a large list of configured variables, but instead a dropdown allowing you to select variables to override, and the tab title will be Branch Variables. Then set the branch variables as follows:

      • DEPLOYMENT: cumulus-from-npm-tf (except in special cases such as incompatible backport branches)
        • If this variable is not set, it will default to the deployment name for the last committer on the branch
      • USE_CACHED_BOOTSTRAP: false
      • USE_TERRAFORM_ZIPS: true (IMPORTANT: MUST be set in order to run integration tests against the .zip files published during the build so that we are actually testing our released files)
      • GIT_PR: true
      • SKIP_AUDIT: true
      • PUBLISH_FLAG: true
    • Enable the branch from the Branch Details page.

    • Run the branch using the Run button in the top right.

    Bamboo will build and run lint, audit and unit tests against that tagged release, publish the new packages to NPM, and then run the integration tests using those newly released packages.

12. Create a new Cumulus release on GitHub

The CI release scripts will automatically create a GitHub release based on the release version tag, as well as upload artifacts to the GitHub release for the Terraform modules provided by Cumulus. The Terraform release artifacts include:

    • A multi-module Terraform .zip artifact containing filtered copies of the tf-modules, packages, and tasks directories for use as Terraform module sources.
• An S3 replicator module
    • A workflow module
    • A distribution API module
    • An ECS service module

Just make sure to verify that the appropriate .zip files are present on GitHub after the release process is complete.

    13. Merge base branch back to master

Finally, you need to propagate the version update changes back to master.

    If this is the latest version, you can simply create a PR to merge the minor version base branch back to master.

    Do not merge master back into the release branch since we want the release branch to just have the code from the release. Instead, create a new branch off of the release branch and merge that to master. You can freely merge master into this branch and delete it when it is merged to master.

    If this is a backport, you will need to create a PR that ports the changelog updates back to master. It is important in this changelog note to call it out as a backport. For example, fixes in backport version 1.14.5 may not be available in 1.15.0 because the fix was introduced in 1.15.3.

    Troubleshooting

    Delete and regenerate the tag

To delete and re-create a published tag, follow these steps:

    git tag -d vMAJOR.MINOR.PATCH
    git push -d origin vMAJOR.MINOR.PATCH

    e.g.:
    git tag -d v9.1.0
    git push -d origin v9.1.0
diff --git a/docs/v9.9.0/docs-how-to/index.html b/docs/v9.9.0/docs-how-to/index.html
    Version: v9.9.0

    Cumulus Documentation: How To's

    Cumulus Docs Installation

    Run a Local Server

    Environment variables DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME must be set for search to work. At the moment, search is only truly functional on prod because that is the only website we have registered to be indexed with DocSearch (see below on search).

    git clone git@github.com:nasa/cumulus
    cd cumulus
    npm run docs-install
    npm run docs-serve

Note: The docs-build script (npm run docs-build) will build the documentation into website/build.
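Since the build expects the two DocSearch variables to exist, they can simply be exported (with placeholder values if necessary) before serving the site locally; a minimal sketch:

export DOCSEARCH_API_KEY=<your-docsearch-api-key>
export DOCSEARCH_INDEX_NAME=<your-docsearch-index-name>
npm run docs-serve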

    Cumulus Documentation

Our project documentation is hosted on GitHub Pages. The resources published to this website are housed in the docs/ directory at the top of the Cumulus repository. Those resources primarily consist of markdown files and images.

We use the open-source static website generator Docusaurus to build HTML files from our markdown documentation, add some organization and navigation, and provide some other niceties in the final website (search, easy templating, etc.).

    Add a New Page and Sidebars

    Adding a new page should be as simple as writing some documentation in markdown, placing it under the correct directory in the docs/ folder and adding some configuration values wrapped by --- at the top of the file. There are many files that already have this header which can be used as reference.

    ---
    id: doc-unique-id # unique id for this document. This must be unique across ALL documentation under docs/
    title: Title Of Doc # Whatever title you feel like adding. This will show up as the index to this page on the sidebar.
    hide_title: false
    ---

Note: To have the new page show up in a sidebar, the designated id must be added to a sidebar in the website/sidebars.js file. Docusaurus has an in-depth explanation of sidebars here.
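As a sketch only (the exact shape of this file depends on the Docusaurus version in use, so follow the structure already present in it), registering the new id might look like:

// website/sidebars.js -- illustrative excerpt only
module.exports = {
  docs: {
    Features: [
      'doc-unique-id', // the id from the new page's front matter
    ],
  },
};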

    Versioning Docs

    We lean heavily on Docusaurus for versioning. Their suggestions and walkthrough can be found here. It is worth noting that we would like the Documentation versions to match up directly with release versions. Cumulus versioning is explained in the Versioning Docs.

Search

Search on our documentation site is taken care of by DocSearch. We have been provided with an apiKey and an indexName by DocSearch that we include in our website/siteConfig.js file. The rest, indexing and actual searching, we leave to DocSearch. Our builds expect environment variables for both of these values to exist: DOCSEARCH_API_KEY and DOCSEARCH_INDEX_NAME.

    Add a new task

The tasks list in docs/tasks.md is generated from the list of task packages in the tasks folder. Do not edit the docs/tasks.md file directly.

    Read more about adding a new task.

    Editing the tasks.md header or template

    Look at the bin/build-tasks-doc.js and bin/tasks-header.md files to edit the output of the tasks build script.

    Editing diagrams

    For some diagrams included in the documentation, the raw source is included in the docs/assets/raw directory to allow for easy updating in the future:

    • assets/interfaces.svg -> assets/raw/interfaces.drawio (generated using draw.io)

    Deployment

The master branch is automatically built and deployed to the gh-pages branch. The gh-pages branch is served by GitHub Pages. Do not make edits to the gh-pages branch.

diff --git a/docs/v9.9.0/external-contributions/index.html b/docs/v9.9.0/external-contributions/index.html
    Version: v9.9.0

    External Contributions

    Contributions to Cumulus may be made in the form of PRs to the repositories directly or through externally developed tasks and components. Cumulus is designed as an ecosystem that leverages Terraform deployments and AWS Step Functions to easily integrate external components.

    This list may not be exhaustive and represents components that are open source, owned externally, and that have been tested with the Cumulus system. For more information and contributing guidelines, visit the respective GitHub repositories.

    Distribution

    The ASF Thin Egress App is used by Cumulus for distribution. TEA can be deployed with Cumulus or as part of other applications to distribute data.

    Operational Cloud Recovery Archive (ORCA)

    ORCA can be deployed with Cumulus to provide a customizable baseline for creating and managing operational backups.

    Workflow Tasks

    CNM

    PO.DAAC provides two workflow tasks to be used with the Cloud Notification Mechanism (CNM) Schema: CNM to Granule and CNM Response.

    See the CNM workflow data cookbook for an example of how these can be used in a Cumulus ingest workflow.

    DMR++ Generation

GHRC has provided a DMR++ Generation workflow task. This task is meant to be used in conjunction with Cumulus' Hyrax Metadata Updates workflow task.

diff --git a/docs/v9.9.0/faqs/index.html b/docs/v9.9.0/faqs/index.html
    Version: v9.9.0

    Frequently Asked Questions

Below are answers to some commonly asked questions that can assist you along the way when working with Cumulus.

    General

    How do I deploy a new instance in Cumulus?

    Answer: For steps on the Cumulus deployment process go to How to Deploy Cumulus.

    What prerequisites are needed to setup Cumulus?

    Answer: You will need access to the AWS console and an Earthdata login before you can deploy Cumulus.

    What is the preferred web browser for the Cumulus environment?

    Answer: Our preferred web browser is the latest version of Google Chrome.

    How do I quickly troubleshoot an issue in Cumulus?

    Answer: To troubleshoot and fix issues in Cumulus reference our recommended solutions in Troubleshooting Cumulus.

    Where can I get support help?

    Answer: The following options are available for assistance:

    • Cumulus: Outside NASA users should file a GitHub issue and inside NASA users should file a JIRA issue.
    • AWS: You can create a case in the AWS Support Center, accessible via your AWS Console.

    Integrators & Developers

    What is a Cumulus integrator?

    Answer: Those who are working within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    What are the steps if I run into an issue during deployment?

    Answer: If you encounter an issue with your deployment go to the Troubleshooting Deployment guide.

    Is Cumulus customizable and flexible?

Answer: Yes. Cumulus is a modular architecture that allows you to decide which components you want/need to deploy. These components are maintained as Terraform modules.

    What are Terraform modules?

Answer: They are modules that are composed to create a Cumulus deployment, which gives integrators the flexibility to choose the components of Cumulus that they want/need. To view Cumulus maintained modules or steps on how to create a module go to Terraform modules.

Where do I find Terraform module variables?

    Answer: Go here for a list of Cumulus maintained variables.

    What is a Cumulus workflow?

    Answer: A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions. For more details, we suggest visiting here.

    How do I set up a Cumulus workflow?

    Answer: You will need to create a provider, have an associated collection (add a new one), and generate a new rule first. Then you can set up a Cumulus workflow by following these steps here.

    What are the common use cases that a Cumulus integrator encounters?

    Answer: The following are some examples of possible use cases you may see:


    Operators

    What is a Cumulus operator?

Answer: Those who ingest, archive, and troubleshoot datasets (called collections in Cumulus). Your daily activities might include, but are not limited to, the following:

    • Ingesting datasets
    • Maintaining historical data ingest
    • Starting and stopping data handlers
    • Managing collections
    • Managing provider definitions
    • Creating, enabling, and disabling rules
    • Investigating errors for granules and deleting or re-ingesting granules
    • Investigating errors in executions and isolating failed workflow step(s)
    What are the common use cases that a Cumulus operator encounters?

    Answer: The following are some examples of possible use cases you may see:

    Can you re-run a workflow execution in AWS?

    Answer: Yes. For steps on how to re-run a workflow execution go to Re-running workflow executions in the Cumulus Operator Docs.

diff --git a/docs/v9.9.0/features/ancillary_metadata/index.html b/docs/v9.9.0/features/ancillary_metadata/index.html
    Version: v9.9.0

    Ancillary Metadata Export

    This feature utilizes the type key on a files object in a Cumulus granule. It uses the key to provide a mechanism where granule discovery, processing and other tasks can set and use this value to facilitate metadata export to CMR.

    Tasks setting type

    Discover Granules

Uses the Collection type key to set the value for files on discovered granules in its output.

    Parse PDR

    Uses a task-specific mapping to map PDR 'FILE_TYPE' to a CNM type to set type on granules from the PDR.

    CNMToCMALambdaFunction

    Natively supports types that are included in incoming messages to a CNM Workflow.

    Tasks using type

    Move Granules

    Uses the granule file type key to update UMM/ECHO 10 CMR files passed in as candidates to the task. This task adds the external facing URLs to the CMR metadata file based on the type. See the file tracking data cookbook for a detailed mapping. If a non-CNM type is specified, the task assumes it is a 'data' file.

diff --git a/docs/v9.9.0/features/backup_and_restore/index.html b/docs/v9.9.0/features/backup_and_restore/index.html
Cumulus Backup and Restore | Cumulus Documentation

    DynamoDB

    Backup and Restore with AWS

    You can enable point-in-time recovery (PITR) as well as create an on-demand backup for your Amazon DynamoDB tables.

    PITR provides continuous backups of your DynamoDB table data. PITR can be enabled through your Terraform deployment, the AWS console, or the AWS API. When enabled, DynamoDB maintains continuous backups of your table up to the last 35 days. You can recover a copy of that table to a previous state at any point in time from the moment you enable PITR, up to a maximum of the 35 preceding days. PITR provides continuous backups until you explicitly disable it.

    On-demand backups allow you to create backups of DynamoDB table data and its settings. You can initiate an on-demand backup at any time with a single click from the AWS Management Console or a single API call. You can restore the backups to a new DynamoDB table in the same AWS Region at any time.

    PITR gives your DynamoDB tables continuous protection from accidental writes and deletes. With PITR, you do not have to worry about creating, maintaining, or scheduling backups. You enable PITR on your table and your backup is available for restore at any point in time from the moment you enable it, up to a maximum of the 35 preceding days. For example, imagine a test script writing accidentally to a production DynamoDB table. You could recover your table to any point in time within the last 35 days.

    On-demand backups help with long-term archival requirements for regulatory compliance. On-demand backups give you full-control of managing the lifecycle of your backups, from creating as many backups as you need to retaining these for as long as you need.

    Enabling PITR during deployment

    By default, the Cumulus data-persistence module enables PITR on the default tables listed in the module's variable defaults for enable_point_in_time_tables. At the time of writing, that list includes:

    • AsyncOperationsTable
    • CollectionsTable
    • ExecutionsTable
    • FilesTable
    • GranulesTable
    • PdrsTable
    • ProvidersTable
    • RulesTable

    If you wish to change this list, simply update your deployment's data_persistence module (here in the template-deploy repository) to pass the correct list of tables.
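As an illustrative sketch only (module source, prefix, and table names here are placeholders), the override might look like:

module "data_persistence" {
  source = "https://github.com/nasa/cumulus/releases/download/vx.y.z/terraform-aws-cumulus.zip//tf-modules/data-persistence"

  prefix = "my-prefix"

  # Only the tables listed here will have point-in-time recovery enabled
  enable_point_in_time_tables = [
    "CollectionsTable",
    "GranulesTable",
  ]
}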

    Restoring with PITR

    Restoring a full deployment

If your deployment has been deleted, all of your tables with PITR enabled will have had backups created automatically. You can locate these backups in the AWS console on the DynamoDB Backups page or through the CLI by running:

    aws dynamodb list-backups --backup-type SYSTEM

    You can restore your tables to your AWS account using the following command:

    aws dynamodb restore-table-from-backup --target-table-name <prefix>-CollectionsTable --backup-arn <backup-arn>

    Where prefix matches the prefix from your data-persistence deployment. backup-arn can be found in the AWS console or by listing the backups using the command above.

    This will restore your tables to AWS. They will need to be linked to your Terraform deployment. After terraform init and before terraform apply, run the following command for each table:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <prefix>-CollectionsTable

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

Terraform will now manage these tables as part of the Terraform state. Run terraform apply to generate the rest of the data-persistence deployment and then follow the instructions to deploy the cumulus deployment as normal.

    At this point the data will be in DynamoDB, but not in Elasticsearch, so nothing will be returned on the Operator dashboard or through Operator API calls. To get the data into Elasticsearch, run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the recovery index the data will be visible on the dashboard and available via the Operator API.

    Restoring an individual table

    A table can be restored to a previous state using PITR. This is easily achievable via the AWS Console by visiting the Backups tab for the table.

    A table can only be recovered to a new table name. Following the restoration of the table, the new table must be imported into Terraform.

    First, remove the old table from the Terraform state:

    terraform state rm module.data_persistence.aws_dynamodb_table.collections_table

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Then import the new table into the Terraform state:

    terraform import module.data_persistence.aws_dynamodb_table.collections_table <new-table-name>

    replacing collections_table with the table identifier in the DynamoDB Terraform table definitions.

    Your data-persistence and cumulus deployments should be redeployed so that your instance of Cumulus uses this new table. After the deployment, your Elasticsearch instance will be out of sync with your new table if there is any change in data. To resync your Elasticsearch with your database run an index-from-database operation via the Operator API. The status of this operation can be viewed on the dashboard. When Elasticsearch is switched to the new index the DynamoDB tables and Elasticsearch instance will be in sync and the correct data will be reflected on the dashboard.

    Backup and Restore with cumulus-api CLI

The cumulus-api CLI also includes backup and restore commands. The CLI backup command downloads the content of any of your DynamoDB tables to .json files. You can also use these .json files to restore the records to another DynamoDB table.

    Backup with the CLI

    To backup a table with the CLI, install the @cumulus/api package using npm, making sure to install the same version as your Cumulus deployment:

    npm install -g @cumulus/api@version

    Then run:

    cumulus-api backup --table <table-name>

The backup will be stored at backups/<table-name>.json.

    Restore with the CLI

To restore data from a JSON file, run the following command:

    cumulus-api restore backups/<table-name>.json --table <table-name>

The restore can go to the in-use table and will update Elasticsearch. If a record already exists in the table, it will not be duplicated but will be updated with the record from the restore file.

    Data Backup and Restore

    Cumulus provides no core functionality to backup data stored in S3. Data disaster recovery is being developed in a separate effort here.

diff --git a/docs/v9.9.0/features/data_in_dynamodb/index.html b/docs/v9.9.0/features/data_in_dynamodb/index.html
    Version: v9.9.0

    Cumulus Metadata in DynamoDB

    @cumulus/api uses a number of methods to preserve the metadata generated in a Cumulus instance.

All configuration and system-generated metadata is stored in DynamoDB tables, except for logs. System logs are stored in the AWS CloudWatch service.

    Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability. Amazon DynamoDB runs exclusively on solid-state drives (SSDs). SSDs help AWS achieve the design goals of predictable low-latency response times for storing and accessing data at any scale.

    DynamoDB Auto Scaling

    Cumulus deployed tables from the data-persistence module are set to on-demand mode.

diff --git a/docs/v9.9.0/features/dead_letter_archive/index.html b/docs/v9.9.0/features/dead_letter_archive/index.html
    Version: v9.9.0

    Cumulus Dead Letter Archive

    This documentation explains the Cumulus dead letter archive and associated functionality.

    DB Records DLQ Archive

    The Cumulus system contains a number of dead letter queues. Perhaps the most important system lambda function supported by a DLQ is the sfEventSqsToDbRecords lambda function which parses Cumulus messages from workflow executions to generate and write database records to the Cumulus database.

    As of Cumulus v9+, the dead letter queue for this lambda (named sfEventSqsToDbRecordsDeadLetterQueue) has been updated with a consumer lambda that will automatically write any incoming records to the S3 system bucket, under the path <stackName>/dead-letter-archive/sqs/. This will allow integrators and operators engaged in debugging missing records to inspect any Cumulus messages which failed to process and did not result in the successful creation of database records.
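For example, the archived messages can be listed with the AWS CLI (bucket and stack names are placeholders):

aws s3 ls s3://<system-bucket>/<stackName>/dead-letter-archive/sqs/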

    Dead Letter Archive recovery

    In addition to the above, as of Cumulus v9+, the Cumulus API also contains a new endpoint at /deadLetterArchive/recoverCumulusMessages.

    Sending a POST request to this endpoint will trigger a Cumulus AsyncOperation that will attempt to reprocess (and if successful delete) all Cumulus messages in the dead letter archive, using the same underlying logic as the existing sfEventSqsToDbRecords.

This endpoint may prove particularly useful when recovering from an extended or unexpected database outage, where messages failed to process due to the external outage and there is no essential malformation of each Cumulus message.
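A minimal request sketch, assuming the same bearer-token authorization used by the other Cumulus API examples in these docs:

curl --request POST https://example.com/deadLetterArchive/recoverCumulusMessages --header 'Authorization: Bearer ReplaceWithToken'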

diff --git a/docs/v9.9.0/features/dead_letter_queues/index.html b/docs/v9.9.0/features/dead_letter_queues/index.html
    Version: v9.9.0

    Dead Letter Queues

    startSF SQS queue

The workflow-trigger for the startSF queue has a Redrive Policy set up that directs any failed attempts to pull from the workflow start queue to an SQS Dead Letter Queue.

    This queue can then be monitored for failures to initiate a workflow. Please note that workflow failures will not show up in this queue, only repeated failure to trigger a workflow.
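Conceptually, a redrive policy of this kind looks like the following Terraform sketch (resource and variable names are illustrative, not the actual Cumulus resources):

variable "prefix" {}

# Dead letter queue that receives messages after repeated failed receives
resource "aws_sqs_queue" "start_sf_dead_letter_queue" {
  name = "${var.prefix}-startSFDeadLetterQueue"
}

# Workflow start queue with a redrive policy pointing at the dead letter queue
resource "aws_sqs_queue" "start_sf_queue" {
  name = "${var.prefix}-startSF"
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.start_sf_dead_letter_queue.arn
    maxReceiveCount     = 3
  })
}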

    Named Lambda Dead Letter Queues

    Cumulus provides configured Dead Letter Queues (DLQ) for non-workflow Lambdas (such as ScheduleSF) to capture Lambda failures for further processing.

    These DLQs are setup with the following configuration:

  receive_wait_time_seconds  = 20
  message_retention_seconds  = 1209600
  visibility_timeout_seconds = 60

    Default Lambda Configuration

    The following built-in Cumulus Lambdas are setup with DLQs to allow handling of process failures:

    • dbIndexer (Updates Elasticsearch based on DynamoDB events)
    • JobsLambda (writes logs outputs to Elasticsearch)
    • ScheduleSF (the SF Scheduler Lambda that places messages on the queue that is used to start workflows, see Workflow Triggers)
    • publishReports (Lambda that publishes messages to the SNS topics for execution, granule and PDR reporting)
    • reportGranules, reportExecutions, reportPdrs (Lambdas responsible for updating records based on messages in the queues published by publishReports)

    Troubleshooting/Utilizing messages in a Dead Letter Queue

    Ideally an automated process should be configured to poll the queue and process messages off a dead letter queue.

For aid in manually troubleshooting, you can utilize the SQS Management console to view messages available in the queues set up for a particular stack. The dead letter queues will have a Message Body containing the Lambda payload, as well as Message Attributes that reference both the error returned and a RequestID which can be cross-referenced with the associated Lambda's CloudWatch logs for more information:

    Screenshot of the AWS SQS console showing how to view SQS message attributes

diff --git a/docs/v9.9.0/features/distribution-metrics/index.html b/docs/v9.9.0/features/distribution-metrics/index.html
    Version: v9.9.0

    Cumulus Distribution Metrics

    It is possible to configure Cumulus and the Cumulus Dashboard to display information about the successes and failures of requests for data. This requires the Cumulus instance to deliver Cloudwatch Logs and S3 Server Access logs to an ELK stack.

    ESDIS Metrics in NGAP

Work with the ESDIS metrics team to set up permissions and access to forward CloudWatch Logs to a shared AWS:Logs:Destination, as well as to transfer your S3 Server Access logs to a metrics team bucket.

    The metrics team has taken care of setting up logstash to ingest the files that get delivered to their bucket into their Elasticsearch instance.

Once Cumulus has been configured to deliver CloudWatch logs to the ESDIS Metrics team, you can use the Elasticsearch indexes to create the necessary target patterns on the dashboard. These are often <daac>-cloudwatch-cumulus-<env>-* and <daac>-distribution-<env>-*, but they will depend on your specific Elasticsearch setup.

    Cumulus / ESDIS Metrics distribution system

    Architecture diagram showing how logs are replicated from a Cumulus instance to the ESDIS Metrics account and accessed by the Cumulus dashboard

diff --git a/docs/v9.9.0/features/execution_payload_retention/index.html b/docs/v9.9.0/features/execution_payload_retention/index.html
    Version: v9.9.0

    Execution Payload Retention

In addition to CloudWatch logs and AWS Step Function API records, Cumulus automatically stores the initial and 'final' (the last update to the execution record) payload values as part of the Execution record in DynamoDB and Elasticsearch.

    This allows access via the API (or optionally direct DB/Elasticsearch querying) for debugging/reporting purposes. The data is stored in the "originalPayload" and "finalPayload" fields.

    Payload record cleanup

    To reduce storage requirements, a CloudWatch rule ({stack-name}-dailyExecutionPayloadCleanupRule) triggering a daily run of the provided cleanExecutions lambda has been added. This lambda will remove all 'completed' and 'non-completed' payload records in the database that are older than the specified configuration.

    Configuration

    The following configuration flags have been made available in the cumulus module. They may be overridden in your deployment's instance of the cumulus module by adding the following configuration options:

daily_execution_payload_cleanup_schedule_expression (string)

    This configuration option sets the execution times for this Lambda to run, using a Cloudwatch cron expression.

    Default value is "cron(0 4 * * ? *)".

complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of completed execution payloads.

    Default value is false.

complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a 'completed' status in days. Records with updatedAt values older than this with payload information will have that information removed.

    Default value is 10.

non_complete_execution_payload_timeout_disable (bool)

    This configuration option, when set to true, will disable all cleanup of "non-complete" (any status other than completed) execution payloads.

    Default value is false.

non_complete_execution_payload_timeout (number)

    This flag defines the cleanup threshold for executions with a status other than 'complete' in days. Records with updateTime values older than this with payload information will have that information removed.

    Default value is 30 days.

• complete_execution_payload_timeout_disable / non_complete_execution_payload_timeout_disable

These flags (true/false) determine whether the cleanup script's logic for 'complete' and 'non-complete' executions will run. The default value is false for both.
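Putting this together, an illustrative override in your cumulus module might look like the following (module source version is a placeholder; only the relevant variables are shown, set to the defaults listed above):

module "cumulus" {
  source = "https://github.com/nasa/cumulus/releases/download/vx.y.z/terraform-aws-cumulus.zip//tf-modules/cumulus"

  # ... other required configuration for your deployment ...

  daily_execution_payload_cleanup_schedule_expression = "cron(0 4 * * ? *)"
  complete_execution_payload_timeout                  = 10
  non_complete_execution_payload_timeout              = 30
}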

diff --git a/docs/v9.9.0/features/logging-esdis-metrics/index.html b/docs/v9.9.0/features/logging-esdis-metrics/index.html
    Version: v9.9.0

    Writing logs for ESDIS Metrics

    Note: This feature is only available for Cumulus deployments in NGAP environments.

    Prerequisite: You must configure your Cumulus deployment to deliver your logs to the correct shared logs destination for ESDIS metrics.

    Log messages delivered to the ESDIS metrics logs destination conforming to an expected format will be automatically ingested and parsed to enable helpful searching/filtering of your logs via the ESDIS metrics Kibana dashboard.

    Expected log format

    The ESDIS metrics pipeline expects a log message to be a JSON string representation of an object (dict in Python or map in Java). An example log message might look like:

    {
    "level": "info",
    "executions": "arn:aws:states:us-east-1:000000000000:execution:MySfn:abcd1234",
    "granules": "[\"granule-1\",\"granule-2\"]",
    "message": "hello world",
    "sender": "greetingFunction",
    "stackName": "myCumulus",
    "timestamp": "2018-10-19T19:12:47.501Z"
    }

    A log message can contain the following properties:

    • executions: The AWS Step Function execution name in which this task is executing, if any
    • granules: A JSON string of the array of granule IDs being processed by this code, if any
    • level: A string identifier for the type of message being logged. Possible values:
      • debug
      • error
      • fatal
      • info
      • warn
      • trace
    • message: String containing your actual log message
    • parentArn: The parent AWS Step Function execution ARN that triggered the current execution, if any
    • sender: The name of the resource generating the log message (e.g. a library name, a Lambda function name, an ECS activity name)
    • stackName: The unique prefix for your Cumulus deployment
    • timestamp: An ISO-8601 formatted timestamp
    • version: The version of the resource generating the log message, if any

    None of these properties are explicitly required for ESDIS metrics to parse your log correctly. However, a log without a message has no informational content. And having level, sender, and timestamp properties is very useful for filtering your logs. Including a stackName in your logs is helpful as it allows you to distinguish between logs generated by different deployments.

    Using Cumulus Message Adapter libraries

If you are writing a custom task that is integrated with the Cumulus Message Adapter, then some of the language-specific client libraries can be used to write logs compatible with ESDIS metrics.

    The usage of each library differs slightly, but in general a logger is initialized with a Cumulus workflow message to determine the contextual information for the task (e.g. granules, executions). Then, after the logger is initialized, writing logs only requires specifying a message, but the logged output will include the contextual information as well.

    Writing logs using custom code

    Any code that produces logs matching the expected log format can be processed by ESDIS metrics.
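For illustration, a conforming log line can be produced in any language by serializing such an object to a single-line JSON string; a minimal Node.js sketch (field values are examples only):

// Emit a single-line JSON log message in the expected format
console.log(JSON.stringify({
  level: 'info',
  message: 'granule processed',
  sender: 'myCustomTask',
  stackName: 'myCumulus',
  granules: JSON.stringify(['granule-1']),
  timestamp: new Date().toISOString(),
}));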

    Node.js

    Cumulus core provides a @cumulus/logger library that writes logs in the expected format for ESDIS metrics.
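A minimal usage sketch (the constructor option shown is an assumption for illustration; consult the package README for the exact API):

const Logger = require('@cumulus/logger');

// 'sender' is an assumed option name identifying the resource writing the log
const log = new Logger({ sender: '@cumulus/my-task' });
log.info('hello world'); // writes a JSON log line in the expected format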

diff --git a/docs/v9.9.0/features/replay-archived-sqs-messages/index.html b/docs/v9.9.0/features/replay-archived-sqs-messages/index.html
    Version: v9.9.0

    How to replay SQS messages archived in S3

    Context

    Cumulus archives all incoming SQS messages to S3 and removes messages once they have been processed. Unprocessed messages are archived at the path: ${stackName}/archived-incoming-messages/${queueName}/${messageId}

    Replay SQS messages endpoint

The Cumulus API has added a new endpoint, /replays/sqs. This endpoint will allow you to start a replay operation to requeue all archived SQS messages by queueName and return an AsyncOperationId for operation status tracking.

    Start replaying archived SQS messages

    In order to start a replay, you must perform a POST request to the replays/sqs endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

• queueName (string): Any valid SQS queue name (not ARN)
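A request sketch (the queue name is a placeholder; the JSON body carries the field above):

curl --request POST https://example.com/replays/sqs \
  --header 'Authorization: Bearer ReplaceWithToken' \
  --header 'Content-Type: application/json' \
  --data '{"queueName": "<your-queue-name>"}'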

    Status tracking

    A successful response from the /replays/sqs endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.
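For example (the ID is a placeholder):

curl https://example.com/asyncOperations/<asyncOperationId> --header 'Authorization: Bearer ReplaceWithToken'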

diff --git a/docs/v9.9.0/features/replay-kinesis-messages/index.html b/docs/v9.9.0/features/replay-kinesis-messages/index.html
    Version: v9.9.0

    How to replay Kinesis messages after an outage

    After a period of outage, it may be necessary for a Cumulus operator to reprocess or 'replay' messages that arrived on an AWS Kinesis Data Stream but did not trigger an ingest. This document serves as an outline on how to start a replay operation, and how to perform status tracking. Cumulus supports replay of all Kinesis messages on a stream (subject to the normal RetentionPeriod constraints), or all messages within a given time slice delimited by start and end timestamps.

    As Kinesis has no comparable field to e.g. the SQS ReceiveCount on its records, Cumulus cannot tell which messages within a given time slice have never been processed, and cannot guarantee only missed messages will be processed. Users will have to rely on duplicate handling or some other method of identifying messages that should not be processed within the time slice.

    NOTE: This operation flow effectively changes only the trigger mechanism for Kinesis ingest notifications. The existence of valid Kinesis-type rules and all other normal requirements for the triggering of ingest via Kinesis still apply.

    Replays endpoint

Cumulus has added a new endpoint to its API, /replays. This endpoint will allow you to start replay operations and return an AsyncOperationId for operation status tracking.

    Start a replay

    In order to start a replay, you must perform a POST request to the replays endpoint.

    The required and optional fields that should be part of the body of this request are documented below.

NOTE: As the endTimestamp relies on a comparison with the Kinesis server-side ApproximateArrivalTimestamp, and given that there is no documented level of accuracy for the approximation, it is recommended that the endTimestamp include some amount of buffer to allow for slight discrepancies. If tolerable, the same is recommended for the startTimestamp, although it is used differently and is less vulnerable to discrepancies, since a server-side arrival timestamp should never be earlier than the client-side request timestamp.

• type (string, required): Currently only accepts kinesis.
• kinesisStream (string, required for type kinesis): Any valid Kinesis stream name (not ARN).
• kinesisStreamCreationTimestamp (optional): Any input valid for a JS Date constructor. For reasons to use this field, see the AWS documentation on StreamCreationTimestamp.
• endTimestamp (optional): Any input valid for a JS Date constructor. Messages newer than this timestamp will be skipped.
• startTimestamp (optional): Any input valid for a JS Date constructor. Messages will be fetched from the Kinesis stream starting at this timestamp. Ignored if it is further in the past than the stream's retention period.
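A request sketch replaying a time slice of a stream (stream name and timestamps are placeholders):

curl --request POST https://example.com/replays \
  --header 'Authorization: Bearer ReplaceWithToken' \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "kinesis",
    "kinesisStream": "<your-stream-name>",
    "startTimestamp": "2021-01-01T00:00:00.000Z",
    "endTimestamp": "2021-01-02T00:00:00.000Z"
  }'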

    Status tracking

    A successful response from the /replays endpoint will contain an asyncOperationId field. Use this ID with the /asyncOperations endpoint to track the status.

diff --git a/docs/v9.9.0/features/reports/index.html b/docs/v9.9.0/features/reports/index.html
Reconciliation Reports | Cumulus Documentation

Screenshot of the Dashboard Reconciliation Reports Overview page

    Viewing an inventory report will show a detailed list of collections, granules and files. Screenshot of an Inventory Report page

Viewing a granule not found report will show a list of granules missing data. Screenshot of a Granule Not Found Report page

    API

    The API also allows users to create and view reports. For more extensive API documentation, see the Cumulus API docs.

    Creating a Report via API

    Create a new inventory report with the following:

    curl --request POST https://example.com/reconciliationReports --header 'Authorization: Bearer ReplaceWithToken'

    Example response:

    {
    "message": "Report is being generated",
    "status": 202
    }

    Retrieving a Report via API

    Once a report has been generated, you can retrieve the full report.

    curl https://example.com/reconciliationReports/inventoryReport-20190305T153430508 --header 'Authorization: Bearer ReplaceWithTheToken'

    Example response:

{
  "reportStartTime": "2019-03-05T15:34:30.508Z",
  "reportEndTime": "2019-03-05T15:34:37.243Z",
  "status": "SUCCESS",
  "error": null,
  "filesInCumulus": {
    "okCount": 40,
    "onlyInS3": [
      "s3://cumulus-test-sandbox-protected/MOD09GQ.A2016358.h13v04.006.2016360104606.cmr.xml",
      "s3://cumulus-test-sandbox-private/BROWSE.MYD13Q1.A2017297.h19v10.006.2017313221201.hdf"
    ],
    "onlyInDynamoDb": [
      {
        "uri": "s3://cumulus-test-sandbox-protected/MOD09GQ.A2016358.h13v04.006.2016360104606.hdf",
        "granuleId": "MOD09GQ.A2016358.h13v04.006.2016360104606"
      }
    ]
  },
  "collectionsInCumulusCmr": {
    "okCount": 1,
    "onlyInCumulus": [
      "L2_HR_PIXC___000"
    ],
    "onlyInCmr": [
      "MCD43A1___006",
      "MOD14A1___006"
    ]
  },
  "granulesInCumulusCmr": {
    "okCount": 3,
    "onlyInCumulus": [
      {
        "granuleId": "MOD09GQ.A3518809.ln_rVr.006.7962927138074",
        "collectionId": "MOD09GQ___006"
      },
      {
        "granuleId": "MOD09GQ.A8768252.HC4ddD.006.2077696236118",
        "collectionId": "MOD09GQ___006"
      }
    ],
    "onlyInCmr": [
      {
        "GranuleUR": "MOD09GQ.A0002421.oD4zvB.006.4281362831355",
        "ShortName": "MOD09GQ",
        "Version": "006"
      }
    ]
  },
  "filesInCumulusCmr": {
    "okCount": 11,
    "onlyInCumulus": [
      {
        "fileName": "MOD09GQ.A8722843.GTk5A3.006.4026909316904.jpeg",
        "uri": "s3://cumulus-test-sandbox-public/MOD09GQ___006/MOD/MOD09GQ.A8722843.GTk5A3.006.4026909316904.jpeg",
        "granuleId": "MOD09GQ.A8722843.GTk5A3.006.4026909316904"
      }
    ],
    "onlyInCmr": [
      {
        "URL": "https://cumulus-test-sandbox-public.s3.amazonaws.com/MOD09GQ___006/MOD/MOD09GQ.A8722843.GTk5A3.006.4026909316904_ndvi.jpg",
        "Type": "GET DATA",
        "GranuleUR": "MOD09GQ.A8722843.GTk5A3.006.4026909316904"
      }
    ]
  }
}
diff --git a/docs/v9.9.0/getting-started/index.html b/docs/v9.9.0/getting-started/index.html
    Version: v9.9.0

    Getting Started

    Overview | Quick Tutorials | Helpful Tips

    Overview

    This serves as a guide for new Cumulus users to deploy and learn how to use Cumulus. Here you will learn what you need in order to complete any prerequisites, what Cumulus is and how it works, and how to successfully navigate and deploy a Cumulus environment.

    What is Cumulus

Cumulus is an open-source set of components for creating cloud-based data ingest, archive, distribution, and management systems designed for NASA's future Earth Science data streams.

    Who uses Cumulus

Data integrators/developers and operators across projects (not limited to NASA) use Cumulus for their daily work functions.

    Cumulus Roles

    Integrator/Developer

    Cumulus integrators/developers are those who work within Cumulus and AWS for deployments and to manage workflows.

    Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections.

    Role Guides

    As a developer, integrator, or operator, you will need to set up your environments to work in Cumulus. The following docs can get you started in your role specific activities.

    What is a Cumulus Data Type

    In Cumulus, we have the following types of data that you can create and manage:

    • Collections
    • Granules
    • Providers
    • Rules
    • Workflows
    • Executions
    • Reports

    For details on how to create or manage data types go to Data Management Types.


    Quick Tutorials

    Deployment & Configuration

    Cumulus is deployed to an AWS account, so you must have access to deploy resources to an AWS account to get started.

    1. Deploy Cumulus and Cumulus Dashboard to AWS

    Follow the deployment instructions to deploy Cumulus to your AWS account.

    2. Configure and Run the HelloWorld Workflow

    If you have deployed using the cumulus-template-deploy repository, you have a HelloWorld workflow deployed to your Cumulus backend.

    You can see your deployed workflows on the Workflows page of your Cumulus dashboard.

    Configure a collection and provider using the setup guidance on the Cumulus dashboard.

    Then create a rule to trigger your HelloWorld workflow. You can select a rule type of one time.

    Navigate to the Executions page of the dashboard to check the status of your workflow execution.

    3. Configure a Custom Workflow

    See Developing a custom workflow documentation for adding a new workflow to your deployment.

    There are plenty of workflow examples using Cumulus tasks here. The Data Cookbooks provide a more in-depth look at some of these more advanced workflows and their configurations.

    There is a list of Cumulus tasks already included in your deployment here.

    After configuring your workflow and redeploying, you can configure and run your workflow using the same steps as in step 2.


    Helpful Tips

    Here are some useful tips to keep in mind when deploying or working in Cumulus.

    Integrator/Developer

    • Versioning and Releases: This documentation gives information on our global versioning approach. We suggest upgrading to the supported version for Cumulus, Cumulus dashboard, and Thin Egress App (TEA).
    • Cumulus Developer Documentation: We suggest that you read through and reference this resource for development best practices in Cumulus.
    • Cumulus Deployment: We will guide you on how to manually deploy a new instance of Cumulus. In this reference, you will learn how to install Terraform, create an AWS S3 bucket, configure a compatible database, and create a Lambda layer.
    • Terraform Best Practices: This will help guide you through your Terraform configuration and Cumulus deployment. For an introduction about Terraform go here.
    • Integrator Common Use Cases: Scenarios to help integrators along in the Cumulus environment.

    Operator

    Troubleshooting

    Troubleshooting: Some suggestions to help you troubleshoot and solve issues you may encounter.

    Resources

diff --git a/docs/v9.9.0/glossary/index.html b/docs/v9.9.0/glossary/index.html
    Version: v9.9.0

    Glossary

    AWS Glossary

    For terms/items from Amazon/AWS not mentioned in this glossary, please refer to the AWS Glossary.

    Cumulus Glossary of Terms

    API Gateway

    Refers to AWS's API Gateway. Used by the Cumulus API.

    ARN

    Refers to an AWS "Amazon Resource Name".

    For more info, see the AWS documentation.

    AWS

    See: aws.amazon.com

    AWS Lambda/Lambda Function

    AWS's 'serverless' option. Allows the running of code without provisioning a service or managing server/ECS instances/etc.

    For more information, see the AWS Lambda documentation.

    AWS Access Keys

    Access credentials that allow you to act as an IAM user programmatically or from the command line.

    For more information, see the AWS IAM Documentation.

    Bucket

    An Amazon S3 cloud storage resource.

    For more information, see the AWS Bucket Documentation.

    CloudFormation

    An AWS service that allows you to define and manage cloud resources as a preconfigured block.

    For more information, see the AWS CloudFormation User Guide.

    Cloudformation Template

    A template that defines an AWS CloudFormation stack.

    For more information, see the AWS intro page.

    Cloudwatch

    AWS service that provides logging and metrics collection for the cloud resources you have in AWS.

    For more information, see the AWS User Guide.

    Cloud Notification Mechanism (CNM)

    An interface mechanism to support cloud-based ingest messaging. For more information, see PO.DAAC's CNM Schema.

    Common Metadata Repository (CMR)

    "A high-performance, high-quality, continuously evolving metadata system that catalogs Earth Science data and associated service metadata records". For more information, see NASA's CMR page.

    Collection (Cumulus)

    Cumulus Collections are logical sets of data objects of the same data type and version.

    For more information, see cookbook reference page.

    Cumulus Message Adapter (CMA)

    A library designed to help task developers integrate step function tasks into a Cumulus workflow by adapting task input/output into the Cumulus Message format.

    For more information, see CMA workflow reference page.

    Distributed Active Archive Center (DAAC)

    Refers to a specific organization that's part of NASA's distributed system of archive centers. For more information see EOSDIS's DAAC page

    Dead Letter Queue (DLQ)

    This refers to Amazon SQS Dead-Letter Queues. These SQS queues are specifically configured to capture failed messages from other services/SQS queues/etc. so that those failures can be processed later.

    For more on DLQs, see the Amazon Documentation and the Cumulus DLQ feature page.

    Developer

    Those who set up deployment and workflow management for Cumulus. Sometimes referred to as an integrator; see integrator.

    ECS

    Amazon's Elastic Container Service. Used in Cumulus by workflow steps that require more flexibility than Lambda can provide.

    For more information, see AWS's developer guide.

    ECS Activity

    An ECS instance run via a Step Function.

    Execution (Cumulus)

    A Cumulus execution refers to a single execution of a (Cumulus) Workflow.

    GIBS

    Global Imagery Browse Services

    Granule

    A granule is the smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules are always associated with a collection, which is a grouping of granules. A granule is a grouping of data files.

    IAM

    AWS Identity and Access Management.

    For more information, see AWS IAMs.

    Integrator/Developer

    Those who work within Cumulus and AWS for deployments and to manage workflows.

    Kinesis

    Amazon's platform for streaming data on AWS.

    See AWS Kinesis for more information.

    Lambda

    AWS's cloud service that lets you run code without provisioning or managing servers.

    For more information, see AWS's lambda page.

    Module (Terraform)

    Refers to a terraform module.

    Node

    See node.js.

    Npm

    Node package manager.

    For more information, see npmjs.com.

    Operator

    Those who work within Cumulus to ingest/archive data and manage collections.

    PDR

    "Polling Delivery Mechanism" used in "DAAC Ingest" workflows.

    For more information, see nasa.gov.

    Packages (NPM)

    NPM hosted node.js packages. Cumulus packages can be found on NPM's site here

    Provider

    Data source that generates and/or distributes data for Cumulus workflows to act upon.

    For more information, see the Cumulus documentation.

    Rule

    Rules are configurable scheduled events that trigger workflows based on various criteria.

    For more information, see the Cumulus Rules documentation.

    S3

    Amazon's Simple Storage Service provides data object storage in the cloud. Used in Cumulus to store configuration, data and more.

    For more information, see AWS's s3 page.

    SIPS

    Science Investigator-led Processing Systems. In the context of DAAC ingest, this refers to data producers/providers.

    For more information, see nasa.gov.

    SNS

    Amazon's Simple Notification Service provides a messaging service that allows publication of and subscription to events. Used in Cumulus to trigger workflow events, track event failures, and others.

    For more information, see AWS's SNS page.

    SQS

    Amazon's Simple Queue Service.

    For more information, see AWS's SQS page.

    Stack

    A collection of AWS resources you can manage as a single unit.

    In the context of Cumulus, this refers to a deployment of the cumulus and data-persistence modules that is managed by Terraform

    Step Function

    AWS's web service that allows you to compose complex workflows as a state machine comprised of tasks (Lambdas, activities hosted on EC2/ECS, some AWS service APIs, etc). See AWS's Step Function Documentation for more information. In the context of Cumulus these are the underlying AWS service used to create Workflows.

    Terraform

    Terraform is the tool that you will use for deployment and configuration of your Cumulus environment.

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/index.html b/docs/v9.9.0/index.html index adafba823b5..1e74f66c4be 100644 --- a/docs/v9.9.0/index.html +++ b/docs/v9.9.0/index.html @@ -5,13 +5,13 @@ Introduction | Cumulus Documentation - +
    Version: v9.9.0

    Introduction

    The Cumulus project addresses the need for a “native” cloud-based data ingest, archive, distribution, and management system that can be used for all future Earth Observing System Data and Information System (EOSDIS) data streams. The term “native” implies that the system will leverage all components of a cloud infrastructure provided by the vendor for efficiency (in terms of both processing time and cost). Additionally, Cumulus will operate on future data streams involving satellite missions, aircraft missions, and field campaigns.

    This documentation includes guidelines, examples, and source code docs. It is accessible at https://nasa.github.io/cumulus.


    Get To Know Cumulus

    • Getting Started - here - If you are new to Cumulus we suggest that you begin with this section to help you understand and work in the environment.
    • General Cumulus Documentation - here <- you're here

    Cumulus Reference Docs

    • Cumulus API Documentation - here
    • Cumulus Developer Documentation - here - READMEs throughout the main repository.
    • Data Cookbooks - here

    Auxiliary Guides

    • Integrator Guide - here
    • Operator Docs - here

    Contributing

    Please refer to: https://github.com/nasa/cumulus/blob/master/CONTRIBUTING.md for information. We thank you in advance.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/integrator-guide/about-int-guide/index.html b/docs/v9.9.0/integrator-guide/about-int-guide/index.html index af3d79898b4..685174fa844 100644 --- a/docs/v9.9.0/integrator-guide/about-int-guide/index.html +++ b/docs/v9.9.0/integrator-guide/about-int-guide/index.html @@ -5,13 +5,13 @@ About Integrator Guide | Cumulus Documentation - +
    Version: v9.9.0

    About Integrator Guide

    Purpose

    The Integrator Guide supplements the Cumulus documentation and Data Cookbooks. It is written for Cumulus integrators who are either new to the project or need a step-by-step resource to help them along.

    What Is A Cumulus Integrator

    Cumulus integrators are those who work within Cumulus and AWS for deployments and to manage workflows. They may perform the following functions:

    • Configure and deploy Cumulus to the AWS environment
    • Configure Cumulus workflows
    • Write custom workflow tasks
    - + \ No newline at end of file diff --git a/docs/v9.9.0/integrator-guide/int-common-use-cases/index.html b/docs/v9.9.0/integrator-guide/int-common-use-cases/index.html index 4fc50bd599f..7967a5e9acf 100644 --- a/docs/v9.9.0/integrator-guide/int-common-use-cases/index.html +++ b/docs/v9.9.0/integrator-guide/int-common-use-cases/index.html @@ -5,13 +5,13 @@ Integrator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.9.0/integrator-guide/workflow-add-new-lambda/index.html b/docs/v9.9.0/integrator-guide/workflow-add-new-lambda/index.html index 0ba2c990eca..ad265400ccd 100644 --- a/docs/v9.9.0/integrator-guide/workflow-add-new-lambda/index.html +++ b/docs/v9.9.0/integrator-guide/workflow-add-new-lambda/index.html @@ -5,13 +5,13 @@ Workflow - Add New Lambda | Cumulus Documentation - +
    Version: v9.9.0

    Workflow - Add New Lambda

    You can develop a workflow task in AWS Lambda or Elastic Container Service (ECS). AWS ECS requires Docker. For a list of tasks to use, go to our Cumulus Tasks page.

    The following steps will help you write a new Lambda that integrates with a Cumulus workflow and build your understanding of the Cumulus Message Adapter (CMA) process.

    Steps

    1. Define New Lambda in Terraform

    2. Add Task in JSON Object

      For details on how to set up a workflow via CMA go to the CMA Tasks: Message Flow.

      You will need to assign input and output for the new task and follow the CMA contract here. This contract defines how libraries should call the cumulus-message-adapter to integrate a task into an existing Cumulus Workflow.

    3. Verify New Task

      Check the updated workflow in AWS and in Cumulus.
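    As a quick command-line sanity check (the function and prefix names below are placeholders for whatever you defined in Terraform):

    # Confirm the new Lambda was created
    aws lambda get-function --function-name <prefix>-MyNewTask

    # Confirm the workflow state machines for your stack are present and updated
    aws stepfunctions list-state-machines --query "stateMachines[?contains(name, '<prefix>')]"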

    - + \ No newline at end of file diff --git a/docs/v9.9.0/integrator-guide/workflow-ts-failed-step/index.html b/docs/v9.9.0/integrator-guide/workflow-ts-failed-step/index.html index eae66053248..c001957e76f 100644 --- a/docs/v9.9.0/integrator-guide/workflow-ts-failed-step/index.html +++ b/docs/v9.9.0/integrator-guide/workflow-ts-failed-step/index.html @@ -5,13 +5,13 @@ Workflow - Troubleshoot Failed Step(s) | Cumulus Documentation - +
    Version: v9.9.0

    Workflow - Troubleshoot Failed Step(s)

    Steps

    1. Locate Step
    • Go to the Cumulus dashboard
    • Find the granule
    • Go to Executions to determine the failed step
    2. Investigate in CloudWatch
    • Go to CloudWatch
    • Locate the Lambda function for the failed step
    • Search the CloudWatch logs (see the example command after these steps)
    3. Recreate Error

      In your sandbox environment, try to recreate the error.

    4. Resolution
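    For step 2, a hedged example of searching a task's CloudWatch logs from the command line (the log group name is an assumption; Cumulus task Lambdas write to /aws/lambda/<function name>):

    # Search the last hour of logs for a task Lambda for errors
    aws logs filter-log-events \
      --log-group-name /aws/lambda/<prefix>-SyncGranule \
      --filter-pattern "ERROR" \
      --start-time $(($(date +%s) - 3600))000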

    - + \ No newline at end of file diff --git a/docs/v9.9.0/interfaces/index.html b/docs/v9.9.0/interfaces/index.html index b68604bf3d3..afdbbae6aed 100644 --- a/docs/v9.9.0/interfaces/index.html +++ b/docs/v9.9.0/interfaces/index.html @@ -5,13 +5,13 @@ Interfaces | Cumulus Documentation - +
    Version: v9.9.0

    Interfaces

    Cumulus has multiple interfaces that allow interaction with discrete components of the system, such as starting workflows via SNS/Kinesis/SQS, manually queueing workflow start messages, submitting SNS notifications for completed workflows, and the many operations allowed by the Cumulus API.

    The diagram below illustrates the workflow process in detail and the various interfaces that allow starting of workflows, reporting of workflow information, and database create operations that occur when a workflow reporting message is processed. For interfaces with expected input or output schemas, details are provided below.

    Note: This diagram is current of v1.18.0.

    Architecture diagram showing the interfaces for triggering and reporting of Cumulus workflow executions

    Workflow triggers and queuing

    Kinesis stream

    As a Kinesis stream is consumed by the messageConsumer Lambda to queue workflow executions, the incoming event is validated against this consumer schema by the ajv package.

    SQS queue for executions

    The messages put into the SQS queue for executions should conform to the Cumulus message format.

    Workflow executions

    See the documentation on Cumulus workflows.

    Workflow reporting

    SNS reporting topics

    For granule and PDR reporting, the topics will only receive data if the Cumulus workflow execution message meets the following criteria:

    • Granules - workflow message contains granule data in payload.granules
    • PDRs - workflow message contains PDR data in payload.pdr

    The messages published to the SNS reporting topics for executions and PDRs and the record property in the messages published to the granules SNS topic should conform to the model schema for each data type.

    Further detail on workflow reporting and how to interact with these interfaces can be found in the workflow notifications data cookbook.

    Cumulus API

    See the Cumulus API documentation.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/about-operator-docs/index.html b/docs/v9.9.0/operator-docs/about-operator-docs/index.html index 7373ca653fc..2dd682cae7a 100644 --- a/docs/v9.9.0/operator-docs/about-operator-docs/index.html +++ b/docs/v9.9.0/operator-docs/about-operator-docs/index.html @@ -5,13 +5,13 @@ About Operator Docs | Cumulus Documentation - +
    Version: v9.9.0

    About Operator Docs

    Purpose

    Operator Docs are an augmentation to Cumulus documentation and Data Cookbooks. These documents will walk step-by-step through common Cumulus activities (that aren't necessarily as use-case directed as what you'd see in Data Cookbooks).

    What Is A Cumulus Operator

    Cumulus operators are those who work within Cumulus to ingest/archive data and manage collections. They may perform the following functions via the operator dashboard or API:

    • Configure providers and collections
    • Configure rules and monitor workflow executions
    • Monitor granule ingestion
    • Monitor system metrics
    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/bulk-operations/index.html b/docs/v9.9.0/operator-docs/bulk-operations/index.html index 1ba11914820..a129b7967d9 100644 --- a/docs/v9.9.0/operator-docs/bulk-operations/index.html +++ b/docs/v9.9.0/operator-docs/bulk-operations/index.html @@ -5,14 +5,14 @@ Bulk Operations | Cumulus Documentation - +
    Version: v9.9.0

    Bulk Operations

    Cumulus implements bulk operations through the use of AsyncOperations, which are long-running processes executed on an AWS ECS cluster.

    Submitting a bulk API request

    Bulk operations are generally submitted via the endpoint for the relevant data type, e.g. granules. For a list of supported API requests, refer to the Cumulus API documentation. Bulk operations are denoted with the keyword 'bulk'.

    Starting bulk operations from the Cumulus dashboard

    Using a Kibana query

    Note: You must have configured your dashboard build with a KIBANAROOT environment variable in order for the Kibana link to render in the bulk granules modal

    1. From the Granules dashboard page, click on the "Run Bulk Granules" button, then select what type of action you would like to perform

      • Note: the rest of the process is the same regardless of what type of bulk action you perform
    2. From the bulk granules modal, click the "Open Kibana" link:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations

    3. Once you have accessed Kibana, navigate to the "Discover" page. If this is your first time using Kibana, you may see a message like this at the top of the page:

      In order to visualize and explore data in Kibana, you'll need to create an index pattern to retrieve data from Elasticsearch.

      In that case, see the docs for creating an index pattern for Kibana

      Screenshot of Kibana user interface showing the &quot;Discover&quot; page for running queries

    4. Enter a query that returns the granule records that you want to use for bulk operations:

      Screenshot of Kibana user interface showing an example Kibana query and results

    5. Once the Kibana query is returning the results you want, click the "Inspect" link near the top of the page. A slide out tab with request details will appear on the right side of the page:

      Screenshot of Kibana user interface showing details of an example request

    6. In the slide out tab that appears on the right side of the page, click the "Request" link near the top and scroll down until you see the query property:

      Screenshot of Kibana user interface showing the Elasticsearch data request made for a given Kibana query

    7. Highlight and copy the query contents from Kibana. Go back to the Cumulus dashboard and paste the query contents from Kibana inside of the query property in the bulk granules request payload. It is expected that you should have a property of query nested inside of the existing query property:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query information populated

    8. Add values for the index and workflowName to the bulk granules request payload. The value for index will vary based on your Elasticsearch setup, but it is good to target an index specifically for granule data if possible:

      Screenshot of Cumulus dashboard showing modal window for triggering bulk granule operations with query, index, and workflow information populated

    9. Click the "Run Bulk Operations" button. You should see a confirmation message, including an ID for the async operation that was started to handle your bulk action. You can track the status of this async operation on the Operations dashboard page, which can be visited by clicking the "Go To Operations" button:

      Screenshot of Cumulus dashboard showing confirmation message with async operation ID for bulk granules request
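    The same bulk request can be submitted to the Cumulus API directly instead of through the dashboard modal. The sketch below is illustrative only: the index, workflow name, and query are placeholders, and the full payload options are described in the Cumulus API documentation.

    curl --request POST https://example.com/granules/bulk \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "index": "<your-granules-index>",
        "workflowName": "<YourWorkflowName>",
        "query": {
          "query": {
            "match": { "granuleId": "<granule-id>" }
          }
        }
      }'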

    Creating an index pattern for Kibana

    1. Define the index pattern for the indices that your Kibana queries should use. A wildcard character, *, will match across multiple indices. Once you are satisfied with your index pattern, click the "Next step" button:

      Screenshot of Kibana user interface for defining an index pattern

    2. Choose whether to use a Time Filter for your data, which is not required. Then click the "Create index pattern" button:

      Screenshot of Kibana user interface for configuring the settings of an index pattern

    Status Tracking

    All bulk operations return an AsyncOperationId which can be submitted to the /asyncOperations endpoint.

    The /asyncOperations endpoint allows listing of AsyncOperation records as well as record retrieval for individual records, which will contain the status. The Cumulus API documentation shows sample requests for these actions.
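    For example, hedged status checks from the command line (the operation ID is a placeholder):

    # List async operations
    curl --request GET https://example.com/asyncOperations \
      --header 'Authorization: Bearer ReplaceWithTheToken'

    # Retrieve a single operation, including its status
    curl --request GET https://example.com/asyncOperations/<async-operation-id> \
      --header 'Authorization: Bearer ReplaceWithTheToken'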

    The Cumulus Dashboard also includes an Operations monitoring page, where operations and their status are visible:

    Screenshot of Cumulus Dashboard Operations Page showing 5 operations and their status, ID, description, type and creation timestamp

    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/cmr-operations/index.html b/docs/v9.9.0/operator-docs/cmr-operations/index.html index 1b6a793996d..6a6d28a1e41 100644 --- a/docs/v9.9.0/operator-docs/cmr-operations/index.html +++ b/docs/v9.9.0/operator-docs/cmr-operations/index.html @@ -5,7 +5,7 @@ CMR Operations | Cumulus Documentation - + @@ -16,7 +16,7 @@ UpdateCmrAccessConstraints will update CMR metadata file contents on S3, and PostToCmr will push the updates to CMR. The rest of this section will assume you have created this workflow under the name UpdateCmrAccessConstraints.

    Once created and deployed, the workflow is available in the Cumulus dashboard's Execute workflow selector. However, note that additional configuration is required for this request, to supply an access constraint integer value and optional description to the UpdateCmrAccessConstraints workflow, by clicking the Add Custom Workflow Meta option in the Execute popup, as shown below:

    Screenshot showing granule execute popup with &#39;updateCmrAccessConstraints&#39; selected and configuration values shown in a collapsible JSON field

    An example invocation of the API to perform this action is:

    $ curl --request PUT https://example.com/granules/MOD11A1.A2017137.h19v16.006.2017138085750 \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "action": "applyWorkflow",
        "workflow": "UpdateCmrAccessConstraints",
        "meta": {
          "accessConstraints": {
            "value": 5,
            "description": "sample access constraint"
          }
        }
      }'

    Supported CMR metadata formats for the above operation are Echo10XML and UMMG-JSON, which will populate the RestrictionFlag and RestrictionComment fields in Echo10XML, or the AccessConstraints values in UMMG-JSON.

    Additional Operations

    At this time Cumulus does not, out of the box, support additional operations on CMR metadata. However, given the examples shown above, we recommend working with your integrators to develop additional workflows that perform any required operations.

    Bulk CMR operations

    In order to perform the above operations in bulk, Cumulus supports the use of ApplyWorkflow in an AsyncOperation. These are accessed via the Bulk Operation button on the dashboard, or the /granules/bulk endpoint on the Cumulus API.

    More information on bulk operations is in the bulk operations operator doc.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/create-rule-in-cumulus/index.html b/docs/v9.9.0/operator-docs/create-rule-in-cumulus/index.html index 0412140b498..455124a494f 100644 --- a/docs/v9.9.0/operator-docs/create-rule-in-cumulus/index.html +++ b/docs/v9.9.0/operator-docs/create-rule-in-cumulus/index.html @@ -5,13 +5,13 @@ Create Rule In Cumulus | Cumulus Documentation - +
    Version: v9.9.0

    Create Rule In Cumulus

    Once the above files are in place and the entries created in CMR and Cumulus, we are ready to begin ingesting data. Depending on the type of ingestion (FTP/Kinesis, etc) the values below will change, but for the most part they are all similar. Rules tell Cumulus how to associate providers and collections, and when/how to start processing a workflow.

    Steps

    1. Go To Rules Page
    • Go to the Cumulus dashboard, click on Rules in the navigation.
    • Click Add Rule.

    Screenshot of Rules page

    2. Complete Form
    • Fill out the template form.

    Screenshot of a Rules template for adding a new rule

    For more details regarding the field definitions and required information go to Data Cookbooks.

    Note: If the state field is left blank, it defaults to false.

    Examples

    • A rule form with completed required fields:

    Screenshot of a completed rule form

    • A successfully added Rule:

    Screenshot of created rule
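    Rules can also be created programmatically by POSTing to the Cumulus API's /rules endpoint. The following is only a sketch of a Kinesis-type rule; every name, version, and ARN shown is a placeholder to be replaced with values from your own deployment.

    curl --request POST https://example.com/rules \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "<rule_name>",
        "workflow": "<YourWorkflowName>",
        "provider": "<your-provider-id>",
        "collection": { "name": "<collection-name>", "version": "<collection-version>" },
        "rule": {
          "type": "kinesis",
          "value": "arn:aws:kinesis:<region>:<account-id>:stream/<your-stream-name>"
        },
        "state": "ENABLED"
      }'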

    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/discovery-filtering/index.html b/docs/v9.9.0/operator-docs/discovery-filtering/index.html index 4cf7ecc94d3..2e5bdda177e 100644 --- a/docs/v9.9.0/operator-docs/discovery-filtering/index.html +++ b/docs/v9.9.0/operator-docs/discovery-filtering/index.html @@ -5,7 +5,7 @@ Discovery Filtering | Cumulus Documentation - + @@ -24,7 +24,7 @@ directly list the provider_path. If the path contains regular expression components, this may fail.

    It is recommended that operators diagnose any failures by checking error logs and ensuring that permissions on the remote file system allow reading of the default directory and any subdirectories that match the filter.

    Supported protocols

    Currently support for this feature is limited to the following protocols:

    • ftp
    • sftp
    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/granule-workflows/index.html b/docs/v9.9.0/operator-docs/granule-workflows/index.html index 8ca97fe5662..79accadca6c 100644 --- a/docs/v9.9.0/operator-docs/granule-workflows/index.html +++ b/docs/v9.9.0/operator-docs/granule-workflows/index.html @@ -5,13 +5,13 @@ Granule Workflows | Cumulus Documentation - +
    Version: v9.9.0

    Granule Workflows

    Failed Granule

    Delete and Ingest

    1. Delete Granule

    Note: Granules published to CMR will need to be removed from CMR via the dashboard prior to deletion

    2. Ingest Granule via Ingest Rule
    • Re-triggering a one-time, Kinesis, SQS, or SNS rule, or running a scheduled rule, will re-discover and reingest the deleted granule.

    Reingest

    1. Select Failed Granule
    • In the Cumulus dashboard, go to the Collections page.
    • Use the search field to find the granule.
    2. Re-ingest Granule
    • Go to the Collections page.
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of the Reingest modal workflow
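    A single granule can also be reingested through the Cumulus API by applying the reingest action (the granule ID below is a placeholder):

    curl --request PUT https://example.com/granules/<granule-id> \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{ "action": "reingest" }'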

    Delete and Ingest

    1. Bulk Delete Granules
    • Go to the Granules page.
    • Use the Bulk Delete button to bulk delete selected granules or select via a Kibana query

    Note: You can optionally force deletion from CMR

    2. Ingest Granules via Ingest Rule
    • Re-triggering one-time, Kinesis, SQS, or SNS rules, or running scheduled rules, will re-discover and reingest the deleted granules.

    Multiple Failed Granules

    1. Select Failed Granules
    • In the Cumulus dashboard, go to the Collections page.
    • Click on Failed Granules.
    • Select multiple granules.

    Screenshot of selected multiple granules

    2. Bulk Re-ingest Granules
    • Click on Reingest and a modal will pop up for your confirmation.

    Screenshot of Bulk Reingest modal workflow

    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/kinesis-stream-for-ingest/index.html b/docs/v9.9.0/operator-docs/kinesis-stream-for-ingest/index.html index 326d7ffadb3..ea24a1194d6 100644 --- a/docs/v9.9.0/operator-docs/kinesis-stream-for-ingest/index.html +++ b/docs/v9.9.0/operator-docs/kinesis-stream-for-ingest/index.html @@ -5,13 +5,13 @@ Setup Kinesis Stream & CNM Message | Cumulus Documentation - +
    Version: v9.9.0

    Setup Kinesis Stream & CNM Message

    Note: Keep in mind that you should only have to set this up once per ingest stream. Kinesis pricing is based on the shard value and not on the amount of Kinesis usage.

    1. Create a Kinesis Stream

      • In your AWS console, go to the Kinesis service and click Create Data Stream.
      • Assign a name to the stream.
      • Apply a shard value of 1.
      • Click on Create Kinesis Stream.
    • A status page with stream details will display. Once the status is active, the stream is ready to use. Be sure to record the streamName and StreamARN for later use.

      Screenshot of AWS console page for creating a Kinesis stream

    2. Create a Rule

    3. Send a message

    • Send a message that matches your schema using Python or from your command line (see the sketch after this list).
    • The streamName and Collection must match the kinesisArn+collection defined in the rule that you created in Step 2.
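    A hedged command-line sketch of steps 1 and 3 (the stream name is a placeholder and the CNM payload is abbreviated; your message must conform to the CNM schema and reference a provider and collection configured in Cumulus):

    # Step 1: create a stream with a single shard and record its details
    aws kinesis create-stream --stream-name <your-ingest-stream> --shard-count 1
    aws kinesis describe-stream-summary --stream-name <your-ingest-stream>

    # Step 3: publish a CNM message to the stream
    # (--cli-binary-format is needed for AWS CLI v2 to treat --data as raw text)
    aws kinesis put-record \
      --stream-name <your-ingest-stream> \
      --partition-key 1 \
      --cli-binary-format raw-in-base64-out \
      --data '{ "collection": "<collection-name>", "provider": "<provider-id>", "identifier": "<unique-id>", "product": { "files": [] } }'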
    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/locating-access-logs/index.html b/docs/v9.9.0/operator-docs/locating-access-logs/index.html index 8262f38c942..88d2764c17b 100644 --- a/docs/v9.9.0/operator-docs/locating-access-logs/index.html +++ b/docs/v9.9.0/operator-docs/locating-access-logs/index.html @@ -5,13 +5,13 @@ Locating S3 Access Logs | Cumulus Documentation - +
    Version: v9.9.0

    Locating S3 Access Logs

    When enabling S3 Access Logs for EMS Reporting you configured a TargetBucket and TargetPrefix. Inside the TargetBucket at the TargetPrefix is where you will find the raw S3 access logs.

    In a standard deployment, this will be your stack's <internal bucket name> and a key prefix of <stack>/ems-distribution/s3-server-access-logs/
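    For example, to list the raw logs from the command line (bucket and stack names are placeholders):

    aws s3 ls s3://<internal-bucket-name>/<stack>/ems-distribution/s3-server-access-logs/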

    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/naming-executions/index.html b/docs/v9.9.0/operator-docs/naming-executions/index.html index feced0dc68c..cf195ca0daf 100644 --- a/docs/v9.9.0/operator-docs/naming-executions/index.html +++ b/docs/v9.9.0/operator-docs/naming-executions/index.html @@ -5,7 +5,7 @@ Naming Executions | Cumulus Documentation - + @@ -21,7 +21,7 @@ QueuePdrs step.

    In the following excerpt, the QueueGranules config.executionNamePrefix property is set using the value configured in the workflow's meta.executionNamePrefix.

    Please note: This meta.executionNamePrefix property should not be confused with the optional rule executionNamePrefix property from the previous section. Setting executionNamePrefix as a root property of the rule will set a prefix for the names of any workflows triggered by the rule. Setting meta.executionNamePrefix on the rule will set meta.executionNamePrefix in the workflow messages generated for this rule, allowing workflow steps like QueueGranules to read from the message meta.executionNamePrefix for their config. Then, workflows scheduled by QueueGranules would use the configured execution name prefix.

    Setting executionNamePrefix config for QueueGranules using rule.meta

    If you wanted to use a prefix of "my-prefix", you would create a rule with a meta property similar to the following Rule snippet:

    {
      ...other rule keys here...
      "meta": {
        "executionNamePrefix": "my-prefix"
      }
    }

    The value of meta.executionNamePrefix from the rule will be set as meta.executionNamePrefix in the workflow message.

    Then, the workflow could contain a "QueueGranules" step with the following state, which uses meta.executionNamePrefix from the message as the value for the executionNamePrefix config to the "QueueGranules" step:

    {
    "QueueGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "queueUrl": "${start_sf_queue_url}",
    "provider": "{$.meta.provider}",
    "internalBucket": "{$.meta.buckets.internal.name}",
    "stackName": "{$.meta.stack}",
    "granuleIngestWorkflow": "${ingest_granule_workflow_name}",
    "executionNamePrefix": "{$.meta.executionNamePrefix}"
    }
    }
    },
    "Type": "Task",
    "Resource": "${queue_granules_task_arn}",
    "Retry": [
    {
    "ErrorEquals": [
    "Lambda.ServiceException",
    "Lambda.AWSLambdaException",
    "Lambda.SdkClientException"
    ],
    "IntervalSeconds": 2,
    "MaxAttempts": 6,
    "BackoffRate": 2
    }
    ],
    "Catch": [
    {
    "ErrorEquals": [
    "States.ALL"
    ],
    "ResultPath": "$.exception",
    "Next": "WorkflowFailed"
    }
    ],
    "End": true
    },
    }
    - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/ops-common-use-cases/index.html b/docs/v9.9.0/operator-docs/ops-common-use-cases/index.html index 1947e1003c3..71e81cf598a 100644 --- a/docs/v9.9.0/operator-docs/ops-common-use-cases/index.html +++ b/docs/v9.9.0/operator-docs/ops-common-use-cases/index.html @@ -5,13 +5,13 @@ Operator Common Use Cases | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.9.0/operator-docs/trigger-workflow/index.html b/docs/v9.9.0/operator-docs/trigger-workflow/index.html index 7671c9bc876..a3ad1d66bd9 100644 --- a/docs/v9.9.0/operator-docs/trigger-workflow/index.html +++ b/docs/v9.9.0/operator-docs/trigger-workflow/index.html @@ -5,13 +5,13 @@ Trigger a Workflow Execution | Cumulus Documentation - +
    Version: v9.9.0

    Trigger a Workflow Execution

    To trigger a workflow, you need to create a rule. To trigger an ingest workflow, one that requires discovering and ingesting data, you will also need to configure the collection and provider and associate those to a rule.

    Trigger a HelloWorld Workflow

    To trigger a HelloWorld workflow that does not need to discover or archive data, you just need to create a rule.

    You can leave the provider and collection blank and do not need any additional metadata. If you create a onetime rule, the workflow execution will start momentarily and you can view its status on the Executions page.

    Trigger an Ingest Workflow

    To ingest data, you will need a provider and collection configured to tell your workflow where to discover data and where to archive the data respectively.

    Follow the instructions to create a provider and create a collection and configure their fields for your data ingest.

    In the rule's additional metadata you can specify a provider_path from which to get the data from the provider.

    Example: Ingest data from S3

    Setup

    Assume there are 2 files to be ingested in an S3 bucket called discovery-bucket, located in the test-data folder:

    • GRANULE.A2017025.jpg
    • GRANULE.A2017025.hdf
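    For example, the files could be staged with the AWS CLI (assuming the files exist locally):

    aws s3 cp GRANULE.A2017025.jpg s3://discovery-bucket/test-data/
    aws s3 cp GRANULE.A2017025.hdf s3://discovery-bucket/test-data/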

    Archive buckets should already be created and mapped to public / private / protected in the Cumulus deployment.

    For example:

    buckets = {
      private = {
        name = "discovery-bucket"
        type = "private"
      },
      protected = {
        name = "archive-protected"
        type = "protected"
      }
      public = {
        name = "archive-public"
        type = "public"
      }
    }

    Create a provider

    Create a new provider. Set protocol to S3 and Host to discovery-bucket.

    Screenshot of adding a sample S3 provider

    Create a collection

    Create a new collection. Configure the collection to extract the granule id from the filenames and configure where to store the granule files.

    The configuration below will store hdf files in the protected bucket and jpg files in the public bucket. The bucket types correspond to the buckets defined in the Cumulus deployment configuration shown above.

    {
      "name": "test-collection",
      "version": "001",
      "granuleId": "^GRANULE\\.A[\\d]{7}$",
      "granuleIdExtraction": "(GRANULE\\..*)(\\.hdf|\\.jpg)",
      "reportToEms": false,
      "sampleFileName": "GRANULE.A2017025.hdf",
      "files": [
        {
          "bucket": "protected",
          "regex": "^GRANULE\\.A[\\d]{7}\\.hdf$",
          "sampleFileName": "GRANULE.A2017025.hdf"
        },
        {
          "bucket": "public",
          "regex": "^GRANULE\\.A[\\d]{7}\\.jpg$",
          "sampleFileName": "GRANULE.A2017025.jpg"
        }
      ]
    }

    Create a rule

    Create a rule to trigger the workflow to discover your granule data and ingest your granule.

    Select the previously created provider and collection. See the Cumulus Discover Granules workflow for a workflow example of using Cumulus tasks to discover and queue data for ingest.

    In the rule meta, set the provider_path to test-data, so the test-data folder will be used to discover new granules.

    Screenshot of adding a Discover Granules rule

    A onetime rule will run your workflow on-demand and you can view it on the dashboard Executions page. The Cumulus Discover Granules workflow will trigger an ingest workflow and your ingested granules will be visible on the dashboard Granules page.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/tasks/index.html b/docs/v9.9.0/tasks/index.html index 97e30180f3b..32bcc6222ed 100644 --- a/docs/v9.9.0/tasks/index.html +++ b/docs/v9.9.0/tasks/index.html @@ -5,13 +5,13 @@ Cumulus Tasks | Cumulus Documentation - +
    Version: v9.9.0

    Cumulus Tasks

    A list of reusable Cumulus tasks. Add your own.

    Tasks

    @cumulus/add-missing-file-checksums

    Add checksums to files in S3 which don't have one


    @cumulus/discover-granules

    Discover Granules in FTP/HTTP/HTTPS/SFTP/S3 endpoints


    @cumulus/discover-pdrs

    Discover PDRs in FTP and HTTP endpoints


    @cumulus/files-to-granules

    Converts array-of-files input into a granules object by extracting granuleId from filename


    @cumulus/hello-world

    Example task


    @cumulus/hyrax-metadata-updates

    Update granule metadata with hooks to OPeNDAP URL


    @cumulus/lzards-backup

    Run LZARDS backup


    @cumulus/move-granules

    Move granule files from staging to final location


    @cumulus/parse-pdr

    Download and Parse a given PDR


    @cumulus/pdr-status-check

    Checks execution status of granules in a PDR


    @cumulus/post-to-cmr

    Post a given granule to CMR


    @cumulus/queue-granules

    Add discovered granules to the queue


    @cumulus/queue-pdrs

    Add discovered PDRs to a queue


    @cumulus/queue-workflow

    Add workflow to the queue


    @cumulus/sf-sqs-report

    Sends an incoming Cumulus message to SQS


    @cumulus/sync-granule

    Download a given granule


    @cumulus/test-processing

    Fake processing task used for integration tests


    @cumulus/update-cmr-access-constraints

    Updates CMR metadata to set access constraints


    @cumulus/update-granules-cmr-metadata-file-links

    Update CMR metadata files with correct online access urls and etags and transfer etag info to granules' CMR files

    - + \ No newline at end of file diff --git a/docs/v9.9.0/team/index.html b/docs/v9.9.0/team/index.html index 456129420a5..34929b004b5 100644 --- a/docs/v9.9.0/team/index.html +++ b/docs/v9.9.0/team/index.html @@ -5,13 +5,13 @@ Cumulus Team | Cumulus Documentation - + - + \ No newline at end of file diff --git a/docs/v9.9.0/troubleshooting/index.html b/docs/v9.9.0/troubleshooting/index.html index 0483c392b1f..6ecc7741de1 100644 --- a/docs/v9.9.0/troubleshooting/index.html +++ b/docs/v9.9.0/troubleshooting/index.html @@ -5,14 +5,14 @@ How to Troubleshoot and Fix Issues | Cumulus Documentation - +
    Version: v9.9.0

    How to Troubleshoot and Fix Issues

    While Cumulus is a complex system, there is a focus on maintaining the integrity and availability of the system and data. Should you encounter errors or issues while using this system, this section will help troubleshoot and solve those issues.

    Backup and Restore

    Cumulus has backup and restore functionality built-in to protect Cumulus data and allow recovery of a Cumulus stack. This is currently limited to Cumulus data and not full S3 archive data. Backup and restore is not enabled by default and must be enabled and configured to take advantage of this feature.

    For more information, read the Backup and Restore documentation.

    Elasticsearch reindexing

    If you run into issues with your Elasticsearch index, a reindex operation is available via the Cumulus API. See the Reindexing Guide.

    Information on how to reindex Elasticsearch is in the Cumulus API documentation.

    Troubleshooting Workflows

    Workflows are state machines comprised of tasks and services and each component logs to CloudWatch. The CloudWatch logs for all steps in the execution are displayed in the Cumulus dashboard or you can find them by going to CloudWatch and navigating to the logs for that particular task.

    Workflow Errors

    Visual representations of executed workflows can be found in the Cumulus dashboard or the AWS Step Functions console for that particular execution.

    If a workflow errors, the error will be handled according to the error handling configuration. The task that fails will have the exception field populated in the output, giving information about the error. Further information can be found in the CloudWatch logs for the task.

    Graph of AWS Step Function execution showing a failing workflow

    Workflow Did Not Start

    Generally, first check your rule configuration. If that is satisfactory, the answer will likely be in the CloudWatch logs for the schedule SF or SF starter lambda functions. See the workflow triggers page for more information on how workflows start.

    For Kinesis and SNS rules specifically, if an error occurs during the message consumer process, the fallback consumer lambda will be called and if the message continues to error, a message will be placed on the dead letter queue. Check the dead letter queue for a failure message. Errors can be traced back to the CloudWatch logs for the message consumer and the fallback consumer. Additionally, check that the name and version match those configured in your rule, as rules are filtered by the notification's collection name and version before scheduling executions.

    More information on kinesis error handling is here.

    Operator API Errors

    All operator API calls are funneled through the ApiEndpoints lambda. Each API call is logged to the ApiEndpoints CloudWatch log for your deployment.

    Lambda Errors

    KMS Exception: AccessDeniedException

    KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.

    The above error was being thrown by a Cumulus Lambda function invocation. The KMS key is the encryption key used to encrypt Lambda environment variables. The root cause of this error is unknown, but it is speculated to be caused by deleting and recreating, with the same name, the IAM role the Lambda uses.

    This error can be resolved by switching the lambda's execution role to a different one and then back through the Lambda management console. Unfortunately, this approach doesn't scale well.

    The other resolution (that scales but takes some time) that was found is as follows:

    1. Comment out all lambda definitions (and dependent resources) in your Terraform configuration.
    2. terraform apply to delete the lambdas.
    3. Un-comment the definitions.
    4. terraform apply to recreate the lambdas.

    If this problem occurs with Core lambdas and you are using the terraform-aws-cumulus.zip file source distributed in our release, we recommend the non-scaling approach: the number of lambdas we distribute is in the low teens, so reconfiguring them one-by-one is likely to be easier and faster than editing our configs.

    Error: Unable to import module 'index': Error

    This error is shown in the CloudWatch logs for a Lambda function.

    One possible cause is that the Lambda definition in the .tf file defining the lambda is not pointing to the correct packaged lambda source file. In order to resolve this issue, update the lambda definition to point directly to the packaged (e.g. .zip) lambda source file.

    resource "aws_lambda_function" "discover_granules_task" {
    function_name = "${var.prefix}-DiscoverGranules"
    filename = "${path.module}/../../tasks/discover-granules/dist/lambda.zip"
    handler = "index.handler"
    }

    If you are seeing this error when using the Lambda as a step in a Cumulus workflow, then inspect the output for this Lambda step in the AWS Step Function console. If you see the error Cannot find module 'node_modules/@cumulus/cumulus-message-adapter-js', then you need to ensure the lambda's packaged dependencies include cumulus-message-adapter-js.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/troubleshooting/reindex-elasticsearch/index.html b/docs/v9.9.0/troubleshooting/reindex-elasticsearch/index.html index 40af69f0a89..98e4356c3e1 100644 --- a/docs/v9.9.0/troubleshooting/reindex-elasticsearch/index.html +++ b/docs/v9.9.0/troubleshooting/reindex-elasticsearch/index.html @@ -5,7 +5,7 @@ Reindexing Elasticsearch Guide | Cumulus Documentation - + @@ -14,7 +14,7 @@ current index, or the mappings for an index have been updated (they do not update automatically). Any reindexing that will be required when upgrading Cumulus will be in the Migration Steps section of the changelog.

    Switch to a new index and Reindex

    There are two operations needed: reindex and change-index to switch over to the new index. A Change Index/Reindex can be done in either order, but both have their trade-offs.

    If you decide to point Cumulus to a new (empty) index first (with a change index operation), and then Reindex the data to the new index, data ingested while reindexing will automatically be sent to the new index. As reindexing operations can take a while, not all the data will show up on the Cumulus Dashboard right away. The advantage is you do not have to turn off any ingest operations. This way is recommended.

    If you decide to Reindex data to a new index first, and then point Cumulus to that new index, it is not guaranteed that data that is sent to the old index while reindexing will show up in the new index. If you prefer this way, it is recommended to turn off any ingest operations. This order will keep your dashboard data from seeing any interruption.

    Change Index

    This will point Cumulus to the index in Elasticsearch that will be used when retrieving data. Performing a change index operation to an index that does not exist yet will create the index for you. The change index operation can be found here.

    Reindex from the old index to the new index

    The reindex operation will take the data from one index and copy it into another index. The reindex operation can be found here

    Reindex status

    Reindexing is a long-running operation. The reindex-status endpoint can be used to monitor the progress of the operation.
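    Hedged sketches of these three calls are shown below; the index names reuse the examples from this page, and the exact endpoint payloads should be confirmed against the Cumulus API documentation.

    # Copy data from the current index into a new index
    curl --request POST https://example.com/elasticsearch/reindex \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{ "sourceIndex": "cumulus-2020-11-3", "destIndex": "cumulus-2021-3-4" }'

    # Point Cumulus at the new index
    curl --request POST https://example.com/elasticsearch/change-index \
      --header 'Authorization: Bearer ReplaceWithTheToken' \
      --header 'Content-Type: application/json' \
      --data '{ "currentIndex": "cumulus-2020-11-3", "newIndex": "cumulus-2021-3-4" }'

    # Monitor reindex progress
    curl --request GET https://example.com/elasticsearch/reindex-status \
      --header 'Authorization: Bearer ReplaceWithTheToken'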

    Index from database

    If you want to just grab the data straight from the database you can perform an Index from Database Operation. After the data is indexed from the database, a Change Index operation will need to be performed to ensure Cumulus is pointing to the right index. It is strongly recommended to turn off workflow rules when performing this operation so any data ingested to the database is not lost.

    Validate reindex

    To validate the reindex, use the reindex-status endpoint. The doc count can be used to verify that the reindex was successful. In the below example the reindex from cumulus-2020-11-3 to cumulus-2021-3-4 was not fully successful as they show different doc counts.

    "indices": {
    "cumulus-2020-11-3": {
    "primaries": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    },
    "total": {
    "docs": {
    "count": 21096512,
    "deleted": 176895
    }
    }
    },
    "cumulus-2021-3-4": {
    "primaries": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    },
    "total": {
    "docs": {
    "count": 715949,
    "deleted": 140191
    }
    }
    }
    }

    To further drill down into what is missing, log in to the Kibana instance (found in the Elasticsearch section of the AWS console) and run the following command replacing <index> with your index name.

    GET <index>/_search
    {
      "aggs": {
        "count_by_type": {
          "terms": {
            "field": "_type"
          }
        }
      },
      "size": 0
    }

    which will produce a result like

    "aggregations": {
    "count_by_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
    {
    "key": "logs",
    "doc_count": 483955
    },
    {
    "key": "execution",
    "doc_count": 4966
    },
    {
    "key": "deletedgranule",
    "doc_count": 4715
    },
    {
    "key": "pdr",
    "doc_count": 1822
    },
    {
    "key": "granule",
    "doc_count": 740
    },
    {
    "key": "asyncOperation",
    "doc_count": 616
    },
    {
    "key": "provider",
    "doc_count": 108
    },
    {
    "key": "collection",
    "doc_count": 87
    },
    {
    "key": "reconciliationReport",
    "doc_count": 48
    },
    {
    "key": "rule",
    "doc_count": 7
    }
    ]
    }
    }

    Resuming a reindex

    If a reindex operation did not fully complete it can be resumed using the following command run from the Kibana instance.

    POST _reindex?wait_for_completion=false
    {
      "conflicts": "proceed",
      "source": {
        "index": "cumulus-2020-11-3"
      },
      "dest": {
        "index": "cumulus-2021-3-4",
        "op_type": "create"
      }
    }

    The Cumulus API reindex-status endpoint can be used to monitor completion of this operation.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/troubleshooting/rerunning-workflow-executions/index.html b/docs/v9.9.0/troubleshooting/rerunning-workflow-executions/index.html index 85d7dfca505..18ee5ccebfb 100644 --- a/docs/v9.9.0/troubleshooting/rerunning-workflow-executions/index.html +++ b/docs/v9.9.0/troubleshooting/rerunning-workflow-executions/index.html @@ -5,13 +5,13 @@ Re-running workflow executions | Cumulus Documentation - +
    Version: v9.9.0

    Re-running workflow executions

    To re-run a Cumulus workflow execution from the AWS console:

    1. Visit the page for an individual workflow execution

    2. Click the "New execution" button at the top right of the screen

      Screenshot of the AWS console for a Step Function execution highlighting the &quot;New execution&quot; button at the top right of the screen

    3. In the "New execution" modal that appears, replace the cumulus_meta.execution_name value in the default input with the value of the new execution ID as seen in the screenshot below

      Screenshot of the AWS console showing the modal window for entering input when running a new Step Function execution

    4. Click the "Start execution" button

    - + \ No newline at end of file diff --git a/docs/v9.9.0/troubleshooting/troubleshooting-deployment/index.html b/docs/v9.9.0/troubleshooting/troubleshooting-deployment/index.html index f6d93e0960a..0b776b92f28 100644 --- a/docs/v9.9.0/troubleshooting/troubleshooting-deployment/index.html +++ b/docs/v9.9.0/troubleshooting/troubleshooting-deployment/index.html @@ -5,7 +5,7 @@ Troubleshooting Deployment | Cumulus Documentation - + @@ -16,7 +16,7 @@ data-persistence modules, but your config is only creating one Elasticsearch instance. To fix the issue, update the elasticsearch_config variable for your data-persistence module to increase the number of instances:

    {
      domain_name    = "es"
      instance_count = 2
      instance_type  = "t2.small.elasticsearch"
      version        = "5.3"
      volume_size    = 10
    }

    Install dashboard

    Dashboard configuration

    Issues:

    • Problem clearing the cache: EACCES: permission denied, rmdir '/tmp/gulp-cache/default'", this probably means the files at that location, and/or the folder, are owned by someone else (or some other factor prevents you from writing there).

    It's possible to work around this by editing the file cumulus-dashboard/node_modules/gulp-cache/index.js and altering the value of the line var fileCache = new Cache({cacheDirName: 'gulp-cache'}); to something like var fileCache = new Cache({cacheDirName: '<prefix>-cache'});. Now gulp-cache will be able to write to /tmp/<prefix>-cache/default, and the error should resolve.

    Dashboard deployment

    Issues:

    • If the dashboard sends you to an Earthdata Login page that has an error reading "Invalid request, please verify the client status or redirect_uri before resubmitting", this means you've either forgotten to update one or more of your EARTHDATA_CLIENT_ID, EARTHDATA_CLIENT_PASSWORD environment variables (from your app/.env file) and re-deploy Cumulus, or you haven't placed the correct values in them, or you've forgotten to add both the "redirect" and "token" URL to the Earthdata Application.
    • There is odd caching behavior associated with the dashboard and Earthdata Login at this point in time that can cause the above error to reappear on the Earthdata Login page loaded by the dashboard even after fixing the cause of the error. If you experience this, attempt to access the dashboard in a new browser window, and it should work.
    - + \ No newline at end of file diff --git a/docs/v9.9.0/upgrade-notes/cumulus_distribution_migration/index.html b/docs/v9.9.0/upgrade-notes/cumulus_distribution_migration/index.html index fc1edb6a776..df7d5874c78 100644 --- a/docs/v9.9.0/upgrade-notes/cumulus_distribution_migration/index.html +++ b/docs/v9.9.0/upgrade-notes/cumulus_distribution_migration/index.html @@ -5,14 +5,14 @@ Migrate from TEA deployment to Cumulus Distribution | Cumulus Documentation - +
    Version: v9.9.0

    Migrate from TEA deployment to Cumulus Distribution

    Background

    The Cumulus Distribution API is configured to use the AWS Cognito OAuth client. This API can be used instead of the Thin Egress App, which is the default distribution API if using the Deployment Template.

    Configuring a Cumulus Distribution deployment

    See these instructions for deploying the Cumulus Distribution API.

    Important note if migrating from TEA to Cumulus Distribution

    If you already have a deployment using the TEA distribution and want to switch to Cumulus Distribution, there will be an API Gateway change. This means that there will be downtime while you update your CloudFront endpoint to use the new API gateway.

    - + \ No newline at end of file diff --git a/docs/v9.9.0/upgrade-notes/migrate_tea_standalone/index.html b/docs/v9.9.0/upgrade-notes/migrate_tea_standalone/index.html index e8ce99ba903..c89453b3a3a 100644 --- a/docs/v9.9.0/upgrade-notes/migrate_tea_standalone/index.html +++ b/docs/v9.9.0/upgrade-notes/migrate_tea_standalone/index.html @@ -5,13 +5,13 @@ Migrate TEA deployment to standalone module | Cumulus Documentation - +
    Version: v9.9.0

    Migrate TEA deployment to standalone module

    Background

    This document is only relevant for upgrades of Cumulus from versions < 3.x.x to versions > 3.x.x

    Previous versions of Cumulus included deployment of the Thin Egress App (TEA) by default in the distribution module. As a result, Cumulus users who wanted to deploy a new version of TEA had to wait on a new release of Cumulus that incorporated that TEA release.

    In order to give Cumulus users the flexibility to deploy newer versions of TEA whenever they want, deployment of TEA has been removed from the distribution module and Cumulus users must now add the TEA module to their deployment. Guidance on integrating the TEA module to your deployment is provided, or you can refer to Cumulus core example deployment code for the thin_egress_app module.

    By default, when upgrading Cumulus and moving from TEA deployed via the distribution module to deployed as a separate module, your API gateway for TEA would be destroyed and re-created, which could cause outages for any Cloudfront endpoints pointing at that API gateway.

    These instructions outline how to modify your state to preserve your existing Thin Egress App (TEA) API gateway when upgrading Cumulus and moving deployment of TEA to a standalone module. If you do not care about preserving your API gateway for TEA when upgrading your Cumulus deployment, you can skip these instructions.

    Prerequisites

    Notes about state management

    These instructions will involve manipulating your Terraform state via terraform state mv commands. These operations are extremely dangerous, since a mistake in editing your Terraform state can leave your stack in a corrupted state where deployment may be impossible or may result in unanticipated resource deletion.

    Since bucket versioning preserves a separate version of your state file each time it is written, and the Terraform state modification commands overwrite the state file, we can mitigate the risk of these operations by downloading the most recent state file before starting the upgrade process. Then, if anything goes wrong during the upgrade, we can restore that previous state version. Guidance on how to perform both operations is provided below.

    Download your most recent state version

    Run this command to download the most recent cumulus deployment state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp s3://BUCKET/KEY /path/to/terraform.tfstate

    Restore a previous state version

    Upload the state file that was previously downloaded to the bucket/key for your state file, replacing BUCKET and KEY with the correct values from cumulus-tf/terraform.tf:

     aws s3 cp /path/to/terraform.tfstate s3://BUCKET/KEY

    Then run terraform plan, which will give an error because we manually overwrote the state file and it is now out of sync with the lock table Terraform uses to track your state file:

    Error: Error loading state: state data in S3 does not have the expected content.

    This may be caused by unusually long delays in S3 processing a previous state
    update. Please wait for a minute or two and try again. If this problem
    persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
    to manually verify the remote state and update the Digest value stored in the
    DynamoDB table to the following value: <some-digest-value>

    To resolve this error, run this command and replace DYNAMO_LOCK_TABLE, BUCKET and KEY with the correct values from cumulus-tf/terraform.tf, and use the digest value from the previous error output:

 aws dynamodb put-item \
   --table-name DYNAMO_LOCK_TABLE \
   --item '{
     "LockID": {"S": "BUCKET/KEY-md5"},
     "Digest": {"S": "some-digest-value"}
   }'

    Now, if you re-run terraform plan, it should work as expected.

    Migration instructions

    Please note: These instructions assume that you are deploying the thin_egress_app module as shown in the Cumulus core example deployment code

    1. Ensure that you have downloaded the latest version of your state file for your cumulus deployment

    2. Find the URL for your <prefix>-thin-egress-app-EgressGateway API gateway. Confirm that you can access it in the browser and that it is functional.
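
      If you are unsure of that URL, one way to look it up is with the AWS CLI. The following is a minimal sketch, assuming your AWS CLI is configured for the account and region of your deployment; the region and stage name in the example URL are placeholders to replace with your own values:

       # Look up the REST API id for the TEA API gateway (replace <prefix>)
       aws apigateway get-rest-apis \
         --query "items[?name=='<prefix>-thin-egress-app-EgressGateway'].id" \
         --output text

       # The API gateway URL then has the form
       #   https://<rest-api-id>.execute-api.<region>.amazonaws.com/<stage>/
       curl -sL "https://<rest-api-id>.execute-api.us-east-1.amazonaws.com/DEV/"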

    3. Run terraform plan. You should see output like (edited for readability):

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be created
      + resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket.lambda_source will be created
      + resource "aws_s3_bucket" "lambda_source" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be created
      + resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be created
      + resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be created
      + resource "aws_s3_bucket_object" "lambda_source" {

      # module.thin_egress_app.aws_security_group.egress_lambda[0] will be created
      + resource "aws_security_group" "egress_lambda" {

      ...

      # module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be destroyed
      - resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source will be destroyed
      - resource "aws_s3_bucket" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be destroyed
      - resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be destroyed
      - resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source will be destroyed
      - resource "aws_s3_bucket_object" "lambda_source" {

      # module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda[0] will be destroyed
      - resource "aws_security_group" "egress_lambda" {
    4. Run the state modification commands. The commands must be run in exactly this order:

       # Move security group
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_security_group.egress_lambda module.thin_egress_app.aws_security_group.egress_lambda

      # Move TEA storage bucket
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket.lambda_source module.thin_egress_app.aws_s3_bucket.lambda_source

      # Move TEA lambda source code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_source module.thin_egress_app.aws_s3_bucket_object.lambda_source

      # Move TEA lambda dependency code
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive

      # Move TEA Cloudformation template
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_s3_bucket_object.cloudformation_template module.thin_egress_app.aws_s3_bucket_object.cloudformation_template

      # Move URS creds secret version
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret_version.thin_egress_urs_creds aws_secretsmanager_secret_version.thin_egress_urs_creds

      # Move URS creds secret
      terraform state mv module.cumulus.module.distribution.aws_secretsmanager_secret.thin_egress_urs_creds aws_secretsmanager_secret.thin_egress_urs_creds

      # Move TEA Cloudformation stack
      terraform state mv module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app module.thin_egress_app.aws_cloudformation_stack.thin_egress_app

      Depending on how you were supplying a bucket map to TEA, there may be an additional step. If you were specifying the bucket_map_key variable to the cumulus module to use a custom bucket map, then you can ignore this step and just ensure that the bucket_map_file variable to the TEA module uses that same S3 key. Otherwise, if you were letting Cumulus generate a bucket map for you, then you need to take this step to migrate that bucket map:

      # Move bucket map
      terraform state mv module.cumulus.module.distribution.aws_s3_bucket_object.bucket_map_yaml[0] aws_s3_bucket_object.bucket_map_yaml
    5. Run terraform plan again. You may still see a few additions/modifications pending like below, but you should not see any deletion of Thin Egress App resources pending:

      # module.thin_egress_app.aws_cloudformation_stack.thin_egress_app will be updated in-place
      ~ resource "aws_cloudformation_stack" "thin_egress_app" {

      # module.thin_egress_app.aws_s3_bucket_object.cloudformation_template will be updated in-place
      ~ resource "aws_s3_bucket_object" "cloudformation_template" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_code_dependency_archive will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_code_dependency_archive" {

      # module.thin_egress_app.aws_s3_bucket_object.lambda_source will be updated in-place
      ~ resource "aws_s3_bucket_object" "lambda_source" {

      If you still see deletion of module.cumulus.module.distribution.module.thin_egress_app.aws_cloudformation_stack.thin_egress_app pending, then something went wrong and you should restore the previously downloaded state file version and start over from step 1. Otherwise, proceed to step 6.

    6. Once you have confirmed that everything looks as expected, run terraform apply.

    7. Visit the same API gateway from step 1 and confirm that it still works.

    Your TEA deployment has now been migrated to a standalone module, which gives you the ability to upgrade the deployed version of TEA independently of Cumulus releases.

Upgrade to RDS release

| Parameter | Type | Description | Default |
| cutoffSeconds | number | Number of seconds prior to this execution to 'cutoff' reconciliation queries. This allows in-progress/other in-flight operations time to complete and propagate to Elasticsearch/Dynamo/postgres. | 3600 |
| dbConcurrency | number | Sets max number of parallel collections reports the script will run at a time. | 20 |
| dbMaxPool | number | Sets the maximum number of connections the database pool has available. Modifying this may result in unexpected failures. | 20 |

    Version: v9.9.0

    Upgrade to TF version 0.13.6

    Background

Cumulus pins its support to a specific version of Terraform (see the deployment documentation). The reason for supporting only one specific Terraform version at a time is to avoid deployment errors that can be caused by deploying to the same target with different Terraform versions.

Cumulus is upgrading its supported version of Terraform from 0.12.12 to 0.13.6. This document contains instructions on how to perform the upgrade for your deployments.

    Prerequisites

    • Follow the Terraform guidance for what to do before upgrading, notably ensuring that you have no pending changes to your Cumulus deployments before proceeding.
      • You should do a terraform plan to see if you have any pending changes for your deployment (for both the data-persistence-tf and cumulus-tf modules), and if so, run a terraform apply before doing the upgrade to Terraform 0.13.6
    • Review the Terraform v0.13 release notes to prepare for any breaking changes that may affect your custom deployment code. Cumulus' deployment code has already been updated for compatibility with version 0.13.
• Install Terraform version 0.13.6. We recommend using Terraform Version Manager tfenv to manage your installed versions of Terraform, but this is not required; see the sketch below.
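
For example, a minimal sketch assuming tfenv is already installed and on your PATH:

  tfenv install 0.13.6
  tfenv use 0.13.6
  terraform --version   # should now report Terraform v0.13.6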

    Upgrade your deployment code

    Terraform 0.13 does not support some of the syntax from previous Terraform versions, so you need to upgrade your deployment code for compatibility.

    Terraform provides a 0.13upgrade command as part of version 0.13 to handle automatically upgrading your code. Make sure to check out the documentation on batch usage of 0.13upgrade, which will allow you to upgrade all of your Terraform code with one command.

    Run the 0.13upgrade command until you have no more necessary updates to your deployment code.
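
For example, a minimal sketch that runs the upgrade command against the data-persistence-tf and cumulus-tf directories; adjust the directory list to match your own deployment layout:

  # Run from the root of your deployment repository; -yes skips the confirmation prompt.
  for dir in data-persistence-tf cumulus-tf; do
    (cd "$dir" && terraform 0.13upgrade -yes)
  done

Review the resulting code changes before committing them.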

    Upgrade your deployment

    1. Ensure that you are running Terraform 0.13.6 by running terraform --version. If you are using tfenv, you can switch versions by running tfenv use 0.13.6.

    2. For the data-persistence-tf and cumulus-tf directories, take the following steps:

      1. Run terraform init --reconfigure. The --reconfigure flag is required, otherwise you might see an error like:

        Error: Failed to decode current backend config

        The backend configuration created by the most recent run of "terraform init"
        could not be decoded: unsupported attribute "lock_table". The configuration
        may have been initialized by an earlier version that used an incompatible
        configuration structure. Run "terraform init -reconfigure" to force
        re-initialization of the backend.
      2. Run terraform apply to perform a deployment.

        WARNING: Even if Terraform says that no resource changes are pending, running the apply using Terraform version 0.13.6 will modify your backend state from version 0.12.12 to version 0.13.6 without requiring approval. Updating the backend state is a necessary part of the version 0.13.6 upgrade, but it is not completely transparent.

Discover Granules

included in a granule's file list. That is, no such filtering based on filename occurs as described above.

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
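
If you do need to change it, the value is set alongside the task's other configuration in the workflow definition's CMA parameters. The following is an illustrative sketch only, using the task_config parameter format described in the workflow input/output documentation; the provider and collection templates are placeholders, and this is not a complete discover-granules configuration:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}",
          "concurrency": 3
        }
      }
    }
  }
}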

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v9.9.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

This task utilizes the incoming config.inputGranules and the task input list of S3 URIs, along with the rest of the configuration objects, to take the list of incoming files and sort them into a list of granule objects.

Please note: Files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • name
    • bucket
    • filename
    • fileStagingDir

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.
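
For illustration only, a minimal sketch of how inputGranules is commonly populated from the Cumulus message via CMA URL templating (the meta.input_granules path and the task_config parameter format are assumptions based on the workflow input/output documentation, not a complete files-to-granules configuration):

{
  "FilesToGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "inputGranules": "{$.meta.input_granules}"
        }
      }
    }
  }
}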

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v9.9.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

• The task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v9.9.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine the file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

  HDF: 'data',
  HDF-EOS: 'data',
  SCIENCE: 'data',
  BROWSE: 'browse',
  METADATA: 'metadata',
  BROWSE_METADATA: 'metadata',
  QA_METADATA: 'metadata',
  PRODHIST: 'qa',
  QA: 'metadata',
  TGZ: 'data',
  LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.
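
For illustration only, a minimal sketch of how these keys might be wired up from the Cumulus message via CMA URL templating (the JSON paths are placeholders and the task_config parameter format is assumed from the workflow input/output documentation; this is not a complete parse-pdr configuration):

{
  "ParsePdr": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "bucket": "{$.meta.buckets.internal.name}",
          "collection": "{$.meta.collection}"
        }
      }
    }
  }
}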

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v9.9.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

    The task utilizes a defined collection in concert with a defined provider, either on each granule, or passed in via config to queue up ingest executions for each granule, or for batches of granules.

The constructed granules object is defined by the collection passed in the configuration, and has impacts on other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

    Each of the following sections are a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
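
Putting the keys above together, an illustrative sketch of a QueueGranules step's task_config (the JSON paths, workflow name, and queue reference are placeholders for values from your own deployment, not a complete or authoritative configuration):

{
  "QueueGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "granuleIngestWorkflow": "IngestGranule",
          "queueUrl": "{$.meta.queues.startSF}",
          "preferredQueueBatchSize": 1,
          "concurrency": 3,
          "executionNamePrefix": "myRun",
          "childWorkflowMeta": {
            "staticValue": "example"
          }
        }
      }
    }
  }
}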

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    Version: v9.9.0

    Cumulus Tasks: Message Flow

    Cumulus Tasks comprise Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
• exception: A key used to track workflow exceptions; it should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

{
  "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
  },
  "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "anykey": "anyvalue"
  }
}

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

{
  "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
  },
  "cumulus_meta": {}
}

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines what current task is being executed. Note this is different from what lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

{
  "input": {},
  "config": {}
}

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

{
  "task_config": {
    "bar": "baz",
    "cumulus_message": {
      "input": "{$.payload.input}",
      "outputs": [
        {
          "source": "{$.input.anykey}",
          "destination": "{$.payload.out}"
        }
      ]
    }
  },
  "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {
    "input": {
      "anykey": "anyvalue"
    }
  }
}

    Create Next Message Step 2: Store remote event

    If the ReplaceConfiguration parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    Version: v9.9.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.
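
For instance, a minimal state_machine_definition sketch with a single task state; the task ARN reference is a placeholder for an output of your own task module (or of the cumulus module), and a real workflow step would typically also include the CMA Parameters configuration described in the workflow documentation:

{
  "Comment": "Minimal example workflow definition",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "${hello_world_task_arn}",
      "End": true
    }
  }
}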

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v9.9.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a docker image. Cumulus provides a docker image (source) for node 12.x+ lambdas on dockerhub: cumuluss/cumulus-ecs-task.

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

Dockerizing Data Processing

2) validate the output (in this case just check for existence)
3) use 'ncatted' to update the resulting file to be CF-compliant
4) write out metadata generated for this file

    Process Testing

    It is important to have tests for data processing, however in many cases datafiles can be large so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3, and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

├── cpl
│   ├── input
│   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
│   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
│   └── output
│       ├── HS3_CPL_ATB_12203a_20120906.nc
│       ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
│       ├── HS3_CPL_OP_12203a_20120906.nc
│       ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise an error will be reported.

    Version: v9.9.0

    Workflows

    Workflows are comprised of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
    • Not make assumptions of specifics of what goes into a granule for example.

    Scalable

    The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

    AWS Step functions are described in detail in the AWS documentation but they provide several benefits which are applicable to AWS.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step functions are designed to make it easy to build new applications and reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
      • Both meta and task_config which are used for configuring at the collection and task levels do not dictate the fields and structure of the configuration. Additional task specific JSON schemas can be used for extending the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We build additional support on top of the configuration file for simpler domain specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema or extract the configuration for a task ahead of its execution.

If we implemented our own orchestration we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade off. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks that can be implemented with Node.js and are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest specific configuration. This means our representation has a tight coupling with their standard. If they make backwards incompatible changes in the future we will have to deal with existing projects written against that.

    We avoid having to develop our own standard and code to process it. The design can support new features in AWS Step Functions without needing to update the Ingest library code changes. It is unlikely they will make a backwards incompatible change at this point. One mitigation for this is writing data transformations to a new format if that were to happen.

    Collection Configuration Flexibility vs Complexity

The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions which address AWS resources.

The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools like a GUI can be built that provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

    The design aims to satisfy the needs for ingest integrating different workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years. All of these are easily integrable into the workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

AWS Step Functions scale up as needed and aren't limited by a set number of servers. They also easily allow you to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
    • CloudWatch Metrics can be used for monitoring many of the events with the StepFunctions. It can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

This shows a visualization of an execution of the ingest granules workflow in step functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v9.9.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers; however, you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

You can include the CMA package in the Lambda code in the cumulus-message-adapter sub-directory in your lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

    Input to the task application code is a json object with keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

    Output from the task application code is returned in and placed in the payload key by default, but the config key can also be used to return just a portion of the task output.

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above mentioned methods (Lambda Layers, manual) requires configuration for its various features via a specific Step Function Parameters configuration format (see sample workflows in the examples cumulus-tf source for more examples):

{
  "cma": {
    "event.$": "$",
    "ReplaceConfig": "{some config}",
    "task_config": "{some config}"
  }
}

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

    Because of the potential size of a Cumulus message, mainly the payload field, a task can be set via configuration to store a portion of its output on S3 with a message key Remote Message that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (defaults to 0 bytes) it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize)

    {
    "DiscoverGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "MaxSize": 1,
    "Path": "$.payload",
    "TargetPath": "$.payload"
    }
    }
    }
    }
    }

will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.
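As a hedged illustration of this behavior (the bucket and key values are placeholders), a step whose payload exceeded MaxSize might emit a message along these lines, with an empty object left at the target path and a replace key, using the Bucket/Key/TargetPath structure shown later in this document, pointing at the stored output:

{
  "cumulus_meta": {
    "execution_name": "MyExecution__id-1234"
  },
  "meta": {
    "foo": "bar"
  },
  "payload": {},
  "replace": {
    "Bucket": "my-internal-bucket",
    "Key": "events/some-event-id",
    "TargetPath": "$.payload"
  }
}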

    Full Message

    Setting the following parameters for a lambda:

    DiscoverGranules:
    Parameters:
    cma:
    event.$: '$'
    ReplaceConfig:
    FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

    {
    "DiscoverGranules": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "ReplaceConfig": {
    "MaxSize": 0,
    "Path": "$",
    "TargetPath": "$"
    }
    }
    }
    }
    }

    Cumulus Message example

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    task_config

This configuration key contains the input/output configuration values used to define task inputs and outputs via URL paths. Important: these values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

    {
    "FunctionName": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "other_cma_configuration": "<config object>",
    "task_config": "<task config>"
    }
    }
    }
    }

    Example:

    {
    "StepFunction": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "sfnEnd": true,
    "stack": "{$.meta.stack}",
    "bucket": "{$.meta.buckets.internal.name}",
    "stateMachine": "{$.cumulus_meta.state_machine}",
    "executionName": "{$.cumulus_meta.execution_name}",
    "cumulus_message": {
    "input": "{$}"
    }
    }
    }
    }
    }
    }

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

    {
    "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
    "input": {
    "Other Parameter": {},
    "cma": {
    "ConfigKey": {
    "config values": "some config values"
    },
    "event": {
    "cumulus_meta": {},
    "payload": {},
    "meta": {},
    "exception": {}
    }
    }
    }
    }

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.

    2. Resolve Remote Messages

If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.

    3. Resolve URL templates in the task configuration

    In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL template as a value to achieve simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes message to next task. For example, given a task which has the following configuration:

    {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }
    }
    }
    }

and an incoming message that contains:

    {
    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    }
    }

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
}
},
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.
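To make the fallback behavior concrete, here is a small, hypothetical pair (the key names resolved and unresolved are invented for this example). Given a task_config of:

{
  "resolved": "{$.meta.foo}",
  "unresolved": "prefix{meta.does_not_exist}suffix"
}

and an incoming message where meta.foo is "bar" and meta.does_not_exist is absent, the resolved config passed to the task would be:

{
  "resolved": "bar",
  "unresolved": "prefix{meta.does_not_exist}suffix"
}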

    4. Resolve task input

By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, if a task specifies cma.task_config.cumulus_message.input:

        ExampleTask:
    Parameters:
    cma:
    event.$: '$'
    task_config:
    cumulus_message:
    input: '{$.payload.foo}'

    The task configuration in the message would be:

        {
    "task_config": {
    "cumulus_message": {
    "input": "{$.payload.foo}"
    }
    },
    "payload": {
    "foo": {
    "anykey": "anyvalue"
    }
    }
    }

The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as the task input, the task input would be:

        {
    "input" : {
    "anykey": "anyvalue"
    },
    "config": {...}
    }

    5. Resolve task output

By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also assign values to other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

    {
    "ExampleTask": {
    "Parameters": {
    "cma": {
    "event.$": "$",
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    }
    }
    }
    }
    }

    The corresponding Cumulus Message would be:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    Given the response from the task is:

        {
    "output": {
    "anykey": "boo"
    }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

        {
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

Please Note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

For example, if the output message (after output configuration has been applied) from a task looks like:

        {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "ReplaceConfig": {
    "FullMessage": true
    },
    "task_config": {
    "cumulus_message": {
    "outputs": [
    {
    "source": "{$}",
    "destination": "{$.payload}"
    },
    {
    "source": "{$.output.anykey}",
    "destination": "{$.meta.baz}"
    }
    ]
    }
    },
    "meta": {
    "foo": "bar",
    "baz": "boo"
    },
    "payload": {
    "output": {
    "anykey": "boo"
    }
    }
    }

    the resultant output would look like:

    {
    "cumulus_meta": {
    "some_key": "some_value"
    },
    "replace": {
    "TargetPath": "$",
    "Bucket": "some-internal-bucket",
    "Key": "events/some-event-id"
    }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
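As a rough sketch only (the property names here are invented for illustration; real tasks define their own schemas), a task's schemas/config.json might look something like:

{
  "title": "ExampleTaskConfig",
  "type": "object",
  "required": ["bucket"],
  "properties": {
    "bucket": {
      "type": "string",
      "description": "S3 bucket the task writes its output to"
    },
    "collection": {
      "type": "object",
      "description": "Cumulus collection object for the granules being processed"
    }
  }
}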

    Version: v9.9.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

AWS provides a great getting started guide for building Lambdas in the developer guide.

Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions:

• Node.js
• Java
• Python

Additionally, you may choose to include any of the other languages AWS supports as a resource with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the Cumulus source), either in a new .tf file or added to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    Cloudwatch log groups

If you want to enable Cloudwatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    Version: v9.9.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    - + \ No newline at end of file diff --git a/docs/v9.9.0/workflows/workflow-configuration-how-to/index.html b/docs/v9.9.0/workflows/workflow-configuration-how-to/index.html index f28e7937a2c..26fdfe9ec03 100644 --- a/docs/v9.9.0/workflows/workflow-configuration-how-to/index.html +++ b/docs/v9.9.0/workflows/workflow-configuration-how-to/index.html @@ -5,7 +5,7 @@ Workflow Configuration How To's | Cumulus Documentation - + @@ -15,7 +15,7 @@ To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.name, 0, 3)}"

    This example will populate to "MOD09GQ/MOD"

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. This cmrMetadata or CMR granule XML needs to have been generated and stored on S3. From there any field could be retrieved and used for a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
    <Temporal>
    <RangeDateTime>
    <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
    <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
    </RangeDateTime>
    </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example

    {
    "bucket": "sample-protected-bucket",
    "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)/extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
    }

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    Version: v9.9.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or by new data written to a kinesis stream.
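For orientation only, a scheduled rule record might look roughly like the following; all field values are placeholders, and the rule configuration documentation linked above is authoritative for the actual schema:

{
  "name": "my_scheduled_ingest_rule",
  "workflow": "DiscoverGranules",
  "provider": "MY_PROVIDER",
  "collection": {
    "name": "MOD09GQ",
    "version": "006"
  },
  "rule": {
    "type": "scheduled",
    "value": "rate(1 hour)"
  },
  "state": "ENABLED"
}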

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event, which manages triggering the lambdas that trigger the workflows.

    For a scheduled rule, the Cloudwatch event is triggered on the given schedule which calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the kinesis stream, the Cloudwatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an sns rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules

Version: v16.0.0

Discover Granules

    When set on the task configuration, the value applies to all collections during discovery. Otherwise, this property may be set on individual collections.

    Concurrency

    A number property that determines the level of concurrency with which granule duplicate checks are performed when duplicateGranuleHandling is skip or error.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when discover-granules discovers a large number of granules with skip or error duplicate handling. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the discover-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.
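As a sketch, duplicateGranuleHandling and concurrency might be set in the workflow's task_config roughly as follows; the provider and collection values are placeholders and this is not an exhaustive or authoritative configuration:

{
  "DiscoverGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "collection": "{$.meta.collection}",
          "duplicateGranuleHandling": "skip",
          "concurrency": 3
        }
      }
    }
  }
}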

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v16.0.0

    Files To Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming config.inputGranules and the task input list of s3 URIs along with the rest of the configuration objects to take the list of incoming files and sort them into a list of granule objects.

Please note: files passed in without metadata previously defined in config.inputGranules will be added with the following keys:

    • size
    • bucket
    • key
    • fileName

    It is primarily intended to support compatibility with the standard output of a processing task, and convert that output into a granule object accepted as input by the majority of other Cumulus tasks.
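To make this concrete, a minimal, hypothetical input/config pair for this task might look like the following; the bucket name, staging path, and granuleId are placeholders, and the schemas on the Cumulus Tasks page are authoritative:

{
  "input": [
    "s3://my-staging-bucket/file-staging/my-prefix/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf"
  ],
  "config": {
    "inputGranules": [
      {
        "granuleId": "MOD09GQ.A2017025.h21v00.006.2017034065104",
        "files": []
      }
    ]
  }
}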

    Task Inputs

    Input

    This task expects an incoming input that contains an array of 'staged' S3 URIs to move to their final archive location.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    inputGranules

    An array of Cumulus granule objects.

    This object will be used to define metadata values for the move granules task, and is the basis for the updated object that will be added to the output.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects as the payload for the next task, and returns only the expected payload for the next task.

    Version: v16.0.0

    LZARDS Backup

    The LZARDS backup task takes an array of granules and initiates backup requests to the LZARDS API, which will be handled asynchronously by LZARDS.

    Deployment

    The LZARDS backup task is not automatically deployed with Cumulus. To deploy the task through the Cumulus module, first you must specify a lzards_launchpad_passphrase in your terraform variables (e.g. variables.tf) like so:

    variable "lzards_launchpad_passphrase" {
    type = string
    default = ""
    }

    Then you can specify a value for your lzards_launchpad_passphrase in terraform.tfvars like so:

lzards_launchpad_passphrase = "your-passphrase"

    Lastly, you need to make sure that the lzards_launchpad_passphrase is passed into the Cumulus module (in main.tf) like so:

    lzards_launchpad_passphrase  = var.lzards_launchpad_passphrase

    In short, deploying the LZARDS task requires configuring a passphrase variable and ensuring that your TF configuration passes that variable into the Cumulus module.

Additional terraform configuration for the LZARDS task can be found in the cumulus module's variables.tf file, where the relevant variables are prefixed with lzards_. You can add these variables to your deployment using the same process outlined above for lzards_launchpad_passphrase.

    Task Inputs

    Input

    This task expects an array of granules as input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Task Outputs

    Output

    The LZARDS task outputs a composite object containing:

    • the input granules array, and
    • a backupResults object that describes the results of LZARDS backup attempts.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Version: v16.0.0

    Move Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    This task utilizes the incoming event.input array of Cumulus granule objects to do the following:

    • Move granules from their 'staging' location to the final location (as configured in the Sync Granules task)

    • Update the event.input object with the new file locations.

• If the granule has an ECHO10/UMM CMR file (.cmr.xml or .cmr.json) included in the event.input:

      • Update that file's access locations

      • Add it to the appropriate access URL category for the CMR filetype as defined by granule CNM filetype.

      • Set the CMR file to 'metadata' in the output granules object and add it to the granule files if it's not already present.

Please note: Granules without a valid CNM type set in the granule file type field in event.input will be treated as "data" in the updated CMR metadata file.

    • Task then outputs an updated list of granule objects.

    Task Inputs

    Input

    This task expects an incoming input that contains a list of 'staged' S3 URIs to move to their final archive location. If CMR metadata is to be updated for a granule, it must also be included in the input.

    For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects event.input to provide an array of Cumulus granule objects. The files listed for each granule represent the files to be acted upon as described in summary.

    Task Outputs

    This task outputs an assembled array of Cumulus granule objects with post-move file locations as the payload for the next task, and returns only the expected payload for the next task. If a CMR file has been specified for a granule object, the CMR resources related to the granule files will be updated according to the updated granule file metadata.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v16.0.0

    Parse PDR

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to do the following with the incoming PDR object:

    • Stage it to an internal S3 bucket

    • Parse the PDR

    • Archive the PDR and remove the staged file if successful

• Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, file counts, etc.) and a granules object

The constructed granules object is created using PDR metadata to determine values like data type and version, and collection definitions to determine a file storage location based on the extracted data type and version number.

    Granule file types are converted from the PDR spec types to CNM types according to the following translation table:

      HDF: 'data',
    HDF-EOS: 'data',
    SCIENCE: 'data',
    BROWSE: 'browse',
    METADATA: 'metadata',
    BROWSE_METADATA: 'metadata',
    QA_METADATA: 'metadata',
    PRODHIST: 'qa',
    QA: 'metadata',
    TGZ: 'data',
    LINKAGE: 'data'

Files missing file types will have none assigned; files with invalid types will result in a PDR parse failure.

    Task Inputs

    Input

    This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.

    Configuration

    This task does expect values to be set in the workflow_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    Provider

    A Cumulus provider object. Used to define connection information for retrieving the PDR.

    Bucket

    Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.

    Collection

    A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.

    Task Outputs

This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc.), a pdr object with information for later steps, and the generated array of granule objects.

    Examples

    See the SIPS workflow cookbook for an example of this task in a workflow

    Version: v16.0.0

    Queue Granules

    This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.

    Links to the npm package, task input, output and configuration schema definitions, and more can be found on the auto-generated Cumulus Tasks page.

    Summary

    The purpose of this task is to schedule ingest of granules that were discovered on a remote host, whether via the DiscoverGranules task or the ParsePDR task.

The task utilizes a defined collection in concert with a defined provider, either set on each granule or passed in via config, to queue up ingest executions for each granule or for batches of granules.

    The constructed granules object is defined by the collection passed in the configuration, and has impacts to other provided core Cumulus Tasks.

    Users of this task in a workflow are encouraged to carefully consider their configuration in context of downstream tasks and workflows.

    Task Inputs

    Each of the following sections are a high-level discussion of the intent of the various input/output/config values.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Input

    This task expects an incoming input that contains granules and information about them and their files. For the specifics, see the Cumulus Tasks page entry for the schema.

    This input is most commonly the output from a preceding DiscoverGranules or ParsePDR task.

    Cumulus Configuration

    This task does expect values to be set in the task_config CMA parameters for the workflows. A schema exists that defines the requirements for the task.

    For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.

    Below are expanded descriptions of selected config keys:

    provider

    A Cumulus provider object for the originating provider. Will be passed along to the ingest workflow. This will be overruled by more specific provider information that may exist on a granule.

    internalBucket

    The Cumulus internal system bucket.

    granuleIngestWorkflow

    A string property that denotes the name of the ingest workflow into which granules should be queued.

    queueUrl

    A string property that denotes the URL of the queue to which scheduled execution messages are sent.

    preferredQueueBatchSize

    A number property that sets an upper bound on the size of each batch of granules queued into the payload of an ingest execution. Setting this property to a value higher than 1 allows queueing of multiple granules per ingest workflow.

    As ingest executions typically expect granules in the payload to have a common collection and common provider, this property only sets an upper bound within which batches will be created based on common collection and provider information.

    This means batches may be smaller than the preferred size if collection or provider information diverge, but never larger.

    The default value if none is specified is 1, which will queue one ingest execution per granule.

    concurrency

    A number property that determines the level of concurrency with which ingest executions are scheduled. Granules or batches of granules will be queued up into executions at this level of concurrency.

    This property is also used to limit concurrency when updating granule status to queued.

    Limiting concurrency helps to avoid throttling by the AWS Lambda API and helps to avoid encountering account Lambda concurrency limitations.

    We do not recommend increasing this value unless you are seeing Lambda.Timeout errors when queue-granules receives a large number of granules as input. However, as increasing the concurrency may lead to Lambda API or Lambda concurrency throttling errors, you may wish to consider converting the queue-granules task to an ECS activity, which does not face similar runtime constraints.

    The default value is 3.

    executionNamePrefix

    A string property that will prefix the names of scheduled executions.

    childWorkflowMeta

    An object property that will be merged into the scheduled execution input's meta field.
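Putting the keys above together, a hedged sketch of a task_config for this task might look like the following; the JSON-path and literal values are placeholders, additional keys may be required, and the config.json schema on the Cumulus Tasks page is authoritative:

{
  "QueueGranules": {
    "Parameters": {
      "cma": {
        "event.$": "$",
        "task_config": {
          "provider": "{$.meta.provider}",
          "internalBucket": "{$.meta.buckets.internal.name}",
          "granuleIngestWorkflow": "IngestGranule",
          "queueUrl": "{$.meta.queues.startSF}",
          "preferredQueueBatchSize": 1,
          "concurrency": 3,
          "executionNamePrefix": "my-prefix",
          "childWorkflowMeta": {
            "staticKey": "staticValue"
          }
        }
      }
    }
  }
}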

    Task Outputs

    This task outputs an assembled array of workflow execution ARNs for all scheduled workflow executions within the payload's running object.

    Version: v16.0.0

    Cumulus Tasks: Message Flow

    Cumulus Tasks comprise Cumulus Workflows and are either AWS Lambda tasks or AWS Elastic Container Service (ECS) activities. Cumulus Tasks permit a payload as input to the main task application code. The task payload is additionally wrapped by the Cumulus Message Adapter. The Cumulus Message Adapter supplies additional information supporting message templating and metadata management of these workflows.

    Diagram showing how incoming and outgoing Cumulus messages for workflow steps are handled by the Cumulus Message Adapter

    The steps in this flow are detailed in sections below.

    Cumulus Message Format

    A full Cumulus Message has the following keys:

    • cumulus_meta: System runtime information that should generally not be touched outside of Cumulus library code or the Cumulus Message Adapter. Stores meta information about the workflow such as the state machine name and the current workflow execution's name. This information is used to look up the current active task. The name of the current active task is used to look up the corresponding task's config in task_config.
    • meta: Runtime information captured by the workflow operators. Stores execution-agnostic variables.
    • payload: Payload is runtime information for the tasks.

    In addition to the above keys, it may contain the following keys:

    • replace: A key generated in conjunction with the Cumulus Message adapter. It contains the location on S3 for a message payload and a Target JSON path in the message to extract it to.
    • exception: A key used to track workflow exceptions, should not be modified outside of Cumulus library code.

    Here's a simple example of a Cumulus Message:

    {
    "task_config": {
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    },
    "cumulus_meta": {
    "message_source": "sfn",
    "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
    "execution_name": "MyExecution__id-1234",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "anykey": "anyvalue"
    }
    }

    A message utilizing the Cumulus Remote message functionality must have at least the keys replace and cumulus_meta. Depending on configuration other portions of the message may be present, however the cumulus_meta, meta, and payload keys must be present once extraction is complete.

    {
    "replace": {
    "Bucket": "cumulus-bucket",
    "Key": "my-large-event.json",
    "TargetPath": "$"
    },
    "cumulus_meta": {}
    }

    Cumulus Message Preparation

    The event coming into a Cumulus Task is assumed to be a Cumulus Message and should first be handled by the functions described below before being passed to the task application code.

    Preparation Step 1: Fetch remote event

    Fetch remote event will fetch the full event from S3 if the cumulus message includes a replace key.

    Once "my-large-event.json" is fetched from S3, it's returned from the fetch remote event function. If no "replace" key is present, the event passed to the fetch remote event function is assumed to be a complete Cumulus Message and returned as-is.

    Preparation Step 2: Parse step function config from CMA configuration parameters

    This step determines what current task is being executed. Note this is different from what lambda or activity is being executed, because the same lambda or activity can be used for different tasks. The current task name is used to load the appropriate configuration from the Cumulus Message's 'task_config' configuration parameter.

    Preparation Step 3: Load nested event

    Using the config returned from the previous step, load nested event resolves templates for the final config and input to send to the task's application code.

    Task Application Code

    After message prep, the message passed to the task application code is of the form:

    {
    "input": {},
    "config": {}
    }

    Create Next Message functions

    Whatever comes out of the task application code is used to construct an outgoing Cumulus Message.

    Create Next Message Step 1: Assign outputs

    The config loaded from the Fetch step function config step may have a cumulus_message key. This can be used to "dispatch" fields from the task's application output to a destination in the final event output (via URL templating). Here's an example where the value of input.anykey would be dispatched as the value of payload.out in the final cumulus message:

    {
    "task_config": {
    "bar": "baz",
    "cumulus_message": {
    "input": "{$.payload.input}",
    "outputs": [
    {
    "source": "{$.input.anykey}",
    "destination": "{$.payload.out}"
    }
    ]
    }
    },
    "cumulus_meta": {
    "task": "Example",
    "message_source": "local",
    "id": "id-1234"
    },
    "meta": {
    "foo": "bar"
    },
    "payload": {
    "input": {
    "anykey": "anyvalue"
    }
    }
    }

    Create Next Message Step 2: Store remote event

If the ReplaceConfig parameter is set, the configured key's value will be stored in S3 and the final output of the task will include a replace key that contains configuration for a future step to extract the payload on S3 back into the Cumulus Message. The replace key identifies where the large event node has been stored in S3.

    Version: v16.0.0

    Creating a Cumulus Workflow

    The Cumulus workflow module

To facilitate adding workflows to your deployment, Cumulus provides a workflow module.

    In combination with the Cumulus message, the workflow module provides a way to easily turn a Step Function definition into a Cumulus workflow, complete with:

    Using the module also ensures that your workflows will continue to be compatible with future versions of Cumulus.

    For more on the full set of current available options for the module, please consult the module README.

    Adding a new Cumulus workflow to your deployment

    To add a new Cumulus workflow to your deployment that is using the cumulus module, add a new workflow resource to your deployment directory, either in a new .tf file, or to an existing file.

    The workflow should follow a syntax similar to:

    module "my_workflow" {
    source = "https://github.com/nasa/cumulus/releases/download/vx.x.x/terraform-aws-cumulus-workflow.zip"

    prefix = "my-prefix"
    name = "MyWorkflowName"
    system_bucket = "my-internal-bucket"

    workflow_config = module.cumulus.workflow_config

    tags = { Deployment = var.prefix }

    state_machine_definition = <<JSON
    {}
    JSON
    }

    In the above example, you would add your state_machine_definition using the Amazon States Language, using tasks you've developed and Cumulus core tasks that are made available as part of the cumulus terraform module.

    Please note: Cumulus follows the convention of tagging resources with the prefix variable { Deployment = var.prefix } that you pass to the cumulus module. For resources defined outside of Core, it's recommended that you adopt this convention as it makes resources and/or deployment recovery scenarios much easier to manage.

    Examples

    For a functional example of a basic workflow, please take a look at the hello_world_workflow.

    For more complete/advanced examples, please read the following cookbook entries/topics:

    Version: v16.0.0

    Developing Workflow Tasks

    Workflow tasks can be either AWS Lambda Functions or ECS Activities.

    Lambda functions

    The full set of available core Lambda functions can be found in the deployed cumulus module zipfile at /tasks, as well as reference documentation here. These Lambdas can be referenced in workflows via the outputs from that module (see the cumulus-template-deploy repo for an example).

    The tasks source is located in the Cumulus repository at cumulus/tasks.

    You can also develop your own Lambda function. See the Lambda Functions page to learn more.

    ECS Activities

    ECS activities are supported via the cumulus_ecs_module available from the Cumulus release page.

    Please read the module README for configuration details.

    For assistance in creating a task definition within the module read the AWS Task Definition Docs.

    For a step-by-step example of using the cumulus_ecs_module, please see the related cookbook entry.

    Cumulus Docker Image

ECS activities require a docker image. Cumulus provides a docker image (source for node 12.x+ lambdas on dockerhub: cumuluss/cumulus-ecs-task).

    Alternate Docker Images

    Custom docker images/runtimes are supported as are private registries. For details on configuring a private registry/image see the AWS documentation on Private Registry Authentication for Tasks.

Version: v16.0.0

Dockerizing Data Processing

    Process Testing

It is important to have tests for data processing; however, in many cases data files can be large, so it is not practical to store the test data in the repository. Instead, test data is currently stored on AWS S3 and can be retrieved using the AWS CLI.

    aws s3 sync s3://cumulus-ghrc-logs/sample-data/collection-name data

    Where collection-name is the name of the data collection, such as 'avaps', or 'cpl'. For example, an abridged version of the data for CPL includes:

    ├── cpl
    │   ├── input
    │   │   ├── HS3_CPL_ATB_12203a_20120906.hdf5
    │   │   ├── HS3_CPL_OP_12203a_20120906.hdf5
    │   └── output
    │   ├── HS3_CPL_ATB_12203a_20120906.nc
    │   ├── HS3_CPL_ATB_12203a_20120906.nc.meta.xml
    │   ├── HS3_CPL_OP_12203a_20120906.nc
    │   ├── HS3_CPL_OP_12203a_20120906.nc.meta.xml

    Contained in the input directory are all possible sets of data files, while the output directory is the expected result of processing. In this case the hdf5 files are converted to NetCDF files and XML metadata files are generated.

    The docker image for a process can be used on the retrieved test data. First create a test-output directory in the newly created data directory.

    mkdir data/test-output

    Then run the docker image using docker-compose.

    docker-compose run test

This will process the data in the data/input directory and put the output into data/test-output. Repositories also include Python-based tests which will validate this newly created output against the contents of data/output. Use Python's Nose tool to run the included tests.

    nosetests

If the data/test-output directory validates against the contents of data/output, the tests will be successful; otherwise an error will be reported.

    Version: v16.0.0

    Workflows Overview

Workflows are composed of one or more AWS Lambda Functions and ECS Activities to discover, ingest, process, manage, and archive data.

    Provider data ingest and GIBS have a set of common needs in getting data from a source system and into the cloud where they can be distributed to end users. These common needs are:

    • Data Discovery - Crawling, polling, or detecting changes from a variety of sources.
    • Data Transformation - Taking data files in their original format and extracting and transforming them into another desired format such as visible browse images.
    • Archival - Storage of the files in a location that's accessible to end users.

    The high level view of the architecture and many of the individual steps are the same but the details of ingesting each type of collection differs. Different collection types and different providers have different needs. The individual boxes of a workflow are not only different. The branching, error handling, and multiplicity of the arrows connecting the boxes are also different. Some need visible images rendered from component data files from multiple collections. Some need to contact the CMR with updated metadata. Some will have different retry strategies to handle availability issues with source data systems.

    AWS and other cloud vendors provide an ideal solution for parts of these problems but there needs to be a higher level solution to allow the composition of AWS components into a full featured solution. The Ingest Workflow Architecture is designed to meet the needs for Earth Science data ingest and transformation.

    Goals

    Flexibility and Composability

The steps to ingest and process data are different for each collection within a provider. Ingest should be as flexible as possible in the rearranging of steps and configuration.

    We want to use lego-like individual steps that can be composed by an operator.

    Individual steps should ...

    • Be as ignorant as possible of the overall flow. They should not be aware of previous steps.
    • Be runnable on their own.
    • Define their input and output in simple data structures.
    • Be domain agnostic.
    • Not make assumptions of specifics of what goes into a granule for example.

    Scalable

    The ingest architecture needs to be scalable both to handle ingesting hundreds of millions of granules and interpret dozens of different workflows.

    Data Provenance

    • We should have traceability for how data was produced and where it comes from.
    • Use immutable representations of data. Data once received is not overwritten. Data can be removed for cleanup.
    • All software is versioned. We can trace transformation of data by tracking the immutable source data and the versioned software applied to it.

    Operator Visibility and Control

    • Operators should be able to see and understand everything that is happening in the system.
    • It should be obvious why things are happening and straightforward to diagnose problems.
• We generally assume that the operators know best in terms of the limits on a provider's infrastructure, how often things need to be done, and details of a collection. The architecture should defer to their decisions and knowledge while providing safety nets to prevent problems.

    A Reconfigurable Workflow Architecture

    The Ingest Workflow Architecture is defined by two entity types, Workflows and Tasks. A Workflow is a set of composed Tasks to complete an objective such as ingesting a granule. Tasks are the individual steps of a Workflow that perform one job. The workflow is responsible for executing the right task based on the current state and response from the last task executed. Tasks are completely decoupled in that they don't call each other or even need to know about the presence of other tasks.

    Workflows and tasks are configured as Terraform resources, which are triggered via configured rules within Cumulus.

    Diagram showing the Step Function execution path through workflow tasks for a collection ingest

    See the Example GIBS Ingest Architecture showing how workflows and tasks are used to define the GIBS Ingest Architecture.

    Workflows

    A workflow is a provider-configured set of steps that describe the process to ingest data. Workflows are defined using AWS Step Functions.

    Benefits of AWS Step Functions

AWS Step Functions are described in detail in the AWS documentation, but they provide several benefits which are applicable to the Ingest Workflow Architecture.

    • Prebuilt solution
    • Operations Visibility
      • Visual diagram
      • Every execution is recorded with both inputs and output for every step.
    • Composability
      • Allow composing AWS Lambdas and code running in other steps. Code can be run in EC2 to interface with it or even on premise if desired.
      • Step functions allow specifying when steps run in parallel or choices between steps based on data from the previous step.
    • Flexibility
  • Step functions are designed to make it easy to build new applications and to reconfigure them. We're exposing that flexibility directly to the provider.
    • Reliability and Error Handling
      • Step functions allow configuration of retries and adding handling of error conditions.
    • Described via data
      • This makes it easy to save the step function in configuration management solutions.
      • We can build simple interfaces on top of the flexibility provided.

    Workflow Scheduler

    The scheduler is responsible for initiating a step function and passing in the relevant data for a collection. This is currently configured as an interval for each collection. The scheduler service creates the initial event by combining the collection configuration with the AWS execution context defined via the cumulus terraform module.

    Tasks

    A workflow is composed of tasks. Each task is responsible for performing a discrete step of the ingest process. These can be activities like:

    • Crawling a provider website for new data.
    • Uploading data from a provider to S3.
    • Executing a process to transform data.

    AWS Step Functions permit tasks to be code running anywhere, even on premise. We expect most tasks will be written as Lambda functions in order to take advantage of the easy deployment, scalability, and cost benefits provided by AWS Lambda.

    • Leverages Existing Work
      • The design leverages the existing work of Amazon by defining workflows using the AWS Step Function State Language. This is the language that was created for describing the state machines used in AWS Step Functions.
    • Open for Extension
      • Both meta and task_config, which are used for configuration at the collection and task levels, do not dictate the fields and structure of the configuration. Additional task-specific JSON schemas can be used to extend the validation of individual steps.
    • Data-centric Configuration
      • The use of a single JSON configuration file allows this to be added to a workflow. We can build additional support on top of the configuration file for simpler, domain-specific configuration or interactive GUIs.

    For more details on Task Messages and Configuration, visit Cumulus configuration and message protocol documentation.

    Ingest Deploy

    To view deployment documentation, please see the Cumulus deployment documentation.

    Tradeoffs and Benefits

    This section documents various tradeoffs and benefits of the Ingest Workflow Architecture.

    Tradeoffs

    Workflow execution is handled completely by AWS

    This means we can't add our own code into the orchestration of the workflow. We can't add new features not supported by Step Functions. We can't do things like enforce that the responses from tasks always conform to a schema, or extract the configuration for a task ahead of its execution.

    If we implemented our own orchestration we'd be able to add all of these. We save significant amounts of development effort and gain all the features of Step Functions for this trade off. One workaround is to provide a library of common task capabilities. These would optionally be available to tasks implemented in Node.js that are able to include the library.

    Workflow Configuration is specified in AWS Step Function States Language

    The current design combines the states language defined by AWS with Ingest-specific configuration. This means our representation has a tight coupling with their standard. If they make backwards-incompatible changes in the future, we will have to deal with existing projects written against that standard.

    We avoid having to develop our own standard and the code to process it. The design can support new features in AWS Step Functions without requiring changes to the Ingest library code. It is unlikely they will make a backwards-incompatible change at this point. One mitigation, if that were to happen, is writing data transformations to a new format.

    Collection Configuration Flexibility vs Complexity

    The Collections Configuration File is very flexible but requires more knowledge of AWS Step Functions to configure. A person modifying this file directly would need to be comfortable editing a JSON file and configuring AWS Step Functions state transitions that address AWS resources.

    The configuration file itself is not necessarily meant to be edited by a human directly. Since we are developing a reconfigurable, composable architecture that is specified entirely in data, additional tools can be developed on top of it. The existing recipes.json files can be mapped to this format. Operational tools such as a GUI can be built to provide a usable interface for customizing workflows, but it will take time to develop these tools.

    Benefits

    This section describes benefits of the Ingest Workflow Architecture.

    Simplicity

    The concepts of Workflows and Tasks are simple ones that should make sense to providers. Additionally, the implementation will only consist of a few components because the design leverages existing services and capabilities of AWS. The Ingest implementation will only consist of some reusable task code to make task implementation easier, Ingest deployment, and the Workflow Scheduler.

    Composability

    The design aims to satisfy the need to integrate different ingest workflows for providers. It's flexible in terms of the ability to arrange tasks to meet the needs of a collection. Providers have developed and incorporated open source tools over the years, and all of these are easily integrable into workflows as tasks.

    There is low coupling between task steps. Failures of one component don't bring the whole system down. Individual tasks can be deployed separately.

    Scalability

    AWS Step Functions scale up as needed and aren't limited by a fixed number of servers. They also make it easy to leverage the inherent scalability of serverless functions.

    Monitoring and Auditing

    • Every execution is captured.
    • Every task run has captured input and outputs.
    • CloudWatch Metrics can be used to monitor many of the events within Step Functions and can also generate alarms for the whole process.
    • Visual report of the entire configuration.
      • Errors and success states are highlighted visually in the flow.

    Data Provenance

    • Monitoring and auditing ensures we know the data that was given to a task.
    • Workflows are versioned and the state machines stored in AWS Step Functions are immutable. Once created they cannot change.
    • Versioning of data in S3 or using immutable records in S3 will mean we always know what data was created as the result of a step or fed into a step.

    Appendix

    Example GIBS Ingest Architecture

    This shows the GIBS Ingest Architecture as an example of the use of the Ingest Workflow Architecture.

    • The GIBS Ingest Architecture consists of two workflows per collection type. There is one for discovery and one for ingest. The final stage of discovery triggers multiple ingest workflows, one for each MRF granule that needs to be generated.
    • It demonstrates both lambdas as tasks and a container used for MRF generation.

    GIBS Ingest Workflows

    Diagram showing the AWS Step Function execution path for a GIBS ingest workflow

    GIBS Ingest Granules Workflow

    This shows a visualization of an execution of the ingest granules workflow in Step Functions. The steps highlighted in green are the ones that executed and completed successfully.

    Diagram showing the AWS Step Function execution path for a GIBS ingest granules workflow

    Version: v16.0.0

    Workflow Inputs & Outputs

    General Structure

    Cumulus uses a common format for all inputs and outputs to workflows. The same format is used for input and output from workflow steps. The common format consists of a JSON object which holds all necessary information about the task execution and AWS environment. Tasks return objects identical in format to their input with the exception of a task-specific payload field. Tasks may also augment their execution metadata.
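    For orientation, a stripped-down sketch of that envelope is shown below. It uses only keys that appear in the examples later on this page (cumulus_meta, meta, payload, exception), with illustrative values, and omits fields a real message would carry:

    {
      "cumulus_meta": {
        "execution_name": "MyExecution__id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "exception": {},
      "payload": {
        "anykey": "anyvalue"
      }
    }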

    Cumulus Message Adapter

    The Cumulus Message Adapter and Cumulus Message Adapter libraries help task developers integrate their tasks into a Cumulus workflow. These libraries adapt input and outputs from tasks into the Cumulus Message format. The Scheduler service creates the initial event message by combining the collection configuration, external resource configuration, workflow configuration, and deployment environment settings. The subsequent workflow messages between tasks must conform to the message schema. By using the Cumulus Message Adapter, individual task Lambda functions only receive the input and output specifically configured for the task, and not non-task-related message fields.

    The Cumulus Message Adapter libraries are called by the tasks with a callback function containing the business logic of the task as a parameter. They first adapt the incoming message to a format more easily consumable by Cumulus tasks, then invoke the task, and then adapt the task response back to the Cumulus message protocol to be sent to the next task.

    A task's Lambda function can be configured to include a Cumulus Message Adapter library which constructs input/output messages and resolves task configurations. The CMA can then be included in one of several ways:

    Lambda Layer

    In order to make use of this configuration, a Lambda layer must be uploaded to your account. Due to platform restrictions, Core cannot currently support sharable public layers, however you can deploy the appropriate version from the release page in two ways:

    Once you've deployed the layer, integrate the CMA layer with your Lambdas:

    • If using the cumulus module, set the cumulus_message_adapter_lambda_layer_version_arn in your .tfvars file to integrate the CMA layer with all core Cumulus lambdas.
    • If including your own Lambda or ECS task Terraform modules, specify the CMA layer ARN in the Terraform resource definitions. Also, make sure to set the CUMULUS_MESSAGE_ADAPTER_DIR environment variable for the task to /opt for the CMA integration to work properly.

    In the future if you wish to update/change the CMA version you will need to update the deployed CMA, and update the layer configuration for the impacted Lambdas as needed.

    Please Note: Updating/removing a layer does not change a deployed Lambda, so to update the CMA you should deploy a new version of the CMA layer, update the associated Lambda configuration to reference the new CMA version, and re-deploy your Lambdas.

    Manual Addition

    You can include the CMA package in the Lambda code, in a cumulus-message-adapter sub-directory of your Lambda .zip, for any Lambda runtime that includes a Python runtime. Python 2 is included in Lambda runtimes that use Amazon Linux; however, Amazon Linux 2 will not support this directly.

    Please note: It is expected that upcoming Cumulus releases will update the CMA layer to include a Python runtime.

    If you are manually adding the message adapter to your source and utilizing the CMA, you should set the Lambda's CUMULUS_MESSAGE_ADAPTER_DIR environment variable to target the installation path for the CMA.

    CMA Input/Output

    Input to the task application code is a JSON object with the following keys:

    • input: By default, the incoming payload is the payload output from the previous task, or it can be a portion of the payload as configured for the task in the corresponding .tf workflow definition file.
    • config: Task-specific configuration object with URL templates resolved.

    Output from the task application code is placed in the payload key by default, but the task's cumulus_message output configuration can also be used to return just a portion of the task output.
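    Putting those two keys together, the object handed to the task's business logic looks roughly like the following (the provider value is illustrative; a fuller example appears in the URL template section below):

    {
      "input": {
        "anykey": "anyvalue"
      },
      "config": {
        "provider": {
          "id": "FOO_DAAC"
        }
      }
    }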

    CMA configuration

    As of Cumulus > 1.15 and CMA > v1.1.1, configuration of the CMA is expected to be driven by AWS Step Function Parameters.

    Using the CMA package with the Lambda by any of the above-mentioned methods (Lambda Layers, manual) requires configuration of its various features via a specific Step Function Parameters format (see the sample workflows in the example cumulus-tf source for more):

    {
      "cma": {
        "event.$": "$",
        "ReplaceConfig": "{some config}",
        "task_config": "{some config}"
      }
    }

    The "event.$": "$" parameter is required as it passes the entire incoming message to the CMA client library for parsing, and the CMA itself to convert the incoming message into a Cumulus message for use in the function.

    The following are the CMA's current configuration settings:

    ReplaceConfig (Cumulus Remote Message)

    Because of the potential size of a Cumulus message, mainly the payload field, a task can be configured to store a portion of its output on S3, leaving behind a remote message key that defines how to retrieve it and an empty JSON object {} in its place. If the portion of the message targeted exceeds the configured MaxSize (which defaults to 0 bytes), it will be written to S3.

    The CMA remote message functionality can be configured using parameters in several ways:

    Partial Message

    Setting the Path/TargetPath in the ReplaceConfig parameter (and optionally a non-default MaxSize)

    {
      "DiscoverGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "MaxSize": 1,
              "Path": "$.payload",
              "TargetPath": "$.payload"
            }
          }
        }
      }
    }

    will result in any payload output larger than the MaxSize (in bytes) being written to S3. The CMA will then mark that the key has been replaced via a replace key on the event. When the CMA picks up the replace key in future steps, it will attempt to retrieve the output from S3 and write it back to payload.

    Note that you can optionally use a different TargetPath than Path, however as the target is a JSON path there must be a key to target for replacement in the output of that step. Also note that the JSON path specified must target one node, otherwise the CMA will error, as it does not support multiple replacement targets.

    If TargetPath is omitted, it will default to the value for Path.
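    With the DiscoverGranules configuration above, a step whose payload exceeds MaxSize would emit a message shaped roughly like the following: the payload is replaced with an empty object and a replace key is added, while the rest of the message (illustrated here with cumulus_meta and meta values borrowed from earlier examples on this page) is left in place. The bucket and key are generated by the CMA; the values shown are illustrative.

    {
      "cumulus_meta": {
        "execution_name": "MyExecution__id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {},
      "replace": {
        "TargetPath": "$.payload",
        "Bucket": "some-internal-bucket",
        "Key": "events/some-event-id"
      }
    }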

    Full Message

    Setting the following parameters for a lambda:

    DiscoverGranules:
      Parameters:
        cma:
          event.$: '$'
          ReplaceConfig:
            FullMessage: true

    will result in the CMA assuming the entire inbound message should be stored to S3 if it exceeds the default max size.

    This is effectively the same as doing:

    {
      "DiscoverGranules": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "ReplaceConfig": {
              "MaxSize": 0,
              "Path": "$",
              "TargetPath": "$"
            }
          }
        }
      }
    }

    Cumulus Message example

    {
      "task_config": {
        "inlinestr": "prefix{meta.foo}suffix",
        "array": "{[$.meta.foo]}",
        "object": "{$.meta}"
      },
      "cumulus_meta": {
        "message_source": "sfn",
        "state_machine": "arn:aws:states:us-east-1:1234:stateMachine:MySfn",
        "execution_name": "MyExecution__id-1234",
        "id": "id-1234"
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {
        "anykey": "anyvalue"
      }
    }

    Cumulus Remote Message example

    The message may contain a reference to an S3 Bucket, Key and TargetPath as follows:

    {
      "replace": {
        "Bucket": "cumulus-bucket",
        "Key": "my-large-event.json",
        "TargetPath": "$"
      },
      "cumulus_meta": {}
    }

    task_config

    This configuration key contains the input/output configuration values for the definition of inputs/outputs via URL paths. Important: These values are all relative to the JSON object configured for event.$.

    This configuration's behavior is outlined in the CMA step description below.

    The configuration should follow the format:

    {
      "FunctionName": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "other_cma_configuration": "<config object>",
            "task_config": "<task config>"
          }
        }
      }
    }

    Example:

    {
      "StepFunction": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "task_config": {
              "sfnEnd": true,
              "stack": "{$.meta.stack}",
              "bucket": "{$.meta.buckets.internal.name}",
              "stateMachine": "{$.cumulus_meta.state_machine}",
              "executionName": "{$.cumulus_meta.execution_name}",
              "cumulus_message": {
                "input": "{$}"
              }
            }
          }
        }
      }
    }

    Cumulus Message Adapter Steps

    1. Reformat AWS Step Function message into Cumulus Message

    Due to the way AWS handles Parameterized messages, when Parameters are used the CMA takes an inbound message:

    {
      "resource": "arn:aws:lambda:us-east-1:<lambda arn values>",
      "input": {
        "Other Parameter": {},
        "cma": {
          "ConfigKey": {
            "config values": "some config values"
          },
          "event": {
            "cumulus_meta": {},
            "payload": {},
            "meta": {},
            "exception": {}
          }
        }
      }
    }

    and takes the following actions:

    • Takes the object at input.cma.event and makes it the full input
    • Merges all of the keys except event under input.cma into the parent input object

    This results in the incoming message (presumably a Cumulus message), with any cma configuration parameters merged in, being passed to the CMA. All other parameterized values defined outside of the cma key are ignored.
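    Applied to the inbound message above, those two actions would leave the CMA working with roughly the following event:

    {
      "ConfigKey": {
        "config values": "some config values"
      },
      "cumulus_meta": {},
      "payload": {},
      "meta": {},
      "exception": {}
    }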

    2. Resolve Remote Messages

    If the incoming Cumulus message has a replace key value, the CMA will attempt to pull the payload from S3.

    For example, if the incoming message contains the following:

      "meta": {
    "foo": {}
    },
    "replace": {
    "TargetPath": "$.meta.foo",
    "Bucket": "some_bucket",
    "Key": "events/some-event-id"
    }

    The CMA will attempt to pull the file stored at Bucket/Key and replace the value at TargetPath, then remove the replace object entirely and continue.
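    If the object stored at events/some-event-id in some_bucket were, say, {"bar": "baz"} (a made-up value for illustration), the resolved message fragment would become:

    "meta": {
      "foo": {
        "bar": "baz"
      }
    }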

    3. Resolve URL templates in the task configuration

    In the workflow configuration (defined under the task_config key), each task has its own configuration, and it can use URL templates as values, either for simplicity or for values only available at execution time. The Cumulus Message Adapter resolves the URL templates (relative to the event configuration key) and then passes the message to the next task. For example, given a task with the following configuration:

    {
      "Parameters": {
        "cma": {
          "event.$": "$",
          "task_config": {
            "provider": "{$.meta.provider}",
            "inlinestr": "prefix{meta.foo}suffix",
            "array": "{[$.meta.foo]}",
            "object": "{$.meta}"
          }
        }
      }
    }

    and an incoming message that contains:

    {
      "meta": {
        "foo": "bar",
        "provider": {
          "id": "FOO_DAAC",
          "anykey": "anyvalue"
        }
      }
    }

    The corresponding Cumulus Message would contain:

    "meta": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    "task_config": {
    "provider": "{$.meta.provider}",
    "inlinestr": "prefix{meta.foo}suffix",
    "array": "{[$.meta.foo]}",
    "object": "{$.meta}"
    }

    The message sent to the task would be:

    "config" : {
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    },
    "inlinestr": "prefixbarsuffix",
    "array": ["bar"],
    "object": {
    "foo": "bar",
    "provider": {
    "id": "FOO_DAAC",
    "anykey": "anyvalue"
    }
    },
    },
    "input": "{...}"

    URL template variables replace dotted paths inside curly brackets with their corresponding value. If the Cumulus Message Adapter cannot resolve a value, it will ignore the template, leaving it verbatim in the string. While seemingly complex, this allows significant decoupling of Tasks from one another and the data that drives them. Tasks are able to easily receive runtime configuration produced by previously run tasks and domain data.

    4. Resolve task input

    By default, the incoming payload is the payload from the previous task. The task can also be configured to use a portion of the payload as its input message. For example, given a task that specifies cma.task_config.cumulus_message.input:

    ExampleTask:
      Parameters:
        cma:
          event.$: '$'
          task_config:
            cumulus_message:
              input: '{$.payload.foo}'

    The task configuration in the message would be:

    {
      "task_config": {
        "cumulus_message": {
          "input": "{$.payload.foo}"
        }
      },
      "payload": {
        "foo": {
          "anykey": "anyvalue"
        }
      }
    }

    The Cumulus Message Adapter will resolve the task input; instead of sending the whole payload as task input, the task input would be:

    {
      "input": {
        "anykey": "anyvalue"
      },
      "config": {...}
    }

    5. Resolve task output

    By default, the task's return value is the next payload. However, the workflow task configuration can specify a portion of the return value as the next payload, and can also augment values in other fields. Based on the task configuration under cma.task_config.cumulus_message.outputs, the Message Adapter uses a task's return value to output a message as configured by the task-specific config defined under cma.task_config. The Message Adapter dispatches a "source" to a "destination" as defined by URL templates stored in the task-specific cumulus_message.outputs. The value of the task's return value at the "source" URL is used to create or replace the value of the task's return value at the "destination" URL. For example, given a task that specifies cumulus_message.outputs in its workflow configuration as follows:

    {
      "ExampleTask": {
        "Parameters": {
          "cma": {
            "event.$": "$",
            "task_config": {
              "cumulus_message": {
                "outputs": [
                  {
                    "source": "{$}",
                    "destination": "{$.payload}"
                  },
                  {
                    "source": "{$.output.anykey}",
                    "destination": "{$.meta.baz}"
                  }
                ]
              }
            }
          }
        }
      }
    }

    The corresponding Cumulus Message would be:

    {
      "task_config": {
        "cumulus_message": {
          "outputs": [
            {
              "source": "{$}",
              "destination": "{$.payload}"
            },
            {
              "source": "{$.output.anykey}",
              "destination": "{$.meta.baz}"
            }
          ]
        }
      },
      "meta": {
        "foo": "bar"
      },
      "payload": {
        "anykey": "anyvalue"
      }
    }

    Given the response from the task is:

    {
      "output": {
        "anykey": "boo"
      }
    }

    The Cumulus Message Adapter would output the following Cumulus Message:

    {
      "task_config": {
        "cumulus_message": {
          "outputs": [
            {
              "source": "{$}",
              "destination": "{$.payload}"
            },
            {
              "source": "{$.output.anykey}",
              "destination": "{$.meta.baz}"
            }
          ]
        }
      },
      "meta": {
        "foo": "bar",
        "baz": "boo"
      },
      "payload": {
        "output": {
          "anykey": "boo"
        }
      }
    }

    6. Apply Remote Message Configuration

    If the ReplaceConfig configuration parameter is defined, the CMA will evaluate the configuration options provided, and if required write a portion of the Cumulus Message to S3, and add a replace key to the message for future steps to utilize.

    Please Note: the non-user-modifiable field cumulus_meta will always be retained, regardless of the configuration.

    For example, if the output message (after the output configuration has been applied) from a Cumulus task looks like:

    {
      "cumulus_meta": {
        "some_key": "some_value"
      },
      "ReplaceConfig": {
        "FullMessage": true
      },
      "task_config": {
        "cumulus_message": {
          "outputs": [
            {
              "source": "{$}",
              "destination": "{$.payload}"
            },
            {
              "source": "{$.output.anykey}",
              "destination": "{$.meta.baz}"
            }
          ]
        }
      },
      "meta": {
        "foo": "bar",
        "baz": "boo"
      },
      "payload": {
        "output": {
          "anykey": "boo"
        }
      }
    }

    the resultant output would look like:

    {
      "cumulus_meta": {
        "some_key": "some_value"
      },
      "replace": {
        "TargetPath": "$",
        "Bucket": "some-internal-bucket",
        "Key": "events/some-event-id"
      }
    }

    Additional features

    Validate task input, output and configuration messages against the schemas provided

    The Cumulus Message Adapter has the capability to validate task input, output and configuration messages against their schemas. The default location of the schemas is the schemas folder in the top level of the task and the default filenames are input.json, output.json, and config.json. The task can also configure a different schema location. If no schema can be found, the Cumulus Message Adapter will not validate the messages.
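    As an illustration only (each Cumulus task defines its own schemas; the one below is hypothetical), a task whose input is expected to be an object with a granules array might ship a schemas/input.json like:

    {
      "title": "ExampleTaskInput",
      "type": "object",
      "required": ["granules"],
      "properties": {
        "granules": {
          "type": "array",
          "items": {
            "type": "object"
          }
        }
      }
    }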

    Version: v16.0.0

    Develop Lambda Functions

    Develop a new Cumulus Lambda

    AWS provides a great getting started guide for building Lambdas in its developer guide.

    Cumulus currently supports the following environments for Cumulus Message Adapter enabled functions: Node.js, Java, and Python (see the deployment examples below).

    Additionally, you may choose to include any of the other languages AWS supports as a resource, with reduced feature support.

    Deploy a Lambda

    Node.js Lambda

    For a new Node.js Lambda, create a new function and add an aws_lambda_function resource to your Cumulus deployment (for examples, see example/lambdas.tf and ingest/lambda-functions.tf in the Cumulus source) as either a new .tf file or an addition to an existing .tf file:

    resource "aws_lambda_function" "myfunction" {
    function_name = "${var.prefix}-function"
    filename = "/path/to/zip/lambda.zip"
    source_code_hash = filebase64sha256("/path/to/zip/lambda.zip")
    handler = "index.handler"
    role = module.cumulus.lambda_processing_role_arn
    runtime = "nodejs10.x"

    vpc_config {
    subnet_ids = var.subnet_ids
    security_group_ids = var.security_group_ids
    }
    }

    Please note: This example contains the minimum set of required configuration.

    Make sure to include a vpc_config that matches the information you've provided to the cumulus module if you intend to integrate the Lambda with a Cumulus deployment.

    Java Lambda

    Java Lambdas are created in much the same way as the Node.js example above.

    The source points to a folder with the compiled .class files and dependency libraries in the Lambda Java zip folder structure (details here), not an uber-jar.

    The deploy folder referenced here would contain a folder 'test_task/task/' which contains Task.class and TaskLogic.class as well as a lib folder containing dependency jars.

    Python Lambda

    Python Lambdas are created the same way as the Node.js example above.

    Cumulus Message Adapter

    For Lambdas wishing to utilize the Cumulus Message Adapter (CMA), you should define a layers key on your Lambda resource with the CMA you wish to include. See the input_output docs for more on how to create/use the CMA.

    Other Lambda Options

    Cumulus supports all of the options available to you via the aws_lambda_function Terraform resource. For more information on what's available, check out the Terraform resource docs.

    CloudWatch log groups

    If you want to enable CloudWatch logging for your Lambda resource, you'll need to add an aws_cloudwatch_log_group resource to your Lambda definition:

    resource "aws_cloudwatch_log_group" "myfunction_log_group" {
    name = "/aws/lambda/${aws_lambda_function.myfunction.function_name}"
    retention_in_days = 30
    tags = { Deployment = var.prefix }
    }
    Version: v16.0.0

    Workflow Protocol

    Configuration and Message Use Diagram

    A diagram showing at which point in a workflow the Cumulus message is checked for conformity with the message schema and where the configuration is checked for conformity with the configuration schema

    • Configuration - The Cumulus workflow configuration defines everything needed to describe an instance of Cumulus.
    • Scheduler - This starts ingest of a collection on configured intervals.
    • Input to Step Functions - The Scheduler uses the Configuration as source data to construct the input to the Workflow.
    • AWS Step Functions - Run the workflows as kicked off by the scheduler or other processes.
    • Input to Task - The input for each task is a JSON document that conforms to the message schema.
    • Output from Task - The output of each task must conform to the message schemas as well and is used as the input for the subsequent task.
    Workflow Configuration How To's

    To take a subset of any given metadata, use the option substring.

    "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"

    This example would populate url_path to "MOD09GQ/MOD".
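    In a collection's file configuration, that template sits alongside the bucket and file name, for example (values reused from the example further down this page):

    {
      "bucket": "sample-protected-bucket",
      "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{substring(file.fileName, 0, 3)}"
    }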

    In addition to substring, several datetime-specific functions are available, which can parse a datetime string in the metadata and extract a certain part of it:

    "url_path": "{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"

    or

     "url_path": "{dateFormat(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime, YYYY-MM-DD[T]HH[:]mm[:]ss)}"

    The following functions are implemented:

    • extractYear - returns the year, formatted as YYYY
    • extractMonth - returns the month, formatted as MM
    • extractDate - returns the day of the month, formatted as DD
    • extractHour - returns the hour in 24-hour format, with no leading zero
    • dateFormat - takes a second argument describing how to format the date, and passes the metadata date string and the format argument to moment().format()

    Note: the move-granules step needs to be in the workflow for this template to be populated and the file moved. The cmrMetadata (CMR granule XML) needs to have been generated and stored on S3 first. From there, any field can be retrieved and used in a url_path.

    Adding Metadata dates and times to the URL Path

    There are a number of options to pull dates from the CMR file metadata. With this metadata:

    <Granule>
      <Temporal>
        <RangeDateTime>
          <BeginningDateTime>2003-02-19T00:00:00Z</BeginningDateTime>
          <EndingDateTime>2003-02-19T23:59:59Z</EndingDateTime>
        </RangeDateTime>
      </Temporal>
    </Granule>

    The following examples of url_path could be used.

    {extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the year from the full date: 2003.

    {extractMonth(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the month: 2.

    {extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the day: 19.

    {extractHour(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)} will pull the hour: 0.

    Different values can be combined to create the url_path. For example:

    {
      "bucket": "sample-protected-bucket",
      "name": "MOD09GQ.A2017025.h21v00.006.2017034065104.hdf",
      "url_path": "{cmrMetadata.Granule.Collection.ShortName}/{extractYear(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}/{extractDate(cmrMetadata.Granule.Temporal.RangeDateTime.BeginningDateTime)}"
    }

    The final file location for the above would be s3://sample-protected-bucket/MOD09GQ/2003/19/MOD09GQ.A2017025.h21v00.006.2017034065104.hdf.

    Version: v16.0.0

    Workflow Triggers

    For a workflow to run, it needs to be associated with a rule (see rule configuration). The rule configuration determines how and when a workflow execution is triggered. Rules can be triggered one time, on a schedule, or in response to new data arriving on a Kinesis stream or SNS topic.
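    As a rough, hedged sketch (the field values below are hypothetical; see the rule configuration documentation for the authoritative schema), a scheduled rule that kicks off a workflow for a collection looks something like:

    {
      "name": "nightly_discovery",
      "workflow": "DiscoverGranules",
      "provider": "FOO_DAAC",
      "collection": {
        "name": "MOD09GQ",
        "version": "006"
      },
      "rule": {
        "type": "scheduled",
        "value": "cron(0 6 * * ? *)"
      },
      "state": "ENABLED"
    }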

    There are three lambda functions in the API package responsible for scheduling and starting workflows: SF scheduler, message consumer, and SF starter. Each Cumulus instance comes with a Start SF SQS queue.

    The SF scheduler lambda puts a message onto the Start SF queue. This message is picked up by the Start SF lambda, and an execution is started with the body of the message as the input.

    When a one time rule is created, the schedule SF lambda is triggered. Rules that are not one time are associated with a CloudWatch event, which manages triggering the lambdas that start the workflows.

    For a scheduled rule, the CloudWatch event fires on the given schedule and calls directly to the schedule SF lambda.

    For a kinesis rule, when data is added to the Kinesis stream, the CloudWatch event is triggered, which calls the message consumer lambda. The message consumer lambda parses the Kinesis message and finds all of the rules associated with that message. For each rule (which corresponds to one workflow), the schedule SF lambda is triggered to queue a message to start the workflow.

    For an SNS rule, when a message is published to the SNS topic, the message consumer receives the SNS message (JSON expected), parses it into an object, starts a new execution of the workflow associated with the rule, and passes the object in the payload field of the Cumulus message.

    Diagram showing how workflows are scheduled via rules
