Migrate catalog to cloud.gov #2788

adborden · 2021-02-11T17:18:18Z

User Story

In order to stop maintaining the FCS deployment, the data.gov team wants production service to be directed to our deployment on cloud.gov.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

WHEN we visit catalog.data.gov in our browser
THEN we see the expected catalog output
AND we see requests in the catalog app's logs on cloud.gov

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
This change will migrate us away from our old environment, which is harder to maintain and for which there are more things that we have to look after. The new environment has already been pen-tested and ATOd so we think it's going to be net win on attack surface overall.

Launch plan

Pre-launch

In the days leading up to the launch, these tasks should be completed:

Launch

Tasks to be completed at the time of launch.

In the event a rollback is necessary, apply these tasks.

Revert DNS changes, should point to original values in FCS
Ensure application services are started and running in FCS

The text was updated successfully, but these errors were encountered:

mogul · 2021-11-08T21:57:52Z

People had expressed concerns about migrating large data dumps from FCS to cloud.gov. The simplest way to sidestep that would be to pipe directly from mysqldump -> gzip -> sws s3 cp, which people do all the time. The S3 credentials would be for a bucket that we provision in our management space; see the instructions for getting credentials for use outside of cloud.gov. Restore would work the same way from an application instance... Bind the S3 bucket, then run aws s3 cp -> gzip -dc -> mysql.

jbrown-xentity · 2021-11-18T21:34:56Z

Plan to do dashboard, then inventory, then catalog, with static site being "whenever ready". Subject to change.

mogul · 2021-11-22T22:14:58Z

For reference, here's how the more general backup strategy will work.

adborden · 2021-11-23T21:55:12Z

Confirmed that we have CAA for letsencrypt.org at data.gov, which will be inherited for all subdomains (unless overridden).

nickumia-reisys · 2021-12-01T16:21:36Z

Database Migration Commands:

# Create Temporary S3 Storage
S3_NAME=migrate-data
S3_KEY=md-key
cf create-service s3 basic $S3_NAME

S3_CREDENTIALS=`cf service-key "${S3_NAME}" "${S3_KEY}" | tail -n +2`
export AWS_ACCESS_KEY_ID=`echo "${S3_CREDENTIALS}" | jq -r .credentials.access_key_id`
export AWS_SECRET_ACCESS_KEY=`echo "${S3_CREDENTIALS}" | jq -r .credentials.secret_access_key`
export BUCKET_NAME=`echo "${S3_CREDENTIALS}" | jq -r .credentials.bucket`
export AWS_DEFAULT_REGION=`echo "${S3_CREDENTIALS}" | jq -r '.credentials.region'`


# Non-binary PSQL Dump
pg_dump --no-acl --no-owner --clean -T spatial_ref_sys -T layer -T topology ckan > ckan.dump

# Binary PSQL Dump
pg_dump --format=custom --no-acl --no-owner --clean -T spatial_ref_sys -T layer -T topology ckan > ckan.dump


# Pipe into S3
<pg_dump> | gzip | aws s3 cp - s3://${BUCKET_NAME}/<backup_name.sql.gz>

# Pipe out of S3
aws s3 cp s3://${BUCKET_NAME}/<backup_name.sql.gz> - | gzip -dc | <psql/pg_restore>


# Non-binary restore
PGPASSWORD=$DB_PASS psql -h $DB_HOST -U $DB_USER -p $DB_PORT $DB_NAME < <backup>

# Binary restore
PGPASSWORD=$DB_PASS pg_restore -h $DB_HOST -p $DB_PORT -U $DB_USER --no-owner --clean -d $DB_NAME < <backup>

# Local/Cloud.gov Restore
DB_USER=ckan
DB_PASS=ckan
DB_HOST=127.0.0.1
DB_PORT=5432
DB_NAME=ckan

# If no key exists,
# cf create-service-key <db_name> <db_key>
DB_CREDENTIALS=`cf service-key <db_name> <db_key> | tail -n +2`
export DB_NAME=`echo "${DB_CREDENTIALS}" | jq -r .credentials.db_name`
export DB_HOST=`echo "${DB_CREDENTIALS}" | jq -r .credentials.host`
export DB_USER=`echo "${DB_CREDENTIALS}" | jq -r .credentials.username`
export DB_PASS=`echo "${DB_CREDENTIALS}" | jq -r .credentials.password`

PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -c "create database ckan_temp;"
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d ckan_temp -c "drop extension IF EXISTS postgis cascade;"
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d ckan_temp -c "select pg_terminate_backend(pid) from pg_stat_activity where datname='ckan';"
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d ckan_temp -c "drop database $DB_NAME;"
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d ckan_temp -c "create database $DB_NAME;"
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -c "create extension postgis;"
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -c "drop database ckan_temp;"

# Binary or Non-binary restore from above,
PGPASSWORD=$DB_PASS pg_restore -h $DB_HOST -p $DB_PORT -U $DB_USER --no-owner --clean -d $DB_NAME < <binary_dump>
PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_NAME < <non_binary_dump>

docker-compose exec ckan /bin/bash -c "ckan db upgrade"
docker-compose exec ckan /bin/bash -c "ckan search-index rebuild"
cf run-task catalog -c "ckan db upgrade"
cf run-task catalog -c "ckan db search-index rebuild"

nickumia-reisys · 2021-12-07T15:08:49Z

Final DB Migration Script: https://gist.github.com/nickumia-reisys/8a5da2c3e33b9b7fb2ada263b9f9c52e

Steps to replicate:

Run 01-fcs-catalog-db-migration.sh script to create backup from FCS (needs to run on FCS production server).
Run 02-fcs-catalog-db-migration.sh script locally to start restore process on cloud.gov.

jbrown-xentity · 2021-12-12T14:14:49Z

@nickumia-reisys since we needed some collaboration, I moved the scripts as "docs" or usage scripts for cf-backup-manager: GSA/cf-backup-manager#18 (I also made some changes).
We finally got the ckan db upgrade command to work, it took 7.25 hours to complete. See catalog logs on 10/12 to confirm.
I kicked off the ckan search-index rebuild command just now, to see if it crashes and/or to get an estimate on how long it will take on the full DB (current best estimate is 5 days).

jbrown-xentity · 2021-12-13T15:33:22Z

ckan search-index rebuild crashed in 9 minutes with error code 137 (out of memory). Next steps for this ticket:

Verify dataset integrity by using ckan api (/api/action/package_show?id=name)
Try bumping memory in run-task (-k default 1G, -m default 512) to avoid memory issues, track closely at the beginning
Consider using rebuild_fast (docs) to speed up process
Consider turning off harvesters on prod now, so that when we get the right configuration we can "go", and not repeat all the steps in waterfall
If bumping memory does not work, examine if we need more solr instances (should google to see if there is an estimate on solr instances needed per number of records)
We could try to get a database dump of the dataset names, and break them up into digestable files, and running a task to index each dataset individually and parallelize.

nickumia-reisys · 2021-12-13T16:19:01Z

Database does look like it is functional, Accessing package_show on staging shows data equivalent to catalog fcs prod UI,

Staging api route: https://catalog-stage-datagov.app.cloud.gov/api/action/package_show?id=megapixel-mercury-cadmium-telluride-focal-plane-arrays-for-infrared-imaging-out-to-12-micr
FCS Prod route: https://catalog.data.gov/dataset/megapixel-mercury-cadmium-telluride-focal-plane-arrays-for-infrared-imaging-out-to-12-micr

Staging api route: https://catalog-stage-datagov.app.cloud.gov/api/action/package_show?id=namma-lightning-zeus-data-v1
FCS Prod route: https://catalog.data.gov/dataset/namma-lightning-zeus-data-v1

nickumia-reisys · 2022-01-10T18:02:49Z

Courtesy of @jbrown-xentity,
To check how many collections have been indexed on catalog, go to https://catalog.data.gov/api/action/package_search?q=collection_metadata=true

nickumia-reisys · 2022-03-12T21:39:08Z

I'm proposing that we don't need to take a new database dump and just run all of the harvest jobs on catalog production since it has all of the data since December 2021.

FuhuXia · 2022-03-22T13:13:14Z

Harvesting activity is stopped. catalog.final.20220322.prod.gz was saved on S3.

nickumia-reisys · 2022-03-23T15:31:02Z

Database is restored and Solr registered 17k datasets to reindex. Solr reindex is currently running.

FuhuXia · 2022-04-18T13:57:28Z

After database restore, we need to run an ANALYZE command to collect new statistics.

nickumia-reisys · 2022-07-01T19:15:28Z

Catalog DB (prod) backup/restore times

FuhuXia · 2022-07-29T20:24:59Z

FuhuXia · 2022-08-03T15:34:58Z

Pointed current production CDN to catalog-web app on cloud.gov. Catalog.data.gov is officially migrated to cloud.gov, Things are looking fine, UI speed is good, catalog-web instances are stable, ECS Solr memory is normal. Will watch for the performance in the next a few days, and gradually take catalog.data.gov out of 'safe mode' and turn on harvesting.

FuhuXia · 2022-08-09T14:33:17Z

Initial harvesting has been running for 4 days. Dataset count increased 57k. This is abnormal. Investigating data.json source duplicating issue now.

FuhuXia · 2022-08-11T12:51:58Z

ckanext-datajson duplicate issue identified and fixed. Refreshing catalog with last FCS DB backup and reindexing Solr.

FuhuXia · 2022-08-16T14:37:10Z

Change requests for staging and production saml2 app were submitted to login.gov. Hopefully they can be deployed this Thursday, but it might take up to 2 weeks.
https://zendesk.login.gov/hc/en-us/requests/1073
https://zendesk.login.gov/hc/en-us/requests/1074

FuhuXia · 2022-08-19T13:50:36Z

The error we saw during pg_restore ERROR: schema "public" already exists is due to making dump using cli version 9 but restoring using cli version 12, as discussed in this stackexchange thread. Two ways we can try to resolve it:

use cli version 9 to restore, or
run pg_restore with option --schema=public -d target_db

FuhuXia · 2022-08-27T05:17:44Z

IdP promoted to login.gov production, Migration completed.

adborden added the component/inventory Inventory playbooks/roles label Feb 11, 2021

adborden mentioned this issue Feb 11, 2021

Deploy inventory application on cloud.gov #2603

Closed

38 tasks

mogul changed the title ~~Launch inventory.data.gov on cloud.gov~~ Launch [appname] on cloud.gov Feb 11, 2021

mogul mentioned this issue Nov 4, 2021

Rehearse cloud.gov migration for [appname] #3435

Closed

6 tasks

mogul changed the title ~~Launch [appname] on cloud.gov~~ Migrate service to cloud.gov Nov 4, 2021

mogul changed the title ~~Migrate service to cloud.gov~~ Migrate services to cloud.gov Nov 4, 2021

jbrown-xentity assigned adborden Nov 18, 2021

adborden changed the title ~~Migrate services to cloud.gov~~ Migrate catalog to cloud.gov Nov 23, 2021

adborden added component/catalog Related to catalog component playbooks/roles and removed component/inventory Inventory playbooks/roles labels Nov 23, 2021

adborden removed their assignment Nov 24, 2021

mogul assigned nickumia-reisys Nov 30, 2021

mogul added the ATO label Mar 23, 2022

FuhuXia self-assigned this Mar 25, 2022

FuhuXia mentioned this issue Mar 25, 2022

switch from identitysandbox.gov to login.gov GSA/catalog.data.gov#436

Merged

This was referenced Mar 28, 2022

Turn harvesters back on in prod space GSA/catalog.data.gov#438

Merged

switch to production URL, turn on restart GSA/catalog.data.gov#441

Merged

hkdctol moved this to Icebox in data.gov team board Apr 14, 2022

hkdctol moved this from Icebox to Sprint Backlog [7] in data.gov team board Apr 14, 2022

nickumia-reisys mentioned this issue Aug 2, 2022

Solr on ECS - fix order of start script GSA/catalog.data.gov#508

Merged

hkdctol moved this from Sprint Backlog [7] to In Progress [8] in data.gov team board Aug 2, 2022

jbrown-xentity mentioned this issue Aug 3, 2022

Block languages and verify blocked GSA/catalog.data.gov#510

Merged

FuhuXia mentioned this issue Aug 3, 2022

change prod CDN GSA/catalog.data.gov#509

Merged

jbrown-xentity mentioned this issue Aug 3, 2022

Check login cookie GSA/catalog.data.gov#511

Merged

FuhuXia moved this from In Progress [8] to Blocked in data.gov team board Aug 16, 2022

FuhuXia mentioned this issue Aug 18, 2022

Data harvested as private #3593

Closed

FuhuXia mentioned this issue Aug 23, 2022

change idp to login.gov prod GSA/catalog.data.gov#528

Merged

FuhuXia moved this from Blocked to Done in data.gov team board Aug 29, 2022

btylerburton closed this as completed Sep 1, 2022

nickumia-reisys mentioned this issue Aug 3, 2023

Forward User-Agent from AWS CloudFront to catalog app #4059

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate catalog to cloud.gov #2788

Migrate catalog to cloud.gov #2788

adborden commented Feb 11, 2021 •

edited by FuhuXia

Loading

mogul commented Nov 8, 2021 •

edited by adborden

Loading

jbrown-xentity commented Nov 18, 2021

mogul commented Nov 22, 2021

adborden commented Nov 23, 2021

nickumia-reisys commented Dec 1, 2021

nickumia-reisys commented Dec 7, 2021

jbrown-xentity commented Dec 12, 2021 •

edited

Loading

jbrown-xentity commented Dec 13, 2021 •

edited by nickumia-reisys

Loading

nickumia-reisys commented Dec 13, 2021

nickumia-reisys commented Jan 10, 2022

nickumia-reisys commented Mar 12, 2022

FuhuXia commented Mar 22, 2022

nickumia-reisys commented Mar 23, 2022

FuhuXia commented Apr 18, 2022

nickumia-reisys commented Jul 1, 2022

FuhuXia commented Jul 29, 2022 •

edited

Loading

FuhuXia commented Aug 3, 2022

FuhuXia commented Aug 9, 2022

FuhuXia commented Aug 11, 2022

FuhuXia commented Aug 16, 2022

FuhuXia commented Aug 19, 2022

FuhuXia commented Aug 27, 2022

Migrate catalog to cloud.gov #2788

Migrate catalog to cloud.gov #2788

Comments

adborden commented Feb 11, 2021 • edited by FuhuXia Loading

User Story

Acceptance Criteria

Background

Security Considerations (required)

Launch plan

Pre-launch

Launch

mogul commented Nov 8, 2021 • edited by adborden Loading

jbrown-xentity commented Nov 18, 2021

mogul commented Nov 22, 2021

adborden commented Nov 23, 2021

nickumia-reisys commented Dec 1, 2021

nickumia-reisys commented Dec 7, 2021

jbrown-xentity commented Dec 12, 2021 • edited Loading

jbrown-xentity commented Dec 13, 2021 • edited by nickumia-reisys Loading

nickumia-reisys commented Dec 13, 2021

nickumia-reisys commented Jan 10, 2022

nickumia-reisys commented Mar 12, 2022

FuhuXia commented Mar 22, 2022

nickumia-reisys commented Mar 23, 2022

FuhuXia commented Apr 18, 2022

nickumia-reisys commented Jul 1, 2022

FuhuXia commented Jul 29, 2022 • edited Loading

FuhuXia commented Aug 3, 2022

FuhuXia commented Aug 9, 2022

FuhuXia commented Aug 11, 2022

FuhuXia commented Aug 16, 2022

FuhuXia commented Aug 19, 2022

FuhuXia commented Aug 27, 2022

adborden commented Feb 11, 2021 •

edited by FuhuXia

Loading

mogul commented Nov 8, 2021 •

edited by adborden

Loading

jbrown-xentity commented Dec 12, 2021 •

edited

Loading

jbrown-xentity commented Dec 13, 2021 •

edited by nickumia-reisys

Loading

FuhuXia commented Jul 29, 2022 •

edited

Loading