Application-independent way to backup/restore data services to/from S3 bucket #2768

Open
1 task
adborden opened this issue Feb 8, 2021 · 1 comment

adborden commented Feb 8, 2021

User Story

In order to enable recovery from major outages, as well as snapshotting from production to other environments, the data.gov team wants provisioned data-services to be dumped to S3 storage regularly, with a documented and tested path for restoration.

Acceptance Criteria

  • GIVEN I am viewing the FY21 ISCP document
    WHEN I follow the directions in Appendix C
    THEN I see a replication of a recent backup of production data services in the target space
    AND I see the applications are functional in the target space.

Background

  • Production databases, buckets, and other non-ephemeral data services should have a CI-driven process for making backups into an S3 bucket.
  • We need to have at least a manual set of steps for restoration.
  • Ideally we would also be restoring via CI, but that's not necessary for meeting the ACs.

Security Considerations (required)

Backup and retention policy is documented in the SSP. The implementation should be consistent with what is documented.

Sketch

  • Create a private S3 bucket called service-dumps in the management space.
  • Share the S3 bucket to the other spaces and deploy a backup-manager application in each space.
    • Alternatively, services to be backed up should be shared with the management space, and there's just a single backup-manager application running there. (Although this is more desirable/centralized, this will result in service name collisions. We would have to include the space name in the service name to avoid that, which complicates everything by making app code need to be space-aware when it shouldn't need to be.)
  • The backup-manager application is triggered via cf run-task.
    • It can also be triggered cron-style (via GitHub Action) to make scheduled backups above and beyond what cloud.gov already provides, if needed.
  • There's a "restore" task that can be triggered, parameterized with the space/environment to be restored from and the name of the backup (corresponding to the names in Table 9-2 of the SSP).
  • Document the process for restoring from backups in Appendix C of the CP document

In more detail

Storage

Create a private S3 bucket in the gsa-datagov/management space, and call the instance service-dumps.

cf t -s management
cf create-service s3 basic service-dumps

Make the service accessible from the two environments (though it still "lives" in the management space):

cf share-service service-dumps -s staging
cf share-service service-dumps -s production

The backup-manager app

Make an app that will act as a utility for making and restoring backups across environments. The app should include:

  • the AWS CLI
  • the MySQL CLI client
  • the PostgreSQL CLI client
  • the Redis CLI client
  • the Elasticsearch CLI client

The app should use the apt-buildpack to get those installed. (If the AWS CLI can't be installed using apt, then just curl it and unzip it in the app .profile.) Use binary-buildpack for the final buildpack.

The .profile should parse out creds for the service-dumps bucket and set the environment variables properly so that the aws CLI will be able to aws s3 cp to and from the bucket.
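
The credential parsing described above could look roughly like the following .profile fragment. This is a minimal sketch, not the actual implementation: it assumes jq is available in the app (for example via the apt-buildpack), that the bound instance is named service-dumps, and that its credentials expose access_key_id, secret_access_key, region, and bucket. The function names and the BACKUP_BUCKET variable are hypothetical.

```shell
#!/bin/bash
# Sketch of a .profile fragment: export AWS CLI settings from the bound S3 service.
# Assumptions: jq is installed; the service instance is named "service-dumps";
# its credentials contain access_key_id, secret_access_key, region, and bucket.

# Print one credential field of the service-dumps instance from VCAP_SERVICES.
# Prints nothing if the instance or the field is missing.
dumps_cred() {
  echo "$VCAP_SERVICES" | jq -r --arg field "$1" \
    '[.[][] | select(.name == "service-dumps")][0].credentials[$field] // empty'
}

# Export the variables the aws CLI reads, plus the bucket name for later use.
export_dumps_env() {
  export AWS_ACCESS_KEY_ID="$(dumps_cred access_key_id)"
  export AWS_SECRET_ACCESS_KEY="$(dumps_cred secret_access_key)"
  export AWS_DEFAULT_REGION="$(dumps_cred region)"
  # BACKUP_BUCKET is a hypothetical name used only in this sketch.
  export BACKUP_BUCKET="$(dumps_cred bucket)"
}

export_dumps_env
```

With these set, a command such as aws s3 cp FILE "s3://$BACKUP_BUCKET/production/..." should pick up the credentials from the environment without further configuration.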

The app manifest should include a default start-command which summarizes other commands available:

  • backup INSTANCENAME [BACKUPID]
    • Create a backup for INSTANCENAME in /SPACENAME/INSTANCENAME.BACKUPID.SERVICETYPE.gz. SPACENAME is the current application space name. BACKUPID defaults to a date formatted ccyymmdd-HHMM.
    • For example: /production/catalog-db.20211122-2248.psql.gz
  • list [INSTANCENAME]
    • List available BACKUPIDs for services. If INSTANCENAME is provided, limit the list to just the backups for INSTANCENAME.
  • restore INSTANCENAME [BACKUPID] [SPACENAME]
    • Restore the specified BACKUPID into the instance. If the BACKUPID was not specified, default to the most recently-created backup. SPACENAME defaults to the application space name.
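
The naming scheme and the two defaults above (BACKUPID defaulting to the current time, restore defaulting to the newest backup) can be sketched with a few shell helpers. The function names are hypothetical and only illustrate the scheme. The "newest" lookup relies on ccyymmdd-HHMM sorting chronologically as plain text.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the backup naming scheme.

# Default BACKUPID: the current time formatted ccyymmdd-HHMM.
default_backup_id() {
  date +%Y%m%d-%H%M
}

# Build the object key /SPACENAME/INSTANCENAME.BACKUPID.SERVICETYPE.gz.
# Usage: backup_key SPACENAME INSTANCENAME SERVICETYPE [BACKUPID]
backup_key() {
  local space="$1" instance="$2" type="$3"
  local id="${4:-$(default_backup_id)}"
  printf '/%s/%s.%s.%s.gz\n' "$space" "$instance" "$id" "$type"
}

# Read object keys on stdin (one per line) and print the newest BACKUPID
# for INSTANCENAME. Prints nothing if there is no matching backup.
# Usage: ... | latest_backup_id INSTANCENAME
latest_backup_id() {
  local instance="$1"
  awk -F/ '{ print $NF }' \
    | awk -v prefix="$instance." \
        'index($0, prefix) == 1 { print substr($0, length(prefix) + 1) }' \
    | cut -d. -f1 | sort | tail -n 1
}
```

For example, backup_key production catalog-db psql 20211122-2248 prints /production/catalog-db.20211122-2248.psql.gz, matching the example above.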

Deploy the app in each space, but don't start it or give it a route:

cf target -s staging
cf push backup-manager --task
cf target -s production
cf push backup-manager --task

Usage

Making backups

cf bind-service backup-manager my-service
cf run-task backup-manager --command "backup my-service"
cf unbind-service backup-manager my-service

Restoring backups

Restore from the most recent backup in this space:

cf bind-service backup-manager my-service
cf run-task backup-manager --command "restore my-service"
cf unbind-service backup-manager my-service

Restore a particular backup from the production space:

cf bind-service backup-manager my-service
cf run-task backup-manager --command "restore my-service 20211122-2248 production"
cf unbind-service backup-manager my-service
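
The bind / run-task / unbind sequence used above could be wrapped in a small script. One caveat: cf run-task returns as soon as the task is scheduled, so a wrapper may want to wait for the task to finish before unbinding. The sketch below does that by polling cf tasks. It assumes the table printed by cf tasks has the columns id, name, state, and so on, which is fragile to parse. The function names, the task naming, and the APP, POLL_SECONDS, and MAX_POLLS variables are all hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: bind, run a backup-manager task, wait for it, unbind.
# Assumes `cf tasks` prints rows of: id, name, state, start time, command.

APP="${APP:-backup-manager}"
POLL_SECONDS="${POLL_SECONDS:-10}"
MAX_POLLS="${MAX_POLLS:-360}"

# Print the state of the named task, or nothing if it is not listed.
task_state() {
  cf tasks "$APP" | awk -v name="$1" '$2 == name { print $3; exit }'
}

# Usage: run_with_binding SERVICE COMMAND...
# Returns 0 only if the task reached the SUCCEEDED state.
run_with_binding() {
  local service="$1"; shift
  local name="bm-$(date +%s)" state="" tries=0
  cf bind-service "$APP" "$service" || return 1
  if cf run-task "$APP" --command "$*" --name "$name"; then
    while [ "$tries" -lt "$MAX_POLLS" ]; do
      state="$(task_state "$name")"
      case "$state" in
        SUCCEEDED|FAILED) break ;;
      esac
      tries=$((tries + 1))
      sleep "$POLL_SECONDS"
    done
  fi
  # Always unbind, even if the task failed or could not be started.
  cf unbind-service "$APP" "$service"
  [ "$state" = "SUCCEEDED" ]
}
```

With this in place, run_with_binding my-service backup my-service would stand in for the three backup commands, and run_with_binding my-service restore my-service 20211122-2248 production for the cross-space restore.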

mogul changed the title from "Application-independent way to backup databases to S3 buckets" to "Application-independent way to backup data services to S3 buckets" on Mar 1, 2021
mogul changed the title from "Application-independent way to backup data services to S3 buckets" to "Application-independent way to backup/restore data services to/from S3 bucket" on Mar 1, 2021
adborden self-assigned this on Mar 18, 2021
mogul mentioned this issue on Nov 22, 2021

adborden commented Dec 8, 2021

https://github.com/gsa/cf-backup-manager now exists and can be expanded to meet this story. We've currently implemented enough to support backup/restore of mysql and postgres services.

adborden removed their assignment on Dec 8, 2021
hkdctol moved this to Icebox in the data.gov team board on Apr 14, 2022