In order to enable recovery from major outages, as well as snapshotting from production to other environments, the data.gov team wants provisioned data-services to be dumped to S3 storage regularly, with a documented and tested path for restoration.
Acceptance Criteria
GIVEN I am viewing the FY21 ISCP document
WHEN I follow the directions in Appendix C
THEN I see a replication of a recent backup of production data services in the target space
AND I see the applications are functional in the target space.
Background
Production databases, buckets, and other non-ephemeral data services should have a CI-driven process for making backups into an S3 bucket.
We need to have at least a manual set of steps for restoration.
Ideally we would also be restoring via CI, but that's not necessary for meeting the ACs.
Backup and retention policy is documented in the SSP. The implementation should be consistent with what is documented.
Sketch
Create a private S3 bucket called service-dumps in the management space.
Share the S3 bucket to the other spaces and deploy a backup-manager application in each space.
Alternatively, services to be backed up should be shared with the management space, and there's just a single backup-manager application running there. (Although this is more desirable/centralized, this will result in service name collisions. We would have to include the space name in the service name to avoid that, which complicates everything by making app code need to be space-aware when it shouldn't need to be.)
The backup-manager application is triggered via cf run-task.
It can also be triggered cron-style (via GitHub Action) to make scheduled backups above and beyond what cloud.gov already provides, if needed.
There's a "restore" task that can be triggered, parameterized with the space/environment to be restored from and the name of the backup (corresponding to the names in Table 9-2 of the SSP).
Create a private S3 bucket in the gsa-datagov/management space, and call the instance service-dumps.
cf t -s management
cf create-service s3 basic service-dumps
Make the service accessible from the two environments (though it still "lives" in the management space)
cf share-service service-dumps -s staging
cf share-service service-dumps -s production
The backup-manager app
Make an app that will act as a utility for making and restoring backups across environments. The app should include:
the AWS CLI
the MySQL CLI client
the PostgreSQL CLI client
the Redis CLI client
the Elasticsearch CLI client
The app should use the apt-buildpack to get those installed. (If the AWS CLI can't be installed using apt, then just curl it and unzip it in the app .profile.) Use binary-buildpack for the final buildpack.
The .profile should parse out creds for the service-dumps bucket and set the environment variables properly so that the aws CLI will be able to aws s3 cp to and from the bucket.
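A minimal sketch of what that .profile step might look like. It assumes jq is available in the app container and that the bound bucket exposes cloud.gov-style credential keys (access_key_id, secret_access_key, region, bucket). The function names and the DUMPS_BUCKET variable are hypothetical, introduced only for illustration.

```shell
# .profile sketch: export AWS credentials for the service-dumps bucket.
# Assumptions: jq is installed; the binding's credential keys are
# access_key_id, secret_access_key, region and bucket.

# Print one credential field of the named service instance from VCAP_SERVICES.
# Prints nothing if the instance or field is not found.
service_cred() {
  instance="$1"; field="$2"
  printf '%s' "${VCAP_SERVICES:-}" |
    jq -r --arg n "$instance" --arg f "$field" \
      '[.[][] | select(.name == $n) | .credentials[$f]][0] // empty'
}

# Export the variables the aws CLI reads, plus the bucket name.
# DUMPS_BUCKET is a hypothetical name, not something the aws CLI uses.
export_bucket_env() {
  instance="${1:-service-dumps}"
  AWS_ACCESS_KEY_ID="$(service_cred "$instance" access_key_id)"
  AWS_SECRET_ACCESS_KEY="$(service_cred "$instance" secret_access_key)"
  AWS_DEFAULT_REGION="$(service_cred "$instance" region)"
  DUMPS_BUCKET="$(service_cred "$instance" bucket)"
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION DUMPS_BUCKET
}

# Only attempt the export when the platform has provided VCAP_SERVICES.
if [ -n "${VCAP_SERVICES:-}" ]; then
  export_bucket_env service-dumps
fi
```

With these variables set, a command such as aws s3 cp somefile "s3://$DUMPS_BUCKET/..." should pick up the credentials from the environment.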
The app manifest should include a default start-command which summarizes other commands available:
backup INSTANCENAME [BACKUPID]
Create a backup for INSTANCENAME in /SPACENAME/INSTANCENAME.BACKUPID.SERVICETYPE.gz. SPACENAME is the current application space name. BACKUPID defaults to the current date and time formatted ccyymmdd-HHMM.
For example: /production/catalog-db.20211122-2248.psql.gz
list [INSTANCENAME]
List available BACKUPIDs for services. If INSTANCENAME is provided, limit the list to just the backups for that instance.
restore INSTANCENAME [BACKUPID] [SPACENAME]
Restore the specified BACKUPID into the instance. If the BACKUPID was not specified, default to the most recently-created backup. SPACENAME defaults to the application space name.
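The naming and defaulting rules for the three commands could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the object keys are assumed to arrive one per line on stdin (however the bucket listing is obtained), and "most recent" is decided by sorting BACKUPIDs as text, which only holds for IDs in the default ccyymmdd-HHMM format.

```shell
# Build the object key for a backup.
# BACKUPID (4th argument) defaults to the current time as ccyymmdd-HHMM.
backup_key() {
  space="$1"; instance="$2"; servicetype="$3"
  backup_id="${4:-$(date +%Y%m%d-%H%M)}"
  printf '/%s/%s.%s.%s.gz\n' "$space" "$instance" "$backup_id" "$servicetype"
}

# Read object keys on stdin and print the BACKUPIDs that belong to
# INSTANCENAME in SPACENAME, sorted as text (oldest first for default IDs).
list_backup_ids() {
  space="$1"; instance="$2"
  prefix="/$space/$instance."
  while IFS= read -r key; do
    case "$key" in
      "$prefix"*.gz)
        rest="${key#"$prefix"}"      # BACKUPID.SERVICETYPE.gz
        rest="${rest%.gz}"           # BACKUPID.SERVICETYPE
        printf '%s\n' "${rest%.*}"   # BACKUPID
        ;;
    esac
  done | sort
}

# Pick the default BACKUPID for a restore: the last one in sorted order.
latest_backup_id() {
  list_backup_ids "$1" "$2" | tail -n 1
}
```

For example, backup_key production catalog-db psql 20211122-2248 prints /production/catalog-db.20211122-2248.psql.gz, matching the example path above.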
Deploy the app in each space, but don't start it or give it a route
mogul changed the title from "Application-independent way to backup databases to S3 buckets" to "Application-independent way to backup data services to S3 buckets" on Mar 1, 2021
mogul changed the title from "Application-independent way to backup data services to S3 buckets" to "Application-independent way to backup/restore data services to/from S3 bucket" on Mar 1, 2021
Usage
Making backups
cf bind-service backup-manager my-service
cf run-task backup-manager --command "backup my-service"
cf unbind-service backup-manager my-service
Restoring backups
Restore from the most recent backup in this space
cf bind-service backup-manager my-service
cf run-task backup-manager --command "restore my-service"
cf unbind-service backup-manager my-service
Restore a particular backup from the production space:
cf bind-service backup-manager my-service
cf run-task backup-manager --command "restore my-service 20211122-2248 production"
cf unbind-service backup-manager my-service
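Since each usage example repeats the same bind, run-task, unbind sequence, it might be convenient to wrap it in a small helper. The sketch below is an assumption, not part of the proposal: run_backup_task is a hypothetical name, and it simply issues the three cf commands in the order shown above, unbinding even if the task could not be launched.

```shell
# Bind SERVICE to backup-manager, launch COMMAND as a task, then unbind.
# Usage: run_backup_task SERVICE COMMAND
run_backup_task() {
  service="$1"; task_command="$2"

  # Without a binding the task cannot see the service, so stop here.
  cf bind-service backup-manager "$service" || return 1

  cf run-task backup-manager --command "$task_command"
  status=$?

  # Unbind regardless of whether launching the task succeeded.
  cf unbind-service backup-manager "$service"

  # Report the result of launching the task.
  return "$status"
}
```

A backup would then be started with run_backup_task my-service "backup my-service", and a restore from production with run_backup_task my-service "restore my-service 20211122-2248 production".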