
Not able to restore from backup using Point-in-time-Recovery (PITR) #1886

Closed

haribhauhud opened this issue Sep 15, 2020 · 4 comments

haribhauhud commented Sep 15, 2020

Which example are you working with?

We are running the following pgo versions:
pgo client version 4.3.2
pgo-apiserver version 4.3.2

What is the current behavior?

We have the following two full backups:

    full backup: 20200915-132804F
        timestamp start/stop: 2020-09-15 18:58:04 +0530 IST / 2020-09-15 18:58:18 +0530 IST
        wal start/stop: 000000010000000000000015 / 000000010000000000000015
        database size: 46.4MiB, backup size: 46.4MiB
        repository size: 4.5MiB, repository backup size: 4.5MiB
        backup reference list: 

    full backup: 20200915-132853F
        timestamp start/stop: 2020-09-15 18:58:53 +0530 IST / 2020-09-15 18:59:07 +0530 IST
        wal start/stop: 000000010000000000000017 / 000000010000000000000017
        database size: 46.4MiB, backup size: 46.4MiB
        repository size: 4.5MiB, repository backup size: 4.5MiB
        backup reference list: 

When trying to restore to the first backup, i.e. 20200915-132804F, the restore does not happen and the PostgreSQL cluster goes down. We are following this document: https://access.crunchydata.com/documentation/postgres-operator/latest/pgo-client/common-tasks/#disaster-recovery-backups-restores

$ pgo restore awx-stg -n pgo --pitr-target=" 2020-09-15 18:58:04" --backup-opts="--type=name --target-action=promote --set=20200915-132804F"

[screenshot showing the status of a single Pod after the restore attempt]

Is it because we are not passing the time in the right way? What format is expected? We don't see any error message in the restore pod log saying the timestamp is not in the required format, or anything like that.

What is the expected behavior?

The cluster should restore to the previous backup.

jkatz (Contributor) commented Sep 15, 2020

  • What environment are you running this in?
  • Are you backing up to a local pgBackRest repository, S3, or both?
  • What is the full state of the cluster? A screenshot just showing one Pod does not tell me much.
  • Do any of the Pods come back up? What does kubectl -n <namespace> get pods indicate? What is the full output of pgo test? (See the command sketch after this list.)
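
A sketch of those diagnostics, reusing the namespace and cluster name from this thread (pgo test syntax assumed from the v4 client):

$ kubectl -n pgo get pods        # full Pod state of the cluster
$ pgo test awx-stg -n pgo        # connectivity/readiness check for the cluster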

During a restore, the PostgreSQL cluster will go down for a period of time. This is a destructive action (as both the command and the documentation warn).

That said, v4.5.0 makes significant improvements to the pgo restore process that could resolve your issue, but I would need to better understand your environment.

@haribhauhud haribhauhud changed the title Note able to restore from backup using Point-in-time-Recovery (PITR) Not able to restore from backup using Point-in-time-Recovery (PITR) Sep 16, 2020
haribhauhud (Author) commented Sep 17, 2020

Thanks for the quick response @jkatz. We were able to resolve this issue: during testing I was not paying attention to the timestamp format required by pgBackRest, and the restore worked once a properly formatted timestamp value was used. By any chance, do you know how to restore using just a backup label and not a timestamp? Something like:

$> pgo restore awx-stg --backup-opts="--type=name --set=20200916-073249F"
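
(On the timestamp fix mentioned above: the corrected command isn't shown in the thread, but a sketch of a pgBackRest-acceptable time target, assuming the +05:30 offset from the backup listing, would look like the following. Note the target must be at or after the backup's stop time, 18:58:18, for recovery to reach a consistent state.)

$ pgo restore awx-stg -n pgo \
    --pitr-target="2020-09-15 18:58:18.000000+05:30" \
    --backup-opts="--type=time"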

Venryx commented Aug 29, 2021

I had assumed you could restore to just a label (and not a timestamp) by using:

  dataSource:
    postgresCluster:
      clusterName: ...
      repoName: ...
      options:
      - --set 20210828-074206F
      - --type=immediate
      - --target-action=promote

However, the above configuration gives the error:

time="2021-08-29T12:34:55Z" level=debug msg=Warning file="sigs.k8s.io/[email protected]/pkg/internal/recorder/recorder.go:98" func="recorder.(*Provider).getBroadcaster.func1.1" message="Option '--target-action' is not allowed: the operator will automatically set this option %!(EXTRA string=repo2)" object="{PostgresCluster postgres-operator debate-map 9702e2ed-1573-43af-b4c2-953fac91ee20 postgres-operator.crunchydata.com/v1beta1 11847 }"
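
Per that warning, the operator-accepted form of the dataSource is presumably just the same block minus the disallowed option (an assumption; names elided as in the original):

  dataSource:
    postgresCluster:
      clusterName: ...
      repoName: ...
      options:
      - --set 20210828-074206F
      - --type=immediate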

When I leave out --target-action=promote, as the message instructs, I get this problem:

[...]
2021-08-29 12:26:16.892 GMT [17] LOG:  restored log file "0000001D.history" from archive
2021-08-29 12:26:16.920 GMT [17] LOG:  redo starts at 0/15000028
2021-08-29 12:26:16.927 GMT [17] LOG:  consistent recovery state reached at 0/15015430
2021-08-29 12:26:16.927 GMT [17] LOG:  recovery stopping after reaching consistency
2021-08-29 12:26:16.927 GMT [17] LOG:  pausing at the end of recovery
2021-08-29 12:26:16.927 GMT [17] HINT:  Execute pg_wal_replay_resume() to promote.
2021-08-29 12:26:16.927 GMT [16] LOG:  database system is ready to accept read only connections

So it seems that the operator is blocking me from specifying --target-action=promote (saying it will set this option itself), yet when I omit it, the cluster isn't actually promoted; instead recovery reports "pausing at the end of recovery" with the hint "Execute pg_wal_replay_resume() to promote."

How then should someone go about "restoring to just a label, not using a timestamp"? (Isn't that what the type=immediate + target-action=promote combo is for?)
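
A plausible reading (assumed from stock PostgreSQL behavior rather than from PGO internals): once a recovery target is reached while hot_standby is on, recovery_target_action defaults to pause, which matches the "pausing at the end of recovery" line above. Resuming by hand from psql inside the database Pod finishes recovery:

$ psql -c "select pg_wal_replay_resume();"   # resume WAL replay and end recovery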

Venryx commented Sep 12, 2021

Well, I wasn't able to get the --target-action=promote flag to work, but I found a workaround:

  1. Apply the kubectl config in my previous post (but without the target-action flag).
  2. Use Lens to find the MYPROJECT-restore-XXXX-XXXXX pod (make sure you have "All namespaces" enabled in the namespace-selector, otherwise you won't be able to see any of the pgo pods), and open a shell in it.
  3. Execute: psql
  4. Execute: select pg_wal_replay_resume();

It adds another step to the restore process, but it gets the job done for now.
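
(Steps 2–4 can also be collapsed into a single command once the restore Pod's name is known; the Pod name and namespace here are placeholders:)

$ kubectl exec -n <namespace> MYPROJECT-restore-XXXX-XXXXX -- \
    psql -c "select pg_wal_replay_resume();"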
