Merged
45 commits
e328182
In-place major upgrade
Aug 21, 2020
9f0320a
Better logging + drop postgres_log before upgrade
Aug 21, 2020
fc47c58
Little fixes
Aug 21, 2020
be18343
More work
Aug 27, 2020
342da9f
Implement integration tests
Aug 27, 2020
eec5b9a
Move tests to the right place
Aug 27, 2020
cf37c6d
Polish tests
Aug 28, 2020
10260ff
Add minio
Sep 3, 2020
d0eb6b4
Patroni 2.0
Sep 3, 2020
a9cf41e
Merge branch 'master' of github.com:zalando/spilo into feature/in-pla…
Sep 3, 2020
936a90b
Merge branch 'feature/patroni-2.0' of github.com:zalando/spilo into f…
Sep 3, 2020
590917a
More refactoring. Define PATRONI_CONFIG_FILE in spilo_commons
Sep 3, 2020
3a5b988
Upgrade with clone + tests
Sep 3, 2020
2bf0a67
More tests: timescaledb and upgrade after clone
Sep 7, 2020
2aa408f
Merge branch 'master' of github.com:zalando/spilo into feature/in-pla…
Sep 7, 2020
862acc9
Run tests in CDP
Sep 7, 2020
23e71f1
Fix delivery.yaml
Sep 7, 2020
ad68923
Install docker-compose
Sep 7, 2020
592306e
Install docker-compose with pip3
Sep 7, 2020
4f22981
Pin docker-compose version
Sep 7, 2020
92442de
debug tests
Sep 7, 2020
a7a4cd7
SPILO_PROVIDER=local
Sep 7, 2020
13e9567
Raise timeouts
Sep 7, 2020
5ed144d
Skip failed upgrade
Sep 7, 2020
eb45869
Rename test_major_upgrade.sh -> test_spilo.sh
Sep 8, 2020
67caf5b
Merge branch 'master' of github.com:zalando/spilo into feature/in-pla…
Sep 8, 2020
1b49069
Disable one test
Sep 8, 2020
c64fe33
Update wal-e envdir and trigger backup after upgrade
Sep 8, 2020
28217e3
Implemented ENABLE_WAL_PATH_COMPAT
Sep 9, 2020
1a0a436
Merge branch 'master' of github.com:zalando/spilo into feature/in-pla…
Sep 24, 2020
7c70d96
Merge branch 'feature/in-place-upgrade' of github.com:zalando/spilo i…
Sep 29, 2020
7903d61
more tests
Sep 29, 2020
722d63f
Merge branch 'master' of github.com:zalando/spilo into feature/in-pla…
Sep 29, 2020
06b26ee
Merge branch 'feature/pg13' of github.com:zalando/spilo into feature/…
Sep 29, 2020
6e3c87f
Merge branch 'feature/pg13-inplace-upgrade-v2' of github.com:zalando/…
Sep 29, 2020
3c5f6f5
Make rsync port configurable
Sep 29, 2020
4c36a50
Add a few comments
Sep 30, 2020
92e69b0
Merge branch 'master' of github.com:zalando/spilo into feature/in-pla…
Oct 2, 2020
aef8a98
Remove unnecessary cd
Oct 8, 2020
35349b9
Bump bg_mon commit id
Oct 9, 2020
c1cf441
Start backup only if there is envdir defined
Oct 9, 2020
179b825
Merge branch 'feature/pg13' of github.com:zalando/spilo into feature/…
Oct 9, 2020
c5981d5
Merge branch 'feature/pg13' of github.com:zalando/spilo into feature/…
Oct 26, 2020
32a5243
Merge branch 'feature/pg13' of github.com:zalando/spilo into feature/…
Nov 23, 2020
a1685da
Merge branch 'feature/pg13' of github.com:zalando/spilo into feature/…
Nov 30, 2020
1 change: 1 addition & 0 deletions ENVIRONMENT.rst
@@ -78,3 +78,4 @@ Environment Configuration Settings
- **KUBERNETES_ROLE_LABEL**: name of the label containing Postgres role when running on Kubernetes. Default is 'spilo-role'.
- **KUBERNETES_SCOPE_LABEL**: name of the label containing cluster name. Default is 'version'.
- **KUBERNETES_LABELS**: a JSON describing names and values of other labels used by Patroni on Kubernetes to locate its metadata. Default is '{"application": "spilo"}'.
- **ENABLE_WAL_PATH_COMPAT**: old Spilo images generated the WAL path in the backup store using the template ``/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/``, while new images add one more directory (``{PGVERSION}``) to the end. To avoid (unlikely) issues with restoring WALs (from S3, GC, and so on) when switching to ``spilo-13``, set ``ENABLE_WAL_PATH_COMPAT=true`` when deploying an old cluster with ``spilo-13`` for the first time. After that, the environment variable can be removed. The change of the WAL path also means that backups stored in the old location will not be cleaned up automatically.
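To make the two layouts described above concrete, here is a minimal sketch that only formats the paths. The helper name `wal_path` and its parameters are hypothetical and are not part of Spilo. The trailing slash after the version directory is an assumption, and how `ENABLE_WAL_PATH_COMPAT` is handled internally is not shown in this diff.

```python
def wal_path(scope, pgversion=None, prefix='', suffix=''):
    """Hypothetical helper: format the WAL path for the old or the new layout.

    prefix/suffix stand in for WAL_BUCKET_SCOPE_PREFIX/WAL_BUCKET_SCOPE_SUFFIX.
    Without pgversion the old template is used; with it, one extra
    directory is appended to the end (assumed form of the new layout).
    """
    base = '/spilo/{0}{1}{2}/wal/'.format(prefix, scope, suffix)
    if pgversion is None:
        return base
    return '{0}{1}/'.format(base, pgversion)


old_layout = wal_path('demo')                  # layout written by old images
new_layout = wal_path('demo', pgversion='13')  # layout written by new images
print(old_layout)
print(new_layout)
```

Because the two paths differ, anything written under the old location is invisible to tooling that only looks at the versioned path, which is consistent with the note that old backups are not cleaned up automatically.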
4 changes: 2 additions & 2 deletions postgres-appliance/Dockerfile
@@ -9,7 +9,7 @@ FROM ubuntu:18.04 as builder-false
RUN export DEBIAN_FRONTEND=noninteractive \
&& echo 'APT::Install-Recommends "0";\nAPT::Install-Suggests "0";' > /etc/apt/apt.conf.d/01norecommend \
&& apt-get update \
- && apt-get install -y curl ca-certificates less locales jq vim-tiny gnupg1 cron runit dumb-init libcap2-bin \
+ && apt-get install -y curl ca-certificates less locales jq vim-tiny gnupg1 cron runit dumb-init libcap2-bin rsync \
&& ln -s chpst /usr/bin/envdir \
# Make it possible to use the following utilities without root
&& setcap 'cap_sys_nice+ep' /usr/bin/chrt \
@@ -528,7 +528,7 @@ RUN sed -i "s|/var/lib/postgresql.*|$PGHOME:/bin/bash|" /etc/passwd \
&& usermod -a -G root postgres; \
fi

- COPY scripts bootstrap /scripts/
+ COPY scripts bootstrap major_upgrade /scripts/
COPY launch.sh /

CMD ["/bin/sh", "/launch.sh", "init"]
108 changes: 94 additions & 14 deletions postgres-appliance/bootstrap/clone_with_wale.py
@@ -5,6 +5,7 @@
import logging
import os
import re
import shlex
import subprocess
import sys

@@ -61,36 +62,115 @@ def fix_output(output):
yield '\t'.join(line.split())


def choose_backup(output, recovery_target_time):
def choose_backup(backup_list, recovery_target_time):
""" pick up the latest backup file starting before time recovery_target_time"""
reader = csv.DictReader(fix_output(output), dialect='excel-tab')
backup_list = list(reader)
if len(backup_list) <= 0:
raise Exception("wal-e could not found any backups")

match_timestamp = match = None
for backup in backup_list:
last_modified = parse(backup['last_modified'])
if last_modified < recovery_target_time:
if match is None or last_modified > match_timestamp:
match = backup
match_timestamp = last_modified
if match is None:
raise Exception("wal-e could not found any backups prior to the point in time {0}".format(recovery_target_time))
return match['name']
if match is not None:
return match['name']


def list_backups(env):
backup_list_cmd = build_wale_command('backup-list')
output = subprocess.check_output(backup_list_cmd, env=env)
reader = csv.DictReader(fix_output(output), dialect='excel-tab')
return list(reader)


def get_clone_envdir():
from spilo_commons import get_patroni_config

config = get_patroni_config()
restore_command = shlex.split(config['bootstrap']['clone_with_wale']['recovery_conf']['restore_command'])
if len(restore_command) > 4 and restore_command[0] == 'envdir':
return restore_command[1]
raise Exception('Failed to find clone envdir')


def get_possible_versions():
from spilo_commons import LIB_DIR, get_binary_version, get_bin_dir, get_patroni_config

config = get_patroni_config()

max_version = float(get_binary_version(config.get('postgresql', {}).get('bin_dir')))

versions = {}

for d in os.listdir(LIB_DIR):
try:
ver = get_binary_version(get_bin_dir(d))
fver = float(ver)
if fver <= max_version:
versions[fver] = ver
except Exception:
pass

# return possible versions in reversed order, i.e. 12, 11, 10, 9.6, and so on
return [ver for _, ver in sorted(versions.items(), reverse=True)]


def get_wale_environments(env):
use_walg = env.get('USE_WALG_RESTORE') == 'true'
prefix = 'WALG_' if use_walg else 'WALE_'
# len('WALE__PREFIX') = 12
names = [name for name in env.keys() if name.endswith('_PREFIX') and name.startswith(prefix) and len(name) > 12]
if len(names) != 1:
raise Exception('Found find {0} {1}*_PREFIX environment variables, expected 1'
.format(len(names), prefix))

name = names[0]
value = env[name].rstrip('/')

if '/spilo/' in value and value.endswith('/wal'): # path crafted in the configure_spilo.py?
# Try all versions descending if we don't know the version of the source cluster
for version in get_possible_versions():
yield name, '{0}/{1}/'.format(value, version)

# Last, try the original value
yield name, env[name]


def find_backup(recovery_target_time, env):
old_value = None
for name, value in get_wale_environments(env):
if not old_value:
old_value = env[name]
env[name] = value
backup_list = list_backups(env)
if backup_list:
if recovery_target_time:
backup = choose_backup(backup_list, recovery_target_time)
if backup:
return backup, (name if value != old_value else None)
else: # We assume that the LATEST backup will be for the biggest postgres version!
Member

What does "biggest" refer to in this comment? Does it mean that the LATEST backup has to be for PG v12 when spilo-13 is deployed?

Contributor Author

If the major version of the source cluster is not specified explicitly, we try all Postgres versions starting from the biggest. That is, the get_wale_environments() function yields tuples:

  • ('WALE_S3_PREFIX', 's3://$bucket/spilo/cluster-name/$uid/wal/12')
  • ('WALE_S3_PREFIX', 's3://$bucket/spilo/cluster-name/$uid/wal/11')
  • ('WALE_S3_PREFIX', 's3://$bucket/spilo/cluster-name/$uid/wal/10')
    and so on.

For every prefix we call wal-e backup-list and try to find a backup suitable for the given recovery_target_time.
If recovery_target_time is not specified, we just pick the LATEST backup.

However, it might be that under the s3://$bucket/spilo/cluster-name/$uid/wal/ path there are backups for 12 and, let's say, 10. The correct way of selecting the latest backup between two (or more) different versions would be to list the backups for all versions and choose between them. This is too much work for too little benefit. Therefore I made an assumption: if a backup for version 12 is there, we do not continue with other versions, because most likely the backup for 10 would be older.
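The search order described in this reply can be sketched in isolation as follows. This is a simplified illustration, not the code from the PR: `candidate_prefixes`, `find_backup_sketch`, and the in-memory `store` are hypothetical stand-ins, and the `lookup` callable replaces the real `wal-e backup-list` call.

```python
from datetime import datetime


def candidate_prefixes(base, versions):
    """Yield versioned prefixes from the biggest version down, then the bare base."""
    base = base.rstrip('/')
    for version in sorted(versions, key=float, reverse=True):
        yield '{0}/{1}/'.format(base, version)
    yield base + '/'


def find_backup_sketch(base, versions, lookup, target_time=None):
    """Return (prefix, backup_name) for the first prefix with a usable backup."""
    for prefix in candidate_prefixes(base, versions):
        backups = lookup(prefix)  # stand-in for running `wal-e backup-list`
        if not backups:
            continue
        if target_time is None:
            # Assumption from the reply: the first non-empty (biggest) version wins.
            return prefix, 'LATEST'
        older = [b for b in backups if b['last_modified'] < target_time]
        if older:
            return prefix, max(older, key=lambda b: b['last_modified'])['name']
    raise LookupError('no suitable backup found')


# Hypothetical in-memory backup store keyed by prefix.
store = {
    's3://bucket/spilo/demo/wal/12/': [
        {'name': 'base_12', 'last_modified': datetime(2020, 9, 1)}],
    's3://bucket/spilo/demo/wal/10/': [
        {'name': 'base_10', 'last_modified': datetime(2020, 1, 1)}],
}


def lookup(prefix):
    return store.get(prefix, [])


versions = ['9.6', '10', '11', '12']
latest = find_backup_sketch('s3://bucket/spilo/demo/wal', versions, lookup)
pitr = find_backup_sketch('s3://bucket/spilo/demo/wal', versions, lookup,
                          target_time=datetime(2020, 6, 1))
print(latest)
print(pitr)
```

Without a target time the sketch stops at the version 12 prefix and never inspects version 10, which is exactly the shortcut the reply defends. With a target time it falls through to older versions only when the newer prefix has no backup early enough.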

return 'LATEST', (name if value != old_value else None)
if recovery_target_time:
raise Exception('Could not find any backups prior to the point in time {0}'.format(recovery_target_time))
raise Exception('Could not find any backups')


def run_clone_from_s3(options):
backup_name = 'LATEST'
if options.recovery_target_time:
backup_list_cmd = build_wale_command('backup-list')
backup_list = subprocess.check_output(backup_list_cmd)
backup_name = choose_backup(backup_list, options.recovery_target_time)
env = os.environ.copy()

backup_name, update_envdir = find_backup(options.recovery_target_time, env)

backup_fetch_cmd = build_wale_command('backup-fetch', options.datadir, backup_name)
logger.info("cloning cluster %s using %s", options.name, ' '.join(backup_fetch_cmd))
if not options.dry_run:
ret = subprocess.call(backup_fetch_cmd)
ret = subprocess.call(backup_fetch_cmd, env=env)
if ret != 0:
raise Exception("wal-e backup-fetch exited with exit code {0}".format(ret))

if update_envdir: # We need to update file in the clone envdir or restore_command will fail!
envdir = get_clone_envdir()
with open(os.path.join(envdir, update_envdir), 'w') as f:
f.write(env[update_envdir])
return 0


29 changes: 15 additions & 14 deletions postgres-appliance/bootstrap/maybe_pg_upgrade.py
@@ -9,12 +9,12 @@ def main():
from pg_upgrade import PostgresqlUpgrade
from patroni.config import Config
from patroni.utils import polling_loop
from spilo_commons import get_binary_version

config = Config(sys.argv[1])
config['postgresql'].update({'callbacks': {}, 'pg_ctl_timeout': 3600*24*7})
upgrade = PostgresqlUpgrade(config['postgresql'])
upgrade = PostgresqlUpgrade(config)

bin_version = upgrade.get_binary_version()
bin_version = get_binary_version(upgrade.pgcommand(''))
cluster_version = upgrade.get_cluster_version()

if cluster_version == bin_version:
@@ -37,13 +37,9 @@ def main():
upgrade.stop(block_callbacks=True, checkpoint=False)
raise Exception('Failed to run bootstrap.post_init')

locale = upgrade.query('SHOW lc_collate').fetchone()[0]
encoding = upgrade.query('SHOW server_encoding').fetchone()[0]
initdb_config = [{'locale': locale}, {'encoding': encoding}]
if upgrade.query("SELECT current_setting('data_checksums')::bool").fetchone()[0]:
initdb_config.append('data-checksums')
if not upgrade.prepare_new_pgdata(bin_version):
raise Exception('initdb failed')

logger.info('Dropping objects from the cluster which could be incompatible')
try:
upgrade.drop_possibly_incompatible_objects()
except Exception:
@@ -54,15 +50,18 @@ def main():
if not upgrade.stop(block_callbacks=True, checkpoint=False):
raise Exception('Failed to stop the cluster with old postgres')

logger.info('initdb config: %s', initdb_config)

logger.info('Executing pg_upgrade')
if not upgrade.do_upgrade(bin_version, initdb_config):
if not upgrade.do_upgrade():
raise Exception('Failed to upgrade cluster from {0} to {1}'.format(cluster_version, bin_version))

logger.info('Starting the cluster with new postgres after upgrade')
if not upgrade.start():
raise Exception('Failed to start the cluster with new postgres')

try:
upgrade.update_extensions()
except Exception as e:
logger.error('Failed to update extensions: %r', e)

upgrade.analyze()


@@ -71,8 +70,10 @@ def call_maybe_pg_upgrade():
import os
import subprocess

from spilo_commons import PATRONI_CONFIG_FILE

my_name = os.path.abspath(inspect.getfile(inspect.currentframe()))
ret = subprocess.call([sys.executable, my_name, os.path.join(os.getenv('PGHOME'), 'postgres.yml')])
ret = subprocess.call([sys.executable, my_name, PATRONI_CONFIG_FILE])
if ret != 0:
logger.error('%s script failed', my_name)
return ret
134 changes: 0 additions & 134 deletions postgres-appliance/bootstrap/pg_upgrade.py

This file was deleted.
