Merged
37 commits
b56ee79
Add common set of callbacks to shell scripts
errose28 Jan 24, 2023
317a9f7
Remove MLV VERSION file checks
errose28 Jan 25, 2023
cf9dd45
Add finalization status check
errose28 Jan 25, 2023
e3ab1f9
Base callbacks on starting version only, and better support test matrix
errose28 Jan 25, 2023
b4964db
Fix bugs discovered when running
errose28 Jan 26, 2023
2b517ec
Make CI only run upgrade tests
errose28 Jan 26, 2023
0fffde2
Bash quoting fix in for robot args in testlib
errose28 Jan 26, 2023
446d9b7
Revert "Bash quoting fix in for robot args in testlib"
errose28 Jan 26, 2023
67338db
Fixes around finalization status robot checks
errose28 Jan 26, 2023
6abf01f
Add scm ha cluster
errose28 Feb 4, 2023
ef982af
Rename callbacks and fix finalization checks
errose28 Feb 4, 2023
ebafcca
Add data dur creation for full HA cluster
errose28 Feb 7, 2023
e5795f0
Update upgrade acc test README with new instructions
errose28 Feb 7, 2023
de7d66c
Merge branch 'master' into improve-upgrade-acc-tests
errose28 Mar 16, 2023
67cdd19
Fix scripting issues when loading compose files
errose28 Mar 16, 2023
db1e1f7
Move and rename callbacks
errose28 Mar 20, 2023
e65b2ec
Improve warning about finding callbacks
errose28 Mar 20, 2023
8e24f84
Handle finalization/prepare checks based on version
errose28 Mar 21, 2023
52bcb3e
Fix parameter order
errose28 Mar 21, 2023
278de5b
Handle current version in runner image
errose28 Mar 21, 2023
b829b56
Update README
errose28 Mar 21, 2023
33d6979
Update unused manual upgrade callbacks
errose28 Mar 21, 2023
0896abe
Test om-ha
errose28 Mar 21, 2023
b7852ec
Reverse test order
errose28 Mar 21, 2023
2e60fb9
Fix typo using wrong cluster in om-ha
errose28 Mar 21, 2023
139d500
Propogate upgrade version env vars into containers
errose28 Mar 21, 2023
9606427
Use scm to run commands whether using HA or not
errose28 Mar 21, 2023
bf124a7
Fix typo in robot
errose28 Mar 21, 2023
fddd42f
Fix grep
errose28 Mar 21, 2023
284d37c
Restore original test order and clusters used
errose28 Mar 21, 2023
a74bfb1
Lots of bug fixes
errose28 Mar 21, 2023
357d1a3
Remove failing callback check
errose28 Mar 21, 2023
19ac3fd
Fix disk state being deleted on restart
errose28 Mar 22, 2023
b5c06ff
Update directory structure and README
errose28 Mar 22, 2023
c296d21
Revert Github Actions changes for testing
errose28 Mar 23, 2023
fd2164c
Only test upgrade/downgrade with latest version
errose28 Mar 23, 2023
a24321d
Merge branch 'master' into HDDS-6633
errose28 Mar 23, 2023
93 changes: 60 additions & 33 deletions hadoop-ozone/dist/src/main/compose/upgrade/README.md
@@ -18,39 +18,60 @@ This directory contains cluster definitions and scripts for testing upgrades fro
previous version, or to the local build of the code. It is designed to catch backwards incompatible changes made between
an older release of Ozone and a later release (which may be the local build).

## IMPORTANT NOTES
## Quick Guide For Release Managers

1. Backwards Incompatibility
- These tests will not catch backwards incompatible changes against commits in between releases.
- Example:
1. After 1.0.0, a change *c1* is made that is backwards compatible with 1.0.0.
2. After *c1*, a new change *c2* is made that is also backwards compatible with 1.0.0 but backwards *incompatible* with *c1*.
- Running the whole test matrix of upgrades and downgrades between previous releases and the current code is too time-consuming to do on every CI run. Instead, we recommend that release managers manually test the full matrix before each release, and that CI run only the tests from the previous release to the current code.

- This test suite will not raise an error for *c2*, because it only tests against the last release
(1.0.0), and not the last commit (*c1*).
1. Before the release, test the whole matrix of upgrades from the previous version to the version you are releasing.
- This is important manual verification that the release does not break backwards compatibility, and the results can be included in the release vote mailing thread.
Review comment (Contributor), suggested change:
- This is important manual verification that the release does not break backwards compatibility, and the results can be included in the release vote mailing thread.
- This is an important manual verification that the release does not break backwards compatibility, and the results can be included in the release vote mailing thread.

- To do this, uncomment all lines that contain `run_test` in the *test.sh* file, and execute *test.sh* either locally or on GitHub Actions.

2. After the release is finished and its docker image is published, add the new version to the test matrix.
1. Change the `OZONE_CURRENT_VERSION` variable to `OZONE_CURRENT_VERSION=<newly-released-version>`.
2. Comment out all `run_test` lines in *test.sh*.
3. Add a new line: `run_test ha non-rolling-upgrade <newly-released-version> "$OZONE_CURRENT_VERSION"` before the commented out lines.
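
The comment and uncomment steps in the guide above can be scripted. The following is a minimal sketch, assuming disabled tests look like `# run_test ...` at the start of a line; the helper names, the sample file contents and the version numbers are made up for illustration and are not taken from *test.sh*.

```shell
# Sketch only: the file contents and version numbers below are made up.

# Step 1 of the guide: turn "# run_test ..." lines back into active lines.
# Requiring whitespace after "run_test" leaves a "run_test() {" definition alone.
uncomment_run_tests() {
  sed -E 's/^([[:space:]]*)#[[:space:]]*(run_test[[:space:]])/\1\2/' "$1"
}

# Step 2.2 of the guide: comment out every active "run_test ..." line.
comment_run_tests() {
  sed -E 's/^([[:space:]]*)(run_test[[:space:]])/\1# \2/' "$1"
}

# Work on a throwaway copy instead of the real test.sh.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
run_test ha non-rolling-upgrade 1.3.0 "$OZONE_CURRENT_VERSION"
# run_test ha non-rolling-upgrade 1.2.1 "$OZONE_CURRENT_VERSION"
EOF

uncommented="$(uncomment_run_tests "$sample")"
commented="$(comment_run_tests "$sample")"
rm -f "$sample"

printf 'All tests enabled:\n%s\n' "$uncommented"
printf 'All tests disabled:\n%s\n' "$commented"
```

Both helpers print the rewritten file to standard output rather than editing in place, so the result can be reviewed before it replaces *test.sh*.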

## Important Notes on Test Scope

- These tests will not catch backwards incompatible changes against commits in between releases.
- Example:
1. After 1.2.0, a change *c1* is made that is backwards compatible with 1.2.0.
2. After *c1*, a new change *c2* is made that is also backwards compatible with 1.0.0 but backwards *incompatible* with *c1*.
Review comment (Contributor), suggested change:
2. After *c1*, a new change *c2* is made that is also backwards compatible with 1.0.0 but backwards *incompatible* with *c1*.
2. After *c1*, a new change *c2* is made that is also backwards compatible with 1.2.0 but backwards *incompatible* with *c1*.


- This test suite will not raise an error for *c2*, because it only tests against the last release
(1.2.0), and not the last commit (*c1*).

## Supported Versions

Non-rolling upgrades and downgrades are supported from 1.1.0 to any later version. Note that 1.1.0 did not have the non-rolling upgrade framework, so things like preparing the OMs for upgrade and checking finalization status are not present in that version. Manual upgrade is the only supported upgrade option from 1.0.0 to 1.1.0.

## Directory Layout

### upgrades

Each type of upgrade has a subdirectory under the *upgrades* directory. Each upgrade's steps are controlled by a *driver.sh* script in its *upgrades/\<upgrade-type>* directory. Callbacks to execute throughout the upgrade are called by this script and should be placed in a file called *callback.sh* in the *upgrades/\<upgrade-type>/\<upgrade-from>-\<upgrade-to>* directory. After the test is run, results and docker volume data for the upgrade for these versions will also be placed in this directory. The results of all upgrades run as part of the tests will be placed in a *results* folder in the top level upgrade directory.
Each type of upgrade has a subdirectory under the *upgrades* directory.

- Each upgrade's steps are controlled by a *driver.sh* script in its *upgrades/\<upgrade-type>* directory. Callbacks to execute throughout the upgrade are called by this script and should be placed in a file called *callback.sh* in the *upgrades/\<upgrade-type>/\<upgrade-to>* directory.

- As the test is run, result logs and docker volume data for the upgrade for these versions will be placed in *upgrades/\<upgrade-type>/execution/\<upgrade-from>-\<upgrade-to>*. This allows a suite of upgrades to be run without conflicting directory names.

- The result logs of all upgrades run as part of the tests will be copied to a *result* directory in the top level upgrade directory.

#### non-rolling-upgrade

- Any necessary conversion of on-disk structures from the old version to the new version is handled by Ozone's non-rolling upgrade framework.

- The name of each subdirectory in *non-rolling-upgrade* is a version to start the upgrade test from, with a *callback.sh* file whose callbacks will be invoked for any upgrade starting in that version.

- The *common* directory contains callbacks used for all upgrade tests regardless of the version.

- Supported Callbacks:
1. `setup`: Run before ozone is started in the old version.
2. `with_old_version`: Run while ozone is running in the old version.
3. `with_new_version_pre_finalized`: Run after ozone is stopped in the old version, and brought back up and running in the new version pre-finalized.
4. `with_old_version_downgraded`: Run after ozone is stopped in the new version pre-finalized, and restarted in the old version again.
5. `with_new_version_finalized`: Run after ozone is stopped in the old version after downgrade, started again in the new version pre-finalized, and then finalized.
1. `with_old_version`: Run while ozone is in the original version to start the upgrade from, before any upgrade steps have been done.
2. `with_this_version_pre_finalized`: Run after ozone is stopped in the old version, and brought back up and running in the new version pre-finalized.
3. `with_old_version_downgraded`: Run after ozone is stopped in the new version pre-finalized, and restarted in the old version again.
4. `with_this_version_finalized`: Run after ozone is stopped in the old version after downgrade, started again in the new version pre-finalized, and then finalized.
- The upgrade is complete when this callback runs.

- Note that on the first upgrade after the non-rolling upgrade framework is added, the old version does not have the non-rolling upgrade framework, but the new version does.
- The non-rolling upgrade framework can still be used; the only difference is that OMs cannot be prepared before moving from the old version to the new version.
- Set the variable `OZONE_PREPARE_OMS` to `false` in `callback.sh` setup function to disable OM preparation as part of the upgrade.
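
The order in which the four non-rolling upgrade callbacks fire can be sketched as follows. This is an illustration only: the `run_callbacks` function is a hypothetical stand-in for the driver, and the callback bodies just echo a message where a real *callback.sh* would run tests.

```shell
# Sketch only: real callbacks would run robot tests rather than echo.

with_old_version() { echo "old version: write test data"; }
with_this_version_pre_finalized() { echo "new version, pre-finalized: read old data"; }
with_old_version_downgraded() { echo "old version after downgrade: read data again"; }
with_this_version_finalized() { echo "new version, finalized: upgrade complete"; }

# Hypothetical stand-in for the driver: call each callback in the documented
# order, skipping any that the callback file does not define.
run_callbacks() {
  for cb in with_old_version with_this_version_pre_finalized \
            with_old_version_downgraded with_this_version_finalized; do
    if command -v "$cb" > /dev/null 2>&1; then
      "$cb"
    fi
  done
}

callback_log="$(run_callbacks)"
printf '%s\n' "$callback_log"
```

Skipping undefined callbacks lets a version-specific file define only the stages it cares about.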

#### manual-upgrade

- This is a legacy option that was used before the upgrade framework was introduced in 1.2.0. This option is left as an example in case it needs to be used for some reason in the future.
@@ -60,40 +81,46 @@ Each type of upgrade has a subdirectory under the *upgrades* directory. Each upg
- This is primarily for testing upgrades from versions before the non-rolling upgrade framework was introduced.

- Supported Callbacks:
1. `setup_with_old_version`: Run before ozone is started in the old version.
1. `setup_old_version`: Run before ozone is started in the old version.
2. `with_old_version`: Run while ozone is running in the old version.
3. `setup_with_new_version`: Run after ozone is stopped in the old version, but before it is restarted in the new version.
4. `with_new_version`: Run while ozone is running in the new version.
3. `setup_this_version`: Run after ozone is stopped in the old version, but before it is restarted in the new version.
4. `with_this_version`: Run while ozone is running in the new version.

### compose

Docker compose cluster definitions to be used in upgrade testing are defined in the *compose* directory. A compose cluster can be selected by sourcing the *load.sh* script in the compose cluster's directory on the setup callback for the upgrade test.
Docker compose cluster definitions to be used in upgrade testing are defined in the *compose* directory. A compose cluster can be selected by specifying the name of its subdirectory as the first argument to `run_test`. `run_test` will then source the `load.sh` script in the cluster's directory so it is used during the test. For manual testing, docker compose can be used normally from the compose cluster directory. Note that some clusters may not work with older versions. Ozone 1.1.0, for example, does not support SCM HA.

## Persisting Data

- Data for each container is persisted in a mounted volume.

- By default it's `data` under the *compose/upgrade/\<versions>* directory, but can be overridden with the `OZONE_VOLUME` environment variable.
- By default it's *data* under the *upgrades/\<upgrade-type>/execution/\<from-version>-\<to-version>* directory, but can be overridden with the `OZONE_VOLUME` environment variable.

- This allows data to be persisted in the cluster throughout container restarts, meaning that tests can check that data written in older versions is still readable in newer versions.
- Mounting volumes allows data to be persisted in the cluster throughout container restarts, meaning that tests can check that data written in older versions is still readable in newer versions.

- Data will be available after the tests finish for debugging purposes. It will be erased on a following run of the test.
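
How the volume location is resolved can be sketched like this, assuming the default layout described in the updated text. The helper names `default_volume_dir` and `resolve_volume_dir`, the example path and the version numbers are hypothetical and only serve the illustration.

```shell
# Sketch only: default_volume_dir and resolve_volume_dir are hypothetical names.

# Build the documented default location:
#   upgrades/<upgrade-type>/execution/<from-version>-<to-version>/data
default_volume_dir() {
  test_dir="$1"; upgrade_type="$2"; from="$3"; to="$4"
  printf '%s/upgrades/%s/execution/%s-%s/data\n' \
    "$test_dir" "$upgrade_type" "$from" "$to"
}

# Honor OZONE_VOLUME when the caller set it; otherwise use the default.
resolve_volume_dir() {
  if [ -n "${OZONE_VOLUME:-}" ]; then
    printf '%s\n' "$OZONE_VOLUME"
  else
    default_volume_dir "$@"
  fi
}

resolve_volume_dir /tmp/upgrade non-rolling-upgrade 1.2.1 1.3.0
```

Because the from and to versions are part of the path, several upgrades in one suite write to separate directories.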

## Extending

### Adding New Tests

- To add tests to an existing upgrade type, edit its *compose/upgrade/\<upgrade-type>/\<versions>/callback.sh* file and add commands in the callback function when they should be run.
- Tests that should run for all upgrades, regardless of the version being tested, can be added to *compose/upgrade/\<upgrade-type>/common/callback.sh*.

- Tests that should run only for an upgrade to a specific version can be added to *compose/upgrade/\<upgrade-type>/\<ending-upgrade-version>/callback.sh*.

- Each callback file will have access to the following environment variables:
- `OZONE_UPGRADE_FROM`: The version of ozone being upgraded from.
- `OZONE_UPGRADE_TO`: The version of ozone being upgraded to.
- `TEST_DIR`: The top level *upgrade* directory containing all files for upgrade testing.
- Add commands in the callback function when they should be run. Each callback file will have access to the following environment variables:
- `OZONE_UPGRADE_FROM`: The version of ozone being upgraded from.
- `OZONE_UPGRADE_TO`: The version of ozone being upgraded to.
- `TEST_DIR`: The top level *upgrade* directory containing all files for upgrade testing.
- `SCM`: The name of the SCM container to run robot tests from.
- This can be passed as the first argument to `execute_robot_test`.
- This allows the same tests to work with and without SCM HA.
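
A callback that uses these variables might look like the sketch below. `execute_robot_test` is replaced by a stub so the snippet runs on its own, and the robot file name and the fallback values for the variables are assumptions made for the example.

```shell
# Sketch only. execute_robot_test is stubbed so this runs standalone;
# the robot file name and the fallback values are made up.
execute_robot_test() {
  container="$1"; shift
  echo "robot on $container: $*"
}

# In a real run the test framework provides these variables.
: "${OZONE_UPGRADE_FROM:=1.2.1}"
: "${OZONE_UPGRADE_TO:=1.3.0}"
: "${SCM:=scm1}"

with_this_version_pre_finalized() {
  echo "Checking data after moving from $OZONE_UPGRADE_FROM to $OZONE_UPGRADE_TO"
  # Passing "$SCM" keeps the callback independent of whether SCM HA is used.
  execute_robot_test "$SCM" upgrade/validate.robot
}

with_this_version_pre_finalized
```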

### Testing New Versions

- To test upgrade between different versions, add a line `run_test <upgrade-type> <old-version> <new-version>` to the top level *test.sh* file.
- The `run_test` function will execute *\<upgrade-type>/test.sh* with the callbacks defined in *\<upgrade-type>/\<old-version>-\<new-version>/callback.sh*.
- To test upgrade between different versions, add a line `run_test <compose-cluster-directory> <upgrade-type> <old-version> <new-version>` to the top level *test.sh* file.
- The `run_test` function will execute *upgrades/\<upgrade-type>/driver.sh* with the callbacks defined in *upgrades/\<upgrade-type>/common/callback.sh* and *upgrades/\<upgrade-type>/\<new-version>/callback.sh*.

- The variable `OZONE_CURRENT_VERSION` is used to define the version corresponding to the locally built source code in the `apache/ozone-runner` image.
- All other versions will be treated as tags specifying a released version of the `apache/ozone` docker image to use.

- If one of the specified versions does not match the current version defined by `OZONE_CURRENT_VERSION`, it will be pulled from the corresponding *apache/ozone* docker image.
- Else, the current version will be used, which will run the locally built source code in the `apache/ozone-runner` image.
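
The image selection rule described above can be sketched as a small function. `image_for_version` is a hypothetical helper, the value `1.4.0` is a made-up placeholder for `OZONE_CURRENT_VERSION`, and the runner image tag is left out because the text does not specify it.

```shell
# Sketch only: image_for_version is a hypothetical helper, and 1.4.0 is a
# made-up value for OZONE_CURRENT_VERSION.
OZONE_CURRENT_VERSION="${OZONE_CURRENT_VERSION:-1.4.0}"

image_for_version() {
  if [ "$1" = "$OZONE_CURRENT_VERSION" ]; then
    # Local build: run it in the runner image (tag omitted here).
    echo "apache/ozone-runner"
  else
    # Any other version is treated as a tag of the released image.
    echo "apache/ozone:$1"
  fi
}

image_for_version 1.3.0
image_for_version "$OZONE_CURRENT_VERSION"
```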
@@ -23,27 +23,37 @@ x-common-config:
- docker-config
image: ${OZONE_IMAGE}

x-replication:
&replication
x-environment:
&environment
OZONE-SITE.XML_ozone.replication: ${OZONE_REPLICATION_FACTOR:-3}
OZONE_UPGRADE_TO: ${OZONE_UPGRADE_TO:-0}
OZONE_UPGRADE_FROM: ${OZONE_UPGRADE_FROM:-0}
OZONE-SITE.XML_hdds.scm.safemode.min.datanode: ${OZONE_SAFEMODE_MIN_DATANODES:-1}

x-datanode:
&datanode
command: ["ozone","datanode"]
<<: *common-config
environment:
<<: *replication
<<: *environment
ports:
- 9864
- 9882

x-scm:
&scm
command: ["ozone","scm"]
<<: *common-config
ports:
- 9876

x-om:
&om
command: ["ozone","om","${OM_HA_ARGS}"]
<<: *common-config
environment:
ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
<<: *replication
<<: *environment
ports:
- 9862
- 9872
@@ -81,27 +91,50 @@ services:
- *ozone-dir
- *transformation

scm:
command: ["ozone","scm"]
<<: *common-config
scm1:
<<: *scm
environment:
ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
OZONE-SITE.XML_hdds.scm.safemode.min.datanode: ${OZONE_SAFEMODE_MIN_DATANODES:-1}
<<: *replication
<<: *environment
networks:
net:
ipv4_address: 10.9.0.14
ports:
- 9876:9876
volumes:
- ${OZONE_VOLUME}/scm:/data
- ${OZONE_VOLUME}/scm1:/data
- *ozone-dir
- *transformation
scm2:
<<: *scm
environment:
WAITFOR: scm1:9894
ENSURE_SCM_BOOTSTRAPPED: /data/metadata/scm/current/VERSION
<<: *environment
networks:
net:
ipv4_address: 10.9.0.15
volumes:
- ${OZONE_VOLUME}/scm2:/data
- *ozone-dir
- *transformation
scm3:
<<: *scm
environment:
WAITFOR: scm2:9894
ENSURE_SCM_BOOTSTRAPPED: /data/metadata/scm/current/VERSION
<<: *environment
networks:
net:
ipv4_address: 10.9.0.16
volumes:
- ${OZONE_VOLUME}/scm3:/data
- *ozone-dir
- *transformation

dn1:
<<: *datanode
networks:
net:
ipv4_address: 10.9.0.15
ipv4_address: 10.9.0.17
volumes:
- ${OZONE_VOLUME}/dn1:/data
- *ozone-dir
@@ -110,7 +143,7 @@ services:
<<: *datanode
networks:
net:
ipv4_address: 10.9.0.16
ipv4_address: 10.9.0.18
volumes:
- ${OZONE_VOLUME}/dn2:/data
- *ozone-dir
@@ -119,7 +152,7 @@ services:
<<: *datanode
networks:
net:
ipv4_address: 10.9.0.17
ipv4_address: 10.9.0.19
volumes:
- ${OZONE_VOLUME}/dn3:/data
- *ozone-dir
@@ -128,7 +161,7 @@ services:
<<: *datanode
networks:
net:
ipv4_address: 10.9.0.18
ipv4_address: 10.9.0.20
volumes:
- ${OZONE_VOLUME}/dn4:/data
- *ozone-dir
@@ -137,7 +170,7 @@ services:
<<: *datanode
networks:
net:
ipv4_address: 10.9.0.19
ipv4_address: 10.9.0.21
volumes:
- ${OZONE_VOLUME}/dn5:/data
- *ozone-dir
@@ -146,10 +179,10 @@ services:
command: ["ozone","recon"]
<<: *common-config
environment:
<<: *replication
<<: *environment
networks:
net:
ipv4_address: 10.9.0.20
ipv4_address: 10.9.0.22
ports:
- 9888:9888
volumes:
@@ -160,16 +193,17 @@ services:
command: ["ozone","s3g"]
<<: *common-config
environment:
<<: *replication
<<: *environment
networks:
net:
ipv4_address: 10.9.0.21
ipv4_address: 10.9.0.23
ports:
- 9878:9878
volumes:
- ${OZONE_VOLUME}/s3g:/data
- *ozone-dir
- *transformation

networks:
net:
driver: bridge
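
The `${VAR:-default}` expressions in the `x-environment` block follow the same rule in Compose interpolation as in the shell: the default applies when the variable is unset or empty. A minimal shell sketch of the resulting values, with a made-up value for `OZONE_UPGRADE_FROM`:

```shell
# Sketch only: mirrors the ${VAR:-default} expressions from the compose file.
unset OZONE_REPLICATION_FACTOR OZONE_SAFEMODE_MIN_DATANODES OZONE_UPGRADE_TO
OZONE_UPGRADE_FROM=1.2.1   # made-up value, as a test run might export it

replication="${OZONE_REPLICATION_FACTOR:-3}"
min_datanodes="${OZONE_SAFEMODE_MIN_DATANODES:-1}"
upgrade_from="${OZONE_UPGRADE_FROM:-0}"
upgrade_to="${OZONE_UPGRADE_TO:-0}"

echo "replication=$replication min_datanodes=$min_datanodes"
echo "upgrade_from=$upgrade_from upgrade_to=$upgrade_to"
# prints:
#   replication=3 min_datanodes=1
#   upgrade_from=1.2.1 upgrade_to=0
```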
@@ -24,14 +24,19 @@ OZONE-SITE.XML_ozone.om.address.omservice.om1=om1
OZONE-SITE.XML_ozone.om.address.omservice.om2=om2
OZONE-SITE.XML_ozone.om.address.omservice.om3=om3
OZONE-SITE.XML_ozone.om.ratis.enable=true

OZONE-SITE.XML_ozone.scm.service.ids=scmservice
OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm1,scm2,scm3
OZONE-SITE.XML_ozone.scm.address.scmservice.scm1=scm1
OZONE-SITE.XML_ozone.scm.address.scmservice.scm2=scm2
OZONE-SITE.XML_ozone.scm.address.scmservice.scm3=scm3
OZONE-SITE.XML_ozone.scm.ratis.enable=true
OZONE-SITE.XML_ozone.scm.primordial.node.id=scm1

OZONE-SITE.XML_ozone.scm.pipeline.creation.interval=30s
OZONE-SITE.XML_ozone.scm.pipeline.owner.container.count=1
OZONE-SITE.XML_ozone.scm.names=scm
OZONE-SITE.XML_ozone.scm.datanode.id.dir=/data
OZONE-SITE.XML_ozone.scm.block.client.address=scm
OZONE-SITE.XML_ozone.scm.container.size=1GB
OZONE-SITE.XML_ozone.scm.client.address=scm

OZONE-SITE.XML_hdds.datanode.dir=/data/hdds

# If SCM sends container close commands as part of upgrade finalization while
@@ -53,6 +58,6 @@ OZONE-SITE.XML_ozone.recon.db.dir=/data/metadata/recon
OZONE-SITE.XML_ozone.recon.om.snapshot.task.interval.delay=1m
OZONE-SITE.XML_ozone.recon.address=recon:9891

no_proxy=om1,om2,om3,scm,s3g,kdc,localhost,127.0.0.1
no_proxy=om1,om2,om3,scm1,scm2,scm3,s3g,kdc,localhost,127.0.0.1

OM_SERVICE_ID=omservice
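
Since this change renames the single `scm` host to `scm1`, `scm2` and `scm3`, a quick consistency check between the SCM node list and `no_proxy` can catch a missed entry. The sketch below hard-codes the two values from the config above; the loop itself is an illustration and not part of the test scripts.

```shell
# Sketch only: values copied from the config above; the check is illustrative.
scm_nodes="scm1,scm2,scm3"
no_proxy_list="om1,om2,om3,scm1,scm2,scm3,s3g,kdc,localhost,127.0.0.1"

missing=""
old_ifs="$IFS"
IFS=','
for node in $scm_nodes; do
  # Wrap both sides in commas so "scm1" cannot match inside a longer name.
  case ",$no_proxy_list," in
    *",$node,"*) ;;
    *) missing="$missing $node" ;;
  esac
done
IFS="$old_ifs"

if [ -z "$missing" ]; then
  echo "all SCM nodes are listed in no_proxy"
else
  echo "missing from no_proxy:$missing"
fi
```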
@@ -25,4 +25,6 @@ source "$TEST_DIR/testlib.sh"

export COMPOSE_FILE="$TEST_DIR/compose/ha/docker-compose.yaml"
export OM_SERVICE_ID=omservice
create_data_dirs "${OZONE_VOLUME}"/{om1,om2,om3,dn1,dn2,dn3,dn4,dn5,recon,s3g,scm}
create_data_dirs "${OZONE_VOLUME}"/{om1,om2,om3,dn1,dn2,dn3,dn4,dn5,recon,s3g,scm1,scm2,scm3}

echo "Using docker cluster defined in $COMPOSE_FILE"
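
The brace expansion in the `create_data_dirs` call produces one path per service, and each name has to match a `${OZONE_VOLUME}/<service>` mount in the compose file. A minimal sketch of the effect, where `create_data_dirs` is a hypothetical stand-in (the function in *testlib.sh* may do more, for example adjusting permissions) and a plain loop replaces the brace expansion:

```shell
# Sketch only: this create_data_dirs is a hypothetical stand-in, not the
# implementation from testlib.sh.
create_data_dirs() {
  for dir in "$@"; do
    mkdir -p "$dir"
  done
}

OZONE_VOLUME="$(mktemp -d)"   # throwaway location for the demo

# One directory per service defined in the HA compose file.
for node in om1 om2 om3 dn1 dn2 dn3 dn4 dn5 recon s3g scm1 scm2 scm3; do
  create_data_dirs "$OZONE_VOLUME/$node"
done

ls "$OZONE_VOLUME"
```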