Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Compare ported to unported PG schemas in portdb test job #13808

Merged
merged 5 commits into from
Sep 16, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 0 additions & 31 deletions .ci/scripts/postgres_exec.py

This file was deleted.

2 changes: 1 addition & 1 deletion .ci/scripts/test_export_data_command.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ else
fi

# Create the PostgreSQL database.
poetry run .ci/scripts/postgres_exec.py "CREATE DATABASE synapse"
psql -c "CREATE DATABASE synapse"

# Port the SQLite databse to postgres so we can check command works against postgres
echo "+++ Port SQLite3 databse to postgres"
Expand Down
36 changes: 25 additions & 11 deletions .ci/scripts/test_synapse_port_db.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,27 +2,27 @@
#
# Test script for 'synapse_port_db'.
# - configures synapse and a postgres server.
# - runs the port script on a prepopulated test sqlite db
# - also runs it against an new sqlite db
# - runs the port script on a prepopulated test sqlite db. Checks that the
# return code is zero.
# - reruns the port script on the same sqlite db, targetting the same postgres db.
# Checks that the return code is zero.
# - runs the port script against a new sqlite db. Checks the return code is zero.
#
# Expects Synapse to have been already installed with `poetry install --extras postgres`.
# Expects `poetry` to be available on the `PATH`.

set -xe
set -xe -o pipefail
cd "$(dirname "$0")/../.."

echo "--- Generate the signing key"

# Generate the server's signing key.
poetry run synapse_homeserver --generate-keys -c .ci/sqlite-config.yaml

echo "--- Prepare test database"

# Make sure the SQLite3 database is using the latest schema and has no pending background update.
# Make sure the SQLite3 database is using the latest schema and has no pending background updates.
poetry run update_synapse_database --database-config .ci/sqlite-config.yaml --run-background-updates

# Create the PostgreSQL database.
poetry run .ci/scripts/postgres_exec.py "CREATE DATABASE synapse"
psql -c "CREATE DATABASE synapse"

echo "+++ Run synapse_port_db against test database"
# TODO: this invocation of synapse_port_db (and others below) used to be prepended with `coverage run`,
Expand All @@ -45,9 +45,23 @@ rm .ci/test_db.db
poetry run update_synapse_database --database-config .ci/sqlite-config.yaml --run-background-updates

# re-create the PostgreSQL database.
poetry run .ci/scripts/postgres_exec.py \
"DROP DATABASE synapse" \
"CREATE DATABASE synapse"
psql \
-c "DROP DATABASE synapse" \
-c "CREATE DATABASE synapse"

echo "+++ Run synapse_port_db against empty database"
poetry run synapse_port_db --sqlite-database .ci/test_db.db --postgres-config .ci/postgres-config.yaml

echo "--- Create a brand new postgres database from schema"
cp .ci/postgres-config.yaml .ci/postgres-config-unported.yaml
sed -i -e 's/database: synapse/database: synapse_unported/' .ci/postgres-config-unported.yaml
psql -c "CREATE DATABASE synapse_unported"
poetry run update_synapse_database --database-config .ci/postgres-config-unported.yaml --run-background-updates

echo "+++ Comparing ported schema with unported schema"
# Ignore the tables that portdb creates. (Should it tidy them up when the porting is completed?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. What do you call 'completed' though?; the script is meant to be re-run multiple times.
I think it's OK as it is. It may be useful to have a trace that the database is a ported one anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you call 'completed' though?; the script is meant to be re-run multiple times.

My understanding: portdb is a one-time migration process that can be suspended (cancel the script) and resumed (rerun the script). Once it finished porting across all rows you're supposed to only write to the postgres DB, and not rewrite to the sqlite db.

More precisely: the portdb script assumes the following for each table to be ported.

If all rows in the range X <= rowid <= Y have been ported in a previous run 1 of the script, then there are no new rows in that range added between run 1 and run 2 of the script.

I am slightly paranoid that this assumption is false. https://sqlite.org/rowidtable.html notes that rowids can change if you vacuum the script. If you have an INTEGER PRIMARY KEY then you can insert any rowid you like in any order you like. (Related: #13226.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be explicit: the above is mostly paranoia and extreme caution rather than anything concrete. I plan to leave the script and comment as it is. How does that sound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that you can re-run the portdb script many times online (it's an incremental process), but then you finalise it by running it once again with Synapse offline. Hence there isn't really a 'completion' condition, unless we make the admin specify some flag to say 'this is the last time I intend to run the script'..

psql synapse -c "DROP TABLE port_from_sqlite3;"
pg_dump --format=plain --schema-only --no-tablespaces --no-acl --no-owner synapse_unported > unported.sql
pg_dump --format=plain --schema-only --no-tablespaces --no-acl --no-owner synapse > ported.sql
# By default, `diff` returns zero if there are no changes and nonzero otherwise
diff -u unported.sql ported.sql | tee schema_diff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously this is a syntactic diff. Do we have reasonable confidence that this will be OK for semantically equivalent databases?

(does pg_dump have a deterministic dump order etc?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema is apparently dumped in a deterministic order, see https://www.postgresql.org/message-id/1303223266.24799.13.camel%40fsopti579.F-Secure.com. https://stackoverflow.com/a/2179376/5252017 claims it's dumped in a "fairly deterministic" order.

No guarantees about the data. I suggest we leave this be for now: there's very few rows inserted into a brand new synapse database---IIRC they're just stream positions and schema metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I only dump the schema in the PR anyway.

I suppose we could also dump data and check they agree, but I'm tempted to leave this PR as is. WDYT?

27 changes: 23 additions & 4 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -361,18 +361,22 @@ jobs:

steps:
- uses: actions/checkout@v2
- run: sudo apt-get -qq install xmlsec1
- run: sudo apt-get -qq install xmlsec1 postgresql-client
- uses: matrix-org/setup-python-poetry@v1
with:
extras: "postgres"
- run: .ci/scripts/test_export_data_command.sh
env:
PGHOST: localhost
PGUSER: postgres
PGPASSWORD: postgres
PGDATABASE: postgres


portdb:
if: ${{ !failure() && !cancelled() }} # Allow previous steps to be skipped, but not fail
needs: linting-done
runs-on: ubuntu-latest
env:
TOP: ${{ github.workspace }}
strategy:
matrix:
include:
Expand All @@ -398,12 +402,27 @@ jobs:

steps:
- uses: actions/checkout@v2
- run: sudo apt-get -qq install xmlsec1
- run: sudo apt-get -qq install xmlsec1 postgresql-client
- uses: matrix-org/setup-python-poetry@v1
with:
python-version: ${{ matrix.python-version }}
extras: "postgres"
- run: .ci/scripts/test_synapse_port_db.sh
id: run_tester_script
env:
PGHOST: localhost
PGUSER: postgres
PGPASSWORD: postgres
PGDATABASE: postgres
- name: "Upload schema differences"
uses: actions/upload-artifact@v3
if: ${{ failure() && !cancelled() && steps.run_tester_script.outcome == 'failure' }}
with:
name: Schema dumps
path: |
unported.sql
ported.sql
schema_diff

complement:
if: "${{ !failure() && !cancelled() }}"
Expand Down
1 change: 1 addition & 0 deletions changelog.d/13808.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Check that portdb generates the same postgres schema as that in the source tree.