Implement filtering support. (dimitri#19)
* Implement a parser for filtering settings, and filter table lists.

* Apply filters to listing sequences.

We filter only those sequences that have a dependency link to the selected
tables. At the moment there is no way to select a sequence unless it is tied
to the default value of a column that belongs to one of the selected tables.

That said, when filtering is not used, all sequences are processed, even
those not attached to any table. This means that `pgcopydb copy sequences`
can be run without filters in the use case where filtering is needed for
tables, but not for sequences.

* Implement pgcopydb list ... --list-skipped.

This option allows debugging the filtering setup, and is also needed to
implement pg_restore catalog editing (--use-list) to avoid installing
objects that are filtered-out in the setup.

* Document filtering setup.

* Add a test for the filtering capabilities.

* Improve filtering of pg_restore list entries without an OID.

Some pg_restore --list entries, such as INDEX ATTACH, lack both the catalog
OID and the object OID, so we need to match them by their pg_restore list
name, which is a compound of the schema name, object name, and owner name.

Moreover, pg_restore builds that compound name using a single space as a
separator, and replaces \n and \r characters with a single space. This makes
the pg_restore list output unfriendly to machine parsing, so instead we
generate the list name from the Postgres catalogs in our catalog queries.

Then we can use a hash table on the OIDs and another one on the compound
names to find out whether an INDEX ATTACH catalog entry refers to an index
that has been filtered out by the filtering rules, and comment it out if so.
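
The two-lookup idea can be sketched in Python. This is only an illustration of the technique described above, not the C code from this commit: the function names, the tuple layout, and the OID 16412 are hypothetical, and the real implementation uses uthash tables filled from catalog queries.

```python
def restore_list_name(schema, name, owner):
    """Build a compound name as described above: schema, object and owner
    names joined by single spaces, with \\n and \\r replaced by spaces."""
    def flatten(text):
        return text.replace("\n", " ").replace("\r", " ")
    return " ".join(flatten(part) for part in (schema, name, owner))


def build_lookups(filtered_indexes):
    """filtered_indexes: iterable of (oid, schema, name, owner) tuples
    (a hypothetical layout) describing the filtered-out indexes."""
    by_oid = {}
    by_name = {}
    for oid, schema, name, owner in filtered_indexes:
        by_oid[oid] = (schema, name)
        by_name[restore_list_name(schema, name, owner)] = oid
    return by_oid, by_name


def is_filtered_out(entry_oid, entry_name, by_oid, by_name):
    """Match by OID when the entry has one, else by compound name."""
    if entry_oid:
        return entry_oid in by_oid
    return entry_name in by_name


by_oid, by_name = build_lookups(
    [(16412, "public", "foo_gin_tsvector", "postgres")]
)
# An entry without an OID falls back to the compound-name lookup.
print(is_filtered_out(None, "public foo_gin_tsvector postgres", by_oid, by_name))
```

Generating the name on our side keeps the matching exact: both lookups are plain key comparisons, and no parsing of the space-separated pg_restore output is needed.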

* Add a pg_depend recursive walker facility.

This allows removing from the pg_restore --list output those objects that
depend on tables that have been filtered out.
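
As a rough illustration of what walking pg_depend means, here is an in-memory Python sketch. It assumes the dependency rows were already fetched as `(objid, refobjid)` pairs, meaning "objid depends on refobjid"; the function name and the sample OIDs are hypothetical, and the commit message does not say whether the actual walker runs in SQL or in C.

```python
from collections import defaultdict


def dependents_of(roots, depend_rows):
    """Return every object that depends, directly or transitively,
    on one of the root objects (for example filtered-out tables)."""
    roots = set(roots)

    # Index the edges by the referenced object.
    children = defaultdict(set)
    for objid, refobjid in depend_rows:
        children[refobjid].add(objid)

    seen = set(roots)  # also guards against dependency cycles
    stack = list(roots)
    while stack:
        oid = stack.pop()
        for dep in children.get(oid, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen - roots


# 2 depends on 1, 3 depends on 2, 4 depends on an unrelated object 9.
rows = [(2, 1), (3, 2), (4, 9)]
print(sorted(dependents_of({1}, rows)))
```

The `seen` set makes the walk terminate even when the dependency graph contains cycles.
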
dimitri authored May 10, 2022
1 parent c805830 commit aea0ff0
Showing 41 changed files with 6,151 additions and 239 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -24,3 +24,4 @@ src/bin/lib/log/** -citus-style
src/bin/lib/libs/** -citus-style
src/bin/lib/pg/** -citus-style
src/bin/lib/subcommands.c/** -citus-style
src/bin/lib/uthash/** -citus-style
15 changes: 3 additions & 12 deletions Makefile
@@ -19,17 +19,8 @@ test: build

tests: test ;

-tests/pagila: build
-	$(MAKE) -C tests/pagila
-
-tests/pagila-multi-steps: build
-	$(MAKE) -C tests/pagila-multi-steps
-
-tests/blobs: build
-	$(MAKE) -C tests/blobs
-
-tests/unit: build
-	$(MAKE) -C tests/unit
+tests/*: build
+	$(MAKE) -C $@

install: bin
	$(MAKE) -C src/bin/ install
@@ -56,5 +47,5 @@ debsh-qa: deb-qa

.PHONY: all
.PHONY: bin clean install docs
-.PHONY: test tests tests/pagila tests/pagila-multi-steps tests/blobs tests/unit
+.PHONY: test tests tests/*
.PHONY: deb debsh deb-qa debsh-qa
1 change: 1 addition & 0 deletions docs/conf.py
@@ -83,6 +83,7 @@ def __init__(self, **options):
# (source start file, name, description, authors, manual section).
man_pages = [
("ref/pgcopydb", "pgcopydb", "pgcopydb", [author], 1),
("ref/pgcopydb_config", "pgcopydb", "pgcopydb", [author], 5,),
(
"ref/pgcopydb_copy-db",
"pgcopydb copy-db",
1 change: 1 addition & 0 deletions docs/ref/manual.rst
@@ -16,3 +16,4 @@ their own manual page.
pgcopydb_restore
pgcopydb_list
pgcopydb_copy
pgcopydb_config
113 changes: 113 additions & 0 deletions docs/ref/pgcopydb_config.rst
@@ -0,0 +1,113 @@
.. _config:

pgcopydb configuration
======================

Manual page for the configuration of pgcopydb. The ``pgcopydb`` command
accepts sub-commands and command line options; see the manual pages for
those commands for details. The only configuration that ``pgcopydb``
commands accept is the filtering setup.

.. _filtering:

Filtering
---------

Filtering allows skipping some object definitions and data when copying from
the source to the target database. The pgcopydb commands that accept the
option ``--filter`` (or ``--filters``) expect an existing filename as the
option argument. The given file is read in the INI file format, but only
sections and option keys are used. Option values are ignored.

Here is an inclusion based filter configuration example:

.. code-block:: ini
  :linenos:

  [include-only-table]
  public.allcols
  public.csv
  public.serial
  public.xzero

  [exclude-index]
  public.foo_gin_tsvector

  [exclude-table-data]
  public.csv

Here is an exclusion based filter configuration example:

.. code-block:: ini
  :linenos:

  [exclude-schema]
  foo
  bar
  expected

  [exclude-table]
  "schema"."name"
  schema.othername
  err.errors
  public.serial

  [exclude-index]
  schema.indexname

  [exclude-table-data]
  public.bar
  nsitra.test1

Filtering can be done with pgcopydb by using the following rules, which are
also the names of the sections of the INI file.
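
To show what "sections and option keys only" means in practice, here is a minimal Python sketch that reads such a file with the standard ``configparser`` module. It is an assumption-laden illustration: pgcopydb itself parses the file in C with the vendored ``ini.h`` reader, and the ``parse_filters`` helper is hypothetical.

```python
import configparser

EXAMPLE = """\
[include-only-table]
public.allcols
public.csv
public.serial
public.xzero

[exclude-index]
public.foo_gin_tsvector

[exclude-table-data]
public.csv
"""


def parse_filters(text):
    """Return a dict mapping each section name to its list of keys."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # keys have no "= value" part
        delimiters=("=",),     # do not split keys on ":"
        interpolation=None,
    )
    parser.optionxform = str   # keep the case of identifiers as written
    parser.read_string(text)
    return {name: list(parser[name]) for name in parser.sections()}


filters = parse_filters(EXAMPLE)
print(filters["exclude-table-data"])
```

Each key in a section is one object name, so the parsed result is simply a list of names per rule.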

include-only-table
^^^^^^^^^^^^^^^^^^

This section gives the exclusive list of source tables to copy to the
target database. No other table will be processed by pgcopydb.

Each line in that section should be a schema-qualified table name. `Postgres
identifier quoting rules`__ can be used to avoid ambiguity.

__ https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

When the section ``include-only-table`` is used in the filtering
configuration, the sections ``exclude-schema`` and ``exclude-table`` are
disallowed. We would not know how to handle tables that exist on the source
database and are not part of any filter.
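
The rule above amounts to a small validation step. The sketch below is hypothetical (pgcopydb implements its checks in C, and the function name and error message are made up); it uses the section name ``include-only-table`` as spelled in the configuration examples.

```python
def check_filter_sections(section_names):
    """Raise ValueError when an inclusion list is combined with
    table-level or schema-level exclusion sections."""
    names = set(section_names)
    if "include-only-table" in names:
        conflicts = sorted(names & {"exclude-schema", "exclude-table"})
        if conflicts:
            raise ValueError(
                "include-only-table cannot be combined with: "
                + ", ".join(conflicts)
            )


# Accepted: index and data exclusions may accompany an inclusion list.
check_filter_sections(
    ["include-only-table", "exclude-index", "exclude-table-data"]
)
```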

exclude-schema
^^^^^^^^^^^^^^

This section allows adding schemas (Postgres namespaces) to the exclusion
filters. All the tables that belong to any listed schema in this section are
going to be ignored by the pgcopydb command.

This section is not allowed when the section ``include-only-table`` is
used.

exclude-table
^^^^^^^^^^^^^

This section allows adding a list of qualified table names to the exclusion
filters. All the tables that are listed in the ``exclude-table`` section are
going to be ignored by the pgcopydb command.

This section is not allowed when the section ``include-only-table`` is
used.

exclude-index
^^^^^^^^^^^^^

This section allows adding a list of qualified index names to the exclusion
filters. It is then possible for pgcopydb to operate on a table and skip a
single index definition that belongs to a table that is still processed.

exclude-table-data
^^^^^^^^^^^^^^^^^^

This section allows skipping the data copy for a list of qualified table
names. The schema, indexes, constraints, etc. of the table are still copied
over.
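
Taken together, the table-related rules can be read as a per-table decision. The following Python sketch is a simplification under stated assumptions: names are compared as plain ``schema.name`` strings (Postgres quoting rules are ignored), the action labels ``skip``, ``schema-only`` and ``copy`` are invented for the example, and ``exclude-index`` is left out because it applies to indexes rather than tables.

```python
def table_action(qualified_name, filters):
    """Decide what happens to one table given parsed filter sections.

    filters maps a section name to a list of names, for example
    {"exclude-schema": ["foo"], "exclude-table-data": ["public.bar"]}.
    """
    schema = qualified_name.split(".", 1)[0]

    include = filters.get("include-only-table")
    if include is not None and qualified_name not in include:
        return "skip"
    if schema in filters.get("exclude-schema", ()):
        return "skip"
    if qualified_name in filters.get("exclude-table", ()):
        return "skip"
    if qualified_name in filters.get("exclude-table-data", ()):
        return "schema-only"  # definition copied, rows are not
    return "copy"


exclusion = {
    "exclude-schema": ["foo"],
    "exclude-table": ["public.serial"],
    "exclude-table-data": ["public.bar"],
}
for name in ["foo.t1", "public.serial", "public.bar", "public.other"]:
    print(name, table_action(name, exclusion))
```
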
7 changes: 7 additions & 0 deletions docs/ref/pgcopydb_copy-db.rst
@@ -26,6 +26,7 @@ Postgres instance to the target Postgres instance.
--no-acl Prevent restoration of access privileges (grant/revoke commands).
--no-comments Do not output commands to restore comments
--skip-large-objects Skip copying large objects (blobs)
--filters <filename> Use the filters defined in <filename>
--restart Allow restarting when temp files exist already
--resume Allow resuming operations after a failure
--not-consistent Allow taking a new snapshot on the source database
@@ -162,6 +163,12 @@ The following options are available to ``pgcopydb copy-db``:
Skip copying large objects, also known as blobs, when copying the data
from the source database to the target database.

--filters <filename>

This option allows excluding tables and indexes from the copy operations.
See :ref:`filtering` for details about the expected file format and the
filtering options available.

--restart

When running the pgcopydb command again, if the work directory already
30 changes: 24 additions & 6 deletions docs/ref/pgcopydb_list.rst
@@ -31,8 +31,10 @@ tables to COPY the data from.
pgcopydb list tables: List all the source tables to copy data from
usage: pgcopydb list tables --source ...

-  --source Postgres URI to the source database
-  --without-pkey List only tables that have no primary key
+  --source Postgres URI to the source database
+  --filter <filename> Use the filters defined in <filename>
+  --list-skipped List only tables that are setup to be skipped
+  --without-pkey List only tables that have no primary key

.. _pgcopydb_list_sequences:

@@ -50,7 +52,9 @@ sequences to COPY the data from.
pgcopydb list sequences: List all the source sequences to copy data from
usage: pgcopydb list sequences --source ...

-  --source Postgres URI to the source database
+  --source Postgres URI to the source database
+  --filter <filename> Use the filters defined in <filename>
+  --list-skipped List only tables that are setup to be skipped

.. _pgcopydb_list_indexes:

@@ -68,9 +72,11 @@ indexes to COPY the data from.
pgcopydb list indexes: List all the indexes to create again after copying the data
usage: pgcopydb list indexes --source ... [ --schema-name [ --table-name ] ]

-  --source Postgres URI to the source database
-  --schema-name Name of the schema where to find the table
-  --table-name Name of the target table
+  --source Postgres URI to the source database
+  --schema-name Name of the schema where to find the table
+  --table-name Name of the target table
+  --filter <filename> Use the filters defined in <filename>
+  --list-skipped List only tables that are setup to be skipped


Options
@@ -101,6 +107,18 @@ The following options are available to ``pgcopydb dump schema``:
List only tables from the source database when they have no primary key
attached to their schema.

--filter <filename>

This option allows skipping objects in the list operations. See
:ref:`filtering` for details about the expected file format and the
filtering options available.

--list-skipped

Instead of listing objects that are selected for copy by the filters
installed with the ``--filter`` option, list the objects that are going to
be skipped when using the filters.

Environment
-----------

18 changes: 18 additions & 0 deletions src/bin/lib/README.md
@@ -22,8 +22,26 @@ The single-header library is used to implement parsing "modern" command lines.
The parson library at https://github.com/kgabis/parson is a single C file
and MIT licenced. It allows parsing from and serializing to JSON.

## Configuration file parsing

We utilize the "ini.h" ini-file reader from https://github.com/mattiasgustavsson/libs

## pg

We vendor-in some code from the Postgres project at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary. This code is
licenced under The PostgreSQL Licence, a derivative of the BSD licence.

## uthash

A hash table implementation for C, available at

https://github.com/troydhanson/uthash

Its documentation says:

> All you need to do is copy the header file into your project, and include
> it. Since uthash is a header file only, there is no library code to link
> against.

This directory contains only the `uthash.h` file, which implements hash tables.