New project getting started: pgcopydb.
dimitri committed Dec 20, 2021
0 parents commit 9183353
Showing 53 changed files with 10,687 additions and 0 deletions.
28 changes: 28 additions & 0 deletions .editorconfig
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# top-most EditorConfig file
root = true

# rules for all files
# we use tabs with indent size 4
[*]
indent_style = tab
indent_size = 4
tab_width = 4
end_of_line = lf
insert_final_newline = true
charset = utf-8
trim_trailing_whitespace = true

# Don't change test output files, pngs or test data files
[*.{out,png,data}]
insert_final_newline = unset
trim_trailing_whitespace = unset

[*.{sql,sh,py,tex}]
indent_style = space
indent_size = 4
tab_width = 4

[*.yml]
indent_style = space
indent_size = 2
tab_width = 2
26 changes: 26 additions & 0 deletions .gitattributes
@@ -0,0 +1,26 @@
* whitespace=space-before-tab,trailing-space
*.[chly] whitespace=space-before-tab,trailing-space,indent-with-non-tab,tabwidth=4
*.dsl whitespace=space-before-tab,trailing-space,tab-in-indent
*.patch -whitespace
*.pl whitespace=space-before-tab,trailing-space,tabwidth=4
*.po whitespace=space-before-tab,trailing-space,tab-in-indent,-blank-at-eof
*.sgml whitespace=space-before-tab,trailing-space,tab-in-indent,-blank-at-eol
*.x[ms]l whitespace=space-before-tab,trailing-space,tab-in-indent

# Avoid confusing ASCII underlines with leftover merge conflict markers
README conflict-marker-size=32
README.* conflict-marker-size=32


# These files are maintained or generated elsewhere. We take them as is.
configure -whitespace

# all C files (implementation and header) use our style...
*.[ch] citus-style

# except these exceptions...
src/bin/lib/parson/** -citus-style
src/bin/lib/log/** -citus-style
src/bin/lib/libs/** -citus-style
src/bin/lib/pg/** -citus-style
src/bin/lib/subcommands.c/** -citus-style
41 changes: 41 additions & 0 deletions .gitignore
@@ -0,0 +1,41 @@
# Global excludes across all subdirectories
*.o
*.bc
*.so
*.so.[0-9]
*.so.[0-9].[0-9]
*.sl
*.sl.[0-9]
*.sl.[0-9].[0-9]
*.dylib
*.dll
*.a
*.mo
*.pot
objfiles.txt
.deps/
*.gcno
*.gcda
*.gcov
*.gcov.out
lcov.info
coverage/
*.vcproj
*.vcxproj
win32ver.rc
*.exe
lib*dll.def
lib*.pc

# Local excludes in root directory
/config.log
/config.status
/pgsql.sln
/pgsql.sln.cache
/Debug/
/Release/
/autom4te.cache
/Makefile.global
/src/Makefile.custom
/tests/__pycache__/
/env/
19 changes: 19 additions & 0 deletions LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2021 The PostgreSQL Global Development Group.

PostgreSQL License

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL MICROSOFT CORPORATION BE LIABLE TO ANY PARTY FOR DIRECT,
INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST
PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN
IF MICROSOFT CORPORATION HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

MICROSOFT CORPORATION SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
MICROSOFT CORPORATION HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT,
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
22 changes: 22 additions & 0 deletions Makefile
@@ -0,0 +1,22 @@
# Copyright (c) 2021 The PostgreSQL Global Development Group.
# Licensed under the PostgreSQL License.

TOP := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))

all: bin ;

bin:
	$(MAKE) -C src/bin/ all

clean:
	$(MAKE) -C src/bin/ clean

install: bin
	$(MAKE) -C src/bin/ install

indent:
	citus_indent


.PHONY: all
.PHONY: bin clean install
111 changes: 111 additions & 0 deletions NOTICE
@@ -0,0 +1,111 @@
---------------------------------------------------------

kgabis/parson 8beeb5ea4da5eedff8d3221307ef04855804a920 - MIT


Copyright (c) 2012 - 2020 Krzysztof Gabis

MIT License

Copyright (c) 2012 - 2020 Krzysztof Gabis

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


---------------------------------------------------------

---------------------------------------------------------

rxi/log.c f9ea34994bd58ed342d2245cd4110bb5c6790153 - MIT



Copyright (c) 2020 rxi

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


---------------------------------------------------------

mattiasgustavsson/libs a64e6e6f06b7b8392cec5614280f70411282508c - MIT

Copyright (c) 2015 Mattias Gustavsson
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---------------------------------------------------------


postgres/postgres 9213462c539e6412fe0498a7f8e20b662e15c4ec


PostgreSQL Database Management System
(formerly known as Postgres, then as Postgres95)

Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

---------------------------------------------------------
82 changes: 82 additions & 0 deletions README.md
@@ -0,0 +1,82 @@
# pgcopydb

pgcopydb is a tool that automates running `pg_dump | pg_restore` between two
running Postgres servers. To make a copy of a database to another server as
quickly as possible, one would like to use the parallel options of `pg_dump`
and still be able to stream the data to as many `pg_restore` jobs.

The idea would be to run `pg_dump --jobs=N --format=directory
postgres://user@source/dbname | pg_restore --jobs=N --format=directory -d
postgres://user@target/dbname`. Unfortunately, this command line can't be
made to work: `pg_dump --format=directory` writes to local files and
directories first, and only later can `pg_restore --format=directory` read
from those files.
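
For comparison, the two-step approach that the stock tools require can be sketched as follows. This is only an illustration: the helper name `dump_then_restore_commands` and the `dump_dir` staging directory are assumptions made for this sketch and are not part of `pgcopydb`. Only the `pg_dump` and `pg_restore` options are the standard ones for the directory format.

```python
import shlex


def dump_then_restore_commands(source_uri, target_uri, dump_dir, jobs):
    """Build the two command lines needed when using the stock tools.

    Hypothetical helper for illustration: the directory format cannot be
    piped, so pg_dump has to write into dump_dir before pg_restore reads it.
    """
    dump = [
        "pg_dump",
        f"--jobs={jobs}",
        "--format=directory",
        f"--file={dump_dir}",
        source_uri,
    ]
    restore = [
        "pg_restore",
        f"--jobs={jobs}",
        "--format=directory",
        f"--dbname={target_uri}",
        dump_dir,
    ]
    return dump, restore


# Print both command lines; the paths and URIs are placeholders.
for argv in dump_then_restore_commands(
    "postgres://user@source/dbname",
    "postgres://user@target/dbname",
    "/tmp/dbname.dump",
    4,
):
    print(shlex.join(argv))
```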

With `pgcopydb`, the result outlined above can be achieved with this single
command line:

```bash
$ pgcopydb copy db --jobs=N --from postgres://user@source/dbname --into postgres://user@target/dbname
```

Then `pgcopydb` implements the following steps:

1. `pgcopydb` produces the `pre-data` and `post-data` sections of the dump
    using the Postgres custom format.

2. The `pre-data` section of the dump is restored on the target database,
    creating all the Postgres objects from the source database in the
    target database.

3. `pgcopydb` gets the list of ordinary and partitioned tables and for
    each of them runs a COPY FREEZE job as a sub-process, and starts and
    controls the sub-processes until all the data has been copied over.

    The Postgres catalog table `pg_class` is used to get the list of tables
    with data to copy, and its `reltuples` column is used to start with
    the tables with the greatest number of rows first, in an attempt to
    minimize the copy time.

4. In each copy table sub-process, as soon as the data copying is done,
    `pgcopydb` gets the list of index definitions attached to the
    current target table and creates them in parallel.

    The primary key indexes are created as UNIQUE indexes at this stage.

    Then the PRIMARY KEY constraints are created USING the just-built
    index, allowing the primary key index itself to be created in parallel
    with other indexes on the same table.

5. Then VACUUM ANALYZE is run on each target table as soon as the data
    and indexes are all created.

6. The final stage consists of running the rest of the `post-data`
    section script for the whole database, and that's where the foreign key
    constraints and other elements are created.

    The `post-data` script is filtered using the `pg_restore
    --use-list` option so that indexes and primary key constraints already
    created in step 4 are properly skipped.

    This is done by the per-table sub-processes sharing the dump IDs of the
    `post-data` items they have created with the main process, which can
    then filter the `pg_restore --list` output and comment out the already
    created objects there, by dump ID.
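
The index and constraint creation described in step 4 relies on two SQL statements, which can be sketched as below. The function name and the sample table, column, and index names are hypothetical. Only the `CREATE UNIQUE INDEX` and `ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY USING INDEX` forms are standard Postgres syntax. Identifiers are interpolated without quoting to keep the sketch short, which real code should not do.

```python
def primary_key_statements(table, index_name, columns):
    """Return the two statements that build a primary key in two phases.

    Hypothetical helper for illustration: the unique index is built first,
    like any other index, then the constraint is attached to that index
    instead of building a new one.
    """
    cols = ", ".join(columns)
    create_index = f"CREATE UNIQUE INDEX {index_name} ON {table} ({cols});"
    add_constraint = (
        f"ALTER TABLE {table} "
        f"ADD CONSTRAINT {index_name} PRIMARY KEY USING INDEX {index_name};"
    )
    return create_index, add_constraint


# Sample names are placeholders, not taken from any real schema.
for statement in primary_key_statements("public.foo", "foo_pkey", ["id"]):
    print(statement)
```

Because the first statement is an ordinary index build, it can run concurrently with the other index builds on the same table, while the second statement only attaches the constraint to the existing index.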

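The filtering done in the final stage can be sketched as follows, assuming the usual `pg_restore --list` layout where each entry line starts with its dump ID followed by a semicolon, and where a leading semicolon marks a comment. The function name and the sample listing are made up for this illustration and do not come from the `pgcopydb` sources.

```python
def comment_out_done_items(list_lines, done_dump_ids):
    """Comment out the archive entries that were already created.

    Hypothetical helper for illustration: lines whose leading dump ID is in
    done_dump_ids get a ";" prefix, so that `pg_restore --use-list` skips
    them. Existing comments and all other entries are left unchanged.
    """
    filtered = []
    for line in list_lines:
        head, sep, _ = line.partition(";")
        head = head.strip()
        if sep and head.isdigit() and int(head) in done_dump_ids:
            filtered.append(";" + line)
        else:
            filtered.append(line)
    return filtered


# Made-up listing: an index and its constraint were already created by a
# per-table sub-process, the foreign key was not.
listing = [
    "; Selected TOC Entries:",
    "3100; 1259 16390 INDEX public foo_pkey user",
    "3101; 2606 16392 CONSTRAINT public foo foo_pkey user",
    "3102; 2606 16395 FK CONSTRAINT public bar bar_foo_fkey user",
]

for line in comment_out_done_items(listing, {3100, 3101}):
    print(line)
```

The filtered lines would then be written to a file and passed to `pg_restore --use-list`, so that only the remaining entries are restored.
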
## Dependencies

At run-time `pgcopydb` depends on the `pg_dump` and `pg_restore` tools being
available in the `PATH`. The version of these tools should match the Postgres
version of the target database.
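
A quick way to check this dependency before running a copy is sketched below. It assumes nothing about how `pgcopydb` itself locates the tools; the `missing_tools` helper is hypothetical and only uses `shutil.which` from the Python standard library.

```python
import os
import shutil

REQUIRED_TOOLS = ("pg_dump", "pg_restore")


def missing_tools(search_path=None):
    """List the required tools that cannot be found in search_path.

    Hypothetical helper for illustration; search_path defaults to the
    PATH environment variable.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", os.defpath)
    return [
        tool
        for tool in REQUIRED_TOOLS
        if shutil.which(tool, path=search_path) is None
    ]


missing = missing_tools()
if missing:
    print("missing from PATH:", ", ".join(missing))
else:
    print("pg_dump and pg_restore found in PATH")
```

This only checks that the tools are present; it does not compare their version with the target database.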

## Authors

* [Dimitri Fontaine](https://github.com/dimitri)

## License

Copyright (c) The PostgreSQL Global Development Group.

This project is licensed under the PostgreSQL License, see LICENSE file for details.

This project includes bundled third-party dependencies, see NOTICE file for details.
15 changes: 15 additions & 0 deletions src/bin/Makefile
@@ -0,0 +1,15 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the PostgreSQL License.

all: pgcopydb ;

pgcopydb:
	$(MAKE) -C pgcopydb pgcopydb

clean:
	$(MAKE) -C pgcopydb clean

# depend on the pgcopydb target; $(pgcopydb) is an undefined variable
install: pgcopydb
	$(MAKE) -C pgcopydb install

.PHONY: all pgcopydb install clean
29 changes: 29 additions & 0 deletions src/bin/lib/README.md
@@ -0,0 +1,29 @@
# Vendored-in libraries

## log.c

A very simple lib for handling logs in C is available at

https://github.com/rxi/log.c

It says that

log.c and log.h should be dropped into an existing project and compiled
along with it.

So this directory contains a _vendored-in_ copy of the log.c repository.

## SubCommands.c

The single-header library is used to implement parsing "modern" command lines.

## JSON

The parson library at https://github.com/kgabis/parson is a single C file
and MIT licenced. It allows parsing from and serializing to JSON.

## pg

We vendor-in some code from the Postgres project at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary. This code is
licenced under The PostgreSQL Licence, a derivative of the BSD licence.