Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensible copy #2

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Extensible copy #2

wants to merge 10 commits into from

Conversation

zhjwpku
Copy link
Owner

@zhjwpku zhjwpku commented Jan 27, 2024

No description provided.

kou and others added 10 commits January 26, 2024 22:14
This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

    format,elapsed time (ms)
    text,11.002
    csv,11.696
    binary,11.352

100K with this change:

    format,elapsed time (ms)
    text,100000,11.562
    csv,100000,11.889
    binary,100000,10.825

1M without this change:

    format,elapsed time (ms)
    text,108.359
    csv,114.233
    binary,111.251

1M with this change:

    format,elapsed time (ms)
    text,111.269
    csv,116.277
    binary,104.765

10M without this change:

    format,elapsed time (ms)
    text,1090.763
    csv,1136.103
    binary,1137.141

10M with this change:

    format,elapsed time (ms)
    text,1082.654
    csv,1196.991
    binary,1069.697
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.

We will add support for custom COPY FROM format later. We'll use the
same handler for COPY TO and COPY FROM. PostgreSQL calls a COPY
TO/FROM handler with "is_from" argument. It's true for COPY FROM and
false for COPY TO:

    copy_handler(true) returns CopyToRoutine
    copy_handler(false) returns CopyFromRoutine (not exist yet)

We discussed that we introduce a wrapper struct for it:

    typedef struct CopyRoutine
    {
        NodeTag type;
        /* either CopyToRoutine or CopyFromRoutine */
        Node *routine;
    }

    copy_handler(true) returns CopyRoutine with CopyToRoutine
    copy_handler(false) returns CopyRoutine with CopyFromRoutine

See also: https://www.postgresql.org/message-id/flat/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com

But I noticed that we don't need the wrapper struct. We can just
CopyToRoutine or CopyFromRoutine. Because we can distinct the returned
struct by checking its NodeTag. So I don't use the wrapper struct
approach.
It's for custom COPY TO format handlers implemented as extension.

This just moves codes. This doesn't change codes except CopyDest enum
values. CopyDest enum values such as COPY_FILE are conflicted
CopySource enum values defined in copyfrom_internal.h. So COPY_DEST_
prefix instead of COPY_ prefix is used. For example, COPY_FILE is
renamed to COPY_DEST_FILE.

Note that this change isn't enough to implement a custom COPY TO
format handler as extension. We'll do the followings in a subsequent
commit:

1. Add an opaque space for custom COPY TO format handler
2. Export CopySendEndOfRow() to flush buffer
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
* Rename CopySendEndOfRow() to CopyToStateFlush() because it's a
  method for CopyToState and it's used for flushing. End-of-row related
  codes were moved to CopyToTextSendEndOfRow().
This doesn't change the current behavior. This just introduces
CopyFromRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyFromRoutine can't be used from extensions yet because
CopyRead*() aren't exported yet. Extensions can't read data from a
source without CopyRead*(). They will be exported by subsequent
patches.
We use the same approach as we used for custom COPY TO format. Now,
custom COPY format handler can return COPY TO format routines or COPY
FROM format routines based on the "is_from" argument:

    copy_handler(true) returns CopyToRoutine
    copy_handler(false) returns CopyFromRoutine
It's for custom COPY FROM format handlers implemented as extension.

This just moves codes. This doesn't change codes except CopySource
enum values. CopySource enum values changes aren't required but I did
like I did for CopyDest enum values. I changed COPY_ prefix to
COPY_SOURCE_ prefix. For example, COPY_FILE to COPY_SOURCE_FILE.

Note that this change isn't enough to implement a custom COPY FROM
format handler as extension. We'll do the followings in a subsequent
commit:

1. Add an opaque space for custom COPY FROM format handler
2. Export CopyReadBinaryData() to read the next data
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY From format implementation
* Export CopyReadBinaryData() to read the next data
* Rename CopyReadBinaryData() to CopyFromStateRead() because it's a
  method for CopyFromState and "BinaryData" is redundant.
zhjwpku pushed a commit that referenced this pull request Jul 28, 2024
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    #2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
        wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    postgres#3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
        at ../src/backend/storage/ipc/latch.c:538
    postgres#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    postgres#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    postgres#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    postgres#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    postgres#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    postgres#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    postgres#3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    postgres#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.

Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.

Discussion: https://www.postgresql.org/message-id/[email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants