Extensible copy #2

zhjwpku · 2024-01-27T05:37:47Z

No description provided.

This is a part of making COPY format extendable. See also these past discussions: * New Copy Formats - avro/orc/parquet: https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com * Make COPY extendable in order to support Parquet and other formats: https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com This doesn't change the current behavior. This just introduces CopyToRoutine, which just has function pointers of format implementation like TupleTableSlotOps, and use it for existing "text", "csv" and "binary" format implementations. Note that CopyToRoutine can't be used from extensions yet because CopySend*() aren't exported yet. Extensions can't send formatted data to a destination without CopySend*(). They will be exported by subsequent patches. Here is a benchmark result with/without this change because there was a discussion that we should care about performance regression: https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us > I think that step 1 ought to be to convert the existing formats into > plug-ins, and demonstrate that there's no significant loss of > performance. You can see that there is no significant loss of performance: Data: Random 32 bit integers: CREATE TABLE data (int32 integer); INSERT INTO data SELECT random() * 10000 FROM generate_series(1, ${n_records}); The number of records: 100K, 1M and 10M 100K without this change: format,elapsed time (ms) text,11.002 csv,11.696 binary,11.352 100K with this change: format,elapsed time (ms) text,100000,11.562 csv,100000,11.889 binary,100000,10.825 1M without this change: format,elapsed time (ms) text,108.359 csv,114.233 binary,111.251 1M with this change: format,elapsed time (ms) text,111.269 csv,116.277 binary,104.765 10M without this change: format,elapsed time (ms) text,1090.763 csv,1136.103 binary,1137.141 10M with this change: format,elapsed time (ms) text,1082.654 csv,1196.991 binary,1069.697

This uses the handler approach like tablesample. The approach creates an internal function that returns an internal struct. In this case, a COPY TO handler returns a CopyToRoutine. We will add support for custom COPY FROM format later. We'll use the same handler for COPY TO and COPY FROM. PostgreSQL calls a COPY TO/FROM handler with "is_from" argument. It's true for COPY FROM and false for COPY TO: copy_handler(true) returns CopyToRoutine copy_handler(false) returns CopyFromRoutine (not exist yet) We discussed that we introduce a wrapper struct for it: typedef struct CopyRoutine { NodeTag type; /* either CopyToRoutine or CopyFromRoutine */ Node *routine; } copy_handler(true) returns CopyRoutine with CopyToRoutine copy_handler(false) returns CopyRoutine with CopyFromRoutine See also: https://www.postgresql.org/message-id/flat/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com But I noticed that we don't need the wrapper struct. We can just CopyToRoutine or CopyFromRoutine. Because we can distinct the returned struct by checking its NodeTag. So I don't use the wrapper struct approach.

It's for custom COPY TO format handlers implemented as extension. This just moves codes. This doesn't change codes except CopyDest enum values. CopyDest enum values such as COPY_FILE are conflicted CopySource enum values defined in copyfrom_internal.h. So COPY_DEST_ prefix instead of COPY_ prefix is used. For example, COPY_FILE is renamed to COPY_DEST_FILE. Note that this change isn't enough to implement a custom COPY TO format handler as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY TO format handler 2. Export CopySendEndOfRow() to flush buffer

* Add CopyToStateData::opaque that can be used to keep data for custom COPY TO format implementation * Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf * Rename CopySendEndOfRow() to CopyToStateFlush() because it's a method for CopyToState and it's used for flushing. End-of-row related codes were moved to CopyToTextSendEndOfRow().

This doesn't change the current behavior. This just introduces CopyFromRoutine, which just has function pointers of format implementation like TupleTableSlotOps, and use it for existing "text", "csv" and "binary" format implementations. Note that CopyFromRoutine can't be used from extensions yet because CopyRead*() aren't exported yet. Extensions can't read data from a source without CopyRead*(). They will be exported by subsequent patches.

We use the same approach as we used for custom COPY TO format. Now, custom COPY format handler can return COPY TO format routines or COPY FROM format routines based on the "is_from" argument: copy_handler(true) returns CopyToRoutine copy_handler(false) returns CopyFromRoutine

It's for custom COPY FROM format handlers implemented as extension. This just moves codes. This doesn't change codes except CopySource enum values. CopySource enum values changes aren't required but I did like I did for CopyDest enum values. I changed COPY_ prefix to COPY_SOURCE_ prefix. For example, COPY_FILE to COPY_SOURCE_FILE. Note that this change isn't enough to implement a custom COPY FROM format handler as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY FROM format handler 2. Export CopyReadBinaryData() to read the next data

* Add CopyFromStateData::opaque that can be used to keep data for custom COPY From format implementation * Export CopyReadBinaryData() to read the next data * Rename CopyReadBinaryData() to CopyFromStateRead() because it's a method for CopyFromState and "BinaryData" is redundant.

Signed-off-by: Zhao Junwang <[email protected]>

1. TruncateMultiXact() performs the SLRU truncations in a critical section. Deleting the SLRU segments calls ForwardSyncRequest(), which will try to compact the request queue if it's full (CompactCheckpointerRequestQueue()). That in turn allocates memory, which is not allowed in a critical section. Backtrace: TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981 postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e] postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d] postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e] postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb] postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a] postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1] postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b] postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3] postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66] postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d] postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead] postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e] postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb] postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e] /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45] postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31] To fix, bail out in CompactCheckpointerRequestQueue() without doing anything, if it's called in a critical section. That covers the above call path, as well as any other similar cases where RegisterSyncRequest might be called in a critical section. 2. After fixing that, another problem became apparent: Autovacuum process doing that truncation can deadlock with the checkpointer process. TruncateMultiXact() sets "MyProc->delayChkptFlags |= DELAY_CHKPT_START". If the sync request queue is full and cannot be compacted, the process will repeatedly sleep and retry, until there is room in the queue. However, if the checkpointer is trying to start a checkpoint at the same time, and is waiting for the DELAY_CHKPT_START processes to finish, the queue will never shrink. More concretely, the autovacuum process is stuck here: #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570 #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516 postgres#3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:538 postgres#4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614 postgres#5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495 postgres#6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566 postgres#7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006 postgres#8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201 postgres#9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917 postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760 postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550 postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569 and the checkpointer is stuck here: #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50 postgres#3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098 postgres#4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464 To fix, add AbsorbSyncRequests() to the loops where the checkpointer waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to finish. Backpatch to v14. Before that, SLRU deletion didn't call RegisterSyncRequest, which avoided this failure. I'm not sure if there are other similar scenarios on older versions, but we haven't had any such reports. Discussion: https://www.postgresql.org/message-id/[email protected]

kou and others added 10 commits January 26, 2024 22:14

change CopyToGetFormat to CopyToSendCopyBegin and export more api

f0a8151

Signed-off-by: Zhao Junwang <[email protected]>

introduce contrib/pg_copy_json

7dc6c1c

Signed-off-by: Zhao Junwang <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extensible copy #2

Extensible copy #2

zhjwpku commented Jan 27, 2024

Extensible copy #2

Are you sure you want to change the base?

Extensible copy #2

Conversation

zhjwpku commented Jan 27, 2024