[YSQL] Implement Async Flush for COPY command. #11628

Closed
nathanhjli opened this issue Mar 2, 2022 · 0 comments

Comments

@nathanhjli
Contributor

Description

Currently, we synchronously wait for a flush response every time we flush. We want to make this asynchronous to reduce the time spent waiting and improve the performance of COPY.

@nathanhjli nathanhjli added the area/ysql Yugabyte SQL (YSQL) label Mar 2, 2022
@nathanhjli nathanhjli self-assigned this Mar 2, 2022
nathanhjli added a commit that referenced this issue Mar 10, 2022
Summary:
Currently, as part of any statement, YSQL does some processing and buffers
writes. The write buffer is flushed when either of the following conditions is met:

(1) the write buffer is full (i.e., hits ysql_session_max_batch_size limit)
(2) a read op is required

On a flush, YSQL sends the writes to the required tablet servers in separate RPCs
(all issued in parallel). The YSQL backend makes further progress only once
responses to all RPCs have been received. This waiting hurts the performance of
bulk loading with COPY FROM because YSQL spends a lot of time waiting for
responses. Ideally, that wait time would be used to read further tuples from
the input source and perform the necessary processing.
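
A minimal sketch of the blocking flush described above, written in Python for illustration only. The class name, the `send_rpc` callable, and the modulo routing of rows to tablets are hypothetical stand-ins and not YSQL internals; only the two flush triggers and the "wait for all RPCs" step come from the description.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait
import threading


class SyncWriteBuffer:
    """Toy model of a write buffer that blocks on every flush (hypothetical)."""

    def __init__(self, max_batch_size, send_rpc, num_tablets=3):
        self.max_batch_size = max_batch_size  # stands in for ysql_session_max_batch_size
        self.send_rpc = send_rpc              # hypothetical: (tablet_id, rows) -> None
        self.num_tablets = num_tablets
        self.rows = []
        self.pool = ThreadPoolExecutor(max_workers=num_tablets)

    def write(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_batch_size:  # condition (1): buffer is full
            self.flush()

    def before_read(self):
        self.flush()  # condition (2): a read op is required

    def flush(self):
        if not self.rows:
            return
        by_tablet = defaultdict(list)
        for row in self.rows:
            # Assumed toy routing: rows are ints, tablet chosen by modulo.
            by_tablet[row % self.num_tablets].append(row)
        self.rows = []
        # One RPC per tablet, all issued in parallel.
        futures = [self.pool.submit(self.send_rpc, tablet, rows)
                   for tablet, rows in by_tablet.items()]
        wait(futures)           # the caller is blocked here until every response arrives
        for future in futures:
            future.result()     # surface any RPC error


received = []
received_lock = threading.Lock()


def fake_send_rpc(tablet_id, rows):
    with received_lock:
        received.extend(rows)


sync_buf = SyncWriteBuffer(max_batch_size=4, send_rpc=fake_send_rpc)
for row in range(10):
    sync_buf.write(row)
```

After the loop, two full batches have been flushed and the last two rows are still buffered; no new row can be read from the input while `wait(futures)` is blocking.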

This diff adds some asynchrony to the flush so that YSQL's COPY FROM can read
more tuples after sending a set of RPCs to tablet servers (without waiting for
the responses).

This is done by storing the flush future and not waiting for its result
immediately. YSQL waits for the earlier flush's result only once its write
buffer has filled again, just before performing the next flush call. Note that
the right choice of ysql_session_max_batch_size is required to mask almost all
of the wait time. The optimal batch size is one for which both of the following
tasks (which run concurrently after this diff) take almost the same time:

(1) YSQL fetching and buffering ysql_session_max_batch_size rows
(2) Sending RPCs for the previous ysql_session_max_batch_size rows and waiting
for the responses from the tserver to arrive
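
The deferred wait on the stored flush future can be sketched as follows. This is an illustrative Python model, not the actual implementation: `AsyncWriteBuffer`, `send_batch`, and `finish` are hypothetical names, and a single worker thread stands in for the RPC layer.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncWriteBuffer:
    """Toy model: keep the flush future, wait for it only before the next flush."""

    def __init__(self, max_batch_size, send_batch):
        self.max_batch_size = max_batch_size  # stands in for ysql_session_max_batch_size
        self.send_batch = send_batch          # hypothetical: sends rows, returns once responses arrive
        self.rows = []
        self.pending = None                   # future of the flush currently in flight
        self.pool = ThreadPoolExecutor(max_workers=1)

    def write(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_batch_size:
            self.flush_async()

    def wait_for_pending(self):
        if self.pending is not None:
            pending, self.pending = self.pending, None
            pending.result()  # blocks only if the earlier flush is still in flight

    def flush_async(self):
        # Wait for the earlier flush just before issuing the next one.
        self.wait_for_pending()
        if self.rows:
            batch, self.rows = self.rows, []
            self.pending = self.pool.submit(self.send_batch, batch)
        # Return immediately: the caller can keep reading input rows.

    def finish(self):
        # End of input: send the partial batch and wait for the last response.
        self.flush_async()
        self.wait_for_pending()


sent_batches = []


def fake_send_batch(rows):
    sent_batches.append(list(rows))


async_buf = AsyncWriteBuffer(max_batch_size=3, send_batch=fake_send_batch)
for row in range(8):
    async_buf.write(row)
async_buf.finish()
```

At most one flush is in flight at a time in this sketch, so batches still reach `send_batch` in order; the only change from a blocking flush is where the wait happens.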

Note also that there might not be any value of ysql_session_max_batch_size for
which both tasks complete at roughly the same time. This could be due to the
inherent difference between the speed of reading from disk and the speed of the
tablet servers.
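
The reasoning about batch size can be made concrete with an idealized timing model. The sketch below assumes every batch takes a fixed time to fetch and a fixed time for its RPC round trip; the function names and the millisecond figures in the usage lines are made up for illustration and are not measurements from this change.

```python
def sync_copy_time(num_batches, fetch_ms, rpc_ms):
    """Blocking flush: each batch is fetched, then the backend waits for its RPCs."""
    return num_batches * (fetch_ms + rpc_ms)


def async_copy_time(num_batches, fetch_ms, rpc_ms):
    """Deferred wait: fetching batch k+1 overlaps with the RPCs of batch k.

    The first fetch and the last round trip cannot be overlapped; every step in
    between costs whichever of the two tasks is slower.
    """
    if num_batches == 0:
        return 0
    return fetch_ms + (num_batches - 1) * max(fetch_ms, rpc_ms) + rpc_ms


# Illustrative numbers only.
balanced = async_copy_time(100, fetch_ms=10, rpc_ms=10)    # both tasks take the same time
rpc_bound = async_copy_time(100, fetch_ms=10, rpc_ms=30)   # tablet servers are the bottleneck
```

Under these assumptions the wait is fully masked only when the two times are equal; when one task dominates, the total approaches `num_batches * max(fetch_ms, rpc_ms)` and the saving is limited to the shorter task's time.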

Test Plan: Tested manually, both locally and on portal clusters. Experiments show that there is generally a 20-25% increase in speed when using async flush versus regular flushing.

Reviewers: kannan, smishra, pjain

Reviewed By: pjain

Subscribers: mtakahara, zyu, lnguyen, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15757
nathanhjli added a commit that referenced this issue Mar 15, 2022
Summary:
This reverts commit 1a3a344.

Reverting current implementation of async flush changes so that we can refactor, fix potential bugs, and improve implementation details.

Test Plan:
Jenkins: urgent

Built and ran a COPY locally to verify it worked.

Reviewers: pjain

Reviewed By: pjain

Subscribers: dmitry, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15975
nathanhjli added a commit to nathanhjli/yugabyte-db that referenced this issue Mar 16, 2022
…r pg_session, indexes supported

Summary: Still working on flakiness (Aborted: backfill connection to DB failed), but getting this diff out for initial reviews and opinions. One thing to note is that the initial pipeline where we passed async flush seems to be less flaky, since we can control exactly when we want to use async flush.

Test Plan:
Built locally and tested by creating indexes and performing COPY FROM.

Also added a Java test:
./yb_build.sh --java-test org.yb.pgsql.TestAsyncFlush

Reviewers: pjain, dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D16005
d-uspenskiy added a commit that referenced this issue Mar 25, 2022
Summary: To simplify the `PgSession` class and to make further improvements to the write operation buffering subsystem (issue #11628) easier to implement, the buffering code is moved into a separate class, `PgOperationBuffer`. The current functionality of the buffering subsystem is preserved (in general).

Test Plan: Jenkins

Reviewers: nli, pjain

Reviewed By: pjain

Differential Revision: https://phabricator.dev.yugabyte.com/D16083
nathanhjli added a commit that referenced this issue Apr 11, 2022
Summary:
Currently, as part of any statement, YSQL does some processing and buffers
writes. The write buffer is flushed when either of the following conditions is met:

(1) the write buffer is full (i.e., hits ysql_session_max_batch_size limit)
(2) a read op is required

On a flush, YSQL sends the writes to the required tablet servers in separate RPCs
(all issued in parallel). The YSQL backend makes further progress only once
responses to all RPCs have been received. This waiting hurts the performance of
bulk loading with COPY FROM because YSQL spends a lot of time waiting for
responses. Ideally, that wait time would be used to read further tuples from
the input source and perform the necessary processing.

This diff adds some asynchrony to the flush so that YSQL's COPY FROM can read
more tuples after sending a set of RPCs to tablet servers (without waiting for
the responses).

This is done by storing the flush future and not waiting for its result
immediately. YSQL waits for the earlier flush's result only once its write
buffer has filled again, just before performing the next flush call. Note that
the right choice of ysql_session_max_batch_size is required to mask almost all
of the wait time. The optimal batch size is one for which both of the following
tasks (which run concurrently after this diff) take almost the same time:

(1) YSQL fetching and buffering ysql_session_max_batch_size rows
(2) Sending RPCs for the previous ysql_session_max_batch_size rows and waiting
for the responses from the tserver to arrive

Note also that there might not be any value of ysql_session_max_batch_size for
which both tasks complete at roughly the same time. This could be due to the
inherent difference between the speed of reading from disk and the speed of the
tablet servers.

Test Plan:
Built locally and tested by creating indexes and performing COPY FROM. Previous experiments on portal clusters show that there is generally a 30% increase in speed when using async flush versus using regular flushing.

Also ran Jenkins tests, since this is a general enhancement that is used everywhere.

Reviewers: dmitry, pjain

Reviewed By: dmitry, pjain

Subscribers: jason, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D16005
@pkj415 pkj415 closed this as completed Apr 27, 2022