[New Feature] Faster Bulk-Data Loading in YugabyteDB #11765

@ymahajan

Description

Jira Link: DB-4641

Master issue tracking improvements that make it easier and faster to load large amounts of data into YugabyteDB.

Phase 1

| Status | Feature | GitHub Issue | Comments |
| --- | --- | --- | --- |
| | Faster non-transactional writes during bulk load | #7809 | Allow faster writes on the COPY command by using the session variable yb_force_non_transactional_writes. |
| | Disable transactional writes during bulk data loading for indexes | #11266 | Add a yb_disable_transactional_writes session variable to improve the latency of bulk data loading for index tables, for example when the COPY command is used, which goes into the insert write path (not delete or update). |
| | Implement async flush for the COPY command | #11628 | Currently we synchronously wait for a flush response every time we flush. Making this asynchronous reduces the time spent waiting and improves COPY performance. |
| | Speed up YSQL inserts by skipping lookup of keys being inserted | #11269 | During bulk load (for example, inserts via the COPY command), skip the lookup of the key being inserted to speed up inserts. This is similar to the upsert mode supported in YCQL. |
| | Optimize memory allocation/deallocation in bulk insert/COPY using Protobuf's arena | #11720 | When running a bulk insert/COPY command, about 15 percent of CPU time in the PostgreSQL backend is spent on memory allocation/deallocation. |
| | Improve performance by eliminating serialization to the WAL format | #11409 | When writing data to the RocksDB layer, the additional step of serializing to the WAL format is unnecessary and leads to wasted work. |
| | Tuning parameters for faster COPY performance | #12293 | Tuning parameters for faster COPY performance. |
| | Pack columns in DocDB storage format for better performance | #3520 | Packing columns into a single RocksDB entry per row, instead of one per column as we do currently, improves YSQL performance. |
| ⬜️ | Parallelize the COPY command | #11453 | Distribute the COPY operation internally across multiple workers. |
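The two session variables in the first rows above can be combined with a COPY-based load. A minimal sketch, assuming a YSQL session; the table name, file path, and the exact boolean syntax accepted by these variables are assumptions for illustration, not taken from the linked issues:

```sql
-- Hypothetical bulk-load session: relax transactional guarantees
-- for the duration of the load, then restore the default.
SET yb_disable_transactional_writes = true;   -- skip the distributed-transaction path for inserts
COPY my_table FROM '/tmp/data.csv' WITH (FORMAT csv, HEADER true);
SET yb_disable_transactional_writes = false;  -- re-enable transactional writes afterwards
```

Because these settings trade transactional guarantees for throughput, they are intended for controlled bulk-load windows rather than normal operation.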

Phase 2

| Status | Feature | GitHub Issue | Comments |
| --- | --- | --- | --- |
| ⬜️ | Streaming ingest to YugabyteDB without using JDBC | | Inserting around 1 billion records per day through the streaming interface; transferring this volume of records over the JDBC interface would be inefficient. One option is to implement the Spark RDD write interface. |

Metadata

Labels

area/ycql (Yugabyte CQL, YCQL), kind/enhancement (enhancement of an existing feature), priority/medium (medium priority issue)
