Add support of bulk load for the string like HBase bulkload #1301

Open
git-hulk opened this issue Mar 8, 2023 · 19 comments
Labels
enhancement type enhancement

Comments

@git-hulk
Member

git-hulk commented Mar 8, 2023

Motivation

Many scenarios need to bulk-load large volumes of data on a regular basis, and loading it through the regular API can cause heavy load and latency spikes. It would be better to offer a way to mitigate this.

Solution

We can use RocksDB's SST file ingestion to bulk-load the data, supporting only simple strings at first.

See more discussion in #1628.

@git-hulk git-hulk added the enhancement type enhancement label Mar 8, 2023
@zuston
Member

zuston commented Mar 9, 2023

Thanks for proposing this. +1 for this feature.

@ColinChamber
Contributor

I'm willing to submit a PR!

@git-hulk
Member Author

git-hulk commented Apr 3, 2023

@ColinChamber Assigned.

@liucyao1990

@git-hulk @ColinChamber Thanks for this PR. Is there any progress? Looking forward to this bulk load feature.

@ColinChamber ColinChamber removed their assignment Jun 9, 2023
@ColinChamber
Contributor

I haven't had enough time recently, so I've unassigned myself. I hope someone else can pick this up. @liucyao1990

@git-hulk
Member Author

git-hulk commented Jun 9, 2023

Thanks @ColinChamber for your update.

@jihuayu
Member

jihuayu commented Jun 15, 2023

@git-hulk For this feature, do we need to provide a command to load data, or a tool?

In my opinion, there are two steps here.

  1. Create SST files with the data.
  2. Ingest the SST files.

The second step requires stopping the world.

Do we need to support online bulk load? Will there be problems with stopping the world?

@git-hulk
Member Author

git-hulk commented Jun 15, 2023

> In my opinion, there are two steps here.
>
>   1. Create SST files with the data.
>   2. Ingest the SST files.

@jihuayu Yes, you're right. And I think it's good to only support the string type first.

> Do we need to support online bulk load? Will there be problems with stopping the world?

My first thought is yes, we should support online bulk load, even though ingesting SSTs will block write operations.

> For this feature, we need provide a command to load data, or provide a tool?

From my side, I would like to support loading local SSTs via a command and also provide a tool to generate SST files. For the tool's input file, we can require users to put their data in a specified format such as CSV.
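
As an illustration of the input side of such a tool, here is a minimal sketch of reading CSV data and preparing it for an SST writer. RocksDB's `SstFileWriter` requires keys to be added in ascending comparator order, so the rows are sorted bytewise and duplicate keys are rejected. The two-column `key,value` layout and the function name `load_sorted_pairs` are assumptions made for this example, not an agreed format, and the sketch does not model how Kvrocks encodes string keys and values on disk.

```python
import csv
import io


def load_sorted_pairs(csv_text):
    """Parse `key,value` rows (hypothetical layout) and sort them by raw key bytes.

    Raises ValueError on malformed rows or duplicate keys.
    """
    pairs = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        key, value = row
        pairs.append((key.encode("utf-8"), value.encode("utf-8")))

    # Python compares bytes lexicographically by unsigned byte value,
    # which matches RocksDB's default bytewise comparator.
    pairs.sort(key=lambda kv: kv[0])

    # An SST writer needs strictly increasing keys, so fail on duplicates.
    for (prev_key, _), (key, _) in zip(pairs, pairs[1:]):
        if prev_key == key:
            raise ValueError(f"duplicate key: {key!r}")
    return pairs


if __name__ == "__main__":
    for key, value in load_sorted_pairs("user:2,bob\nuser:1,alice\n"):
        print(key, value)
```

The sorted pairs would then be written out in order by the SST generation step; that step and the ingest command itself are outside this sketch.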

@jihuayu
Member

jihuayu commented Jun 17, 2023

@git-hulk Ok, I'm willing to submit a PR!

@git-hulk
Member Author

Thanks @jihuayu, assigned.

@zuston @liucyao1990 You're also welcome to share more input on how you would use bulk load.

@liucyao1990

liucyao1990 commented Jun 19, 2023

@git-hulk @jihuayu Hi, here is the bulk load ingestion implementation of Pegasus. https://github.com/apache/incubator-pegasus/pulls?q=label%3Acomponent%2Fbulk_load+. FYI

@git-hulk
Member Author

> @git-hulk @jihuayu Hi, here is the bulk load ingestion implementation of pegasus. https://github.com/apache/incubator-pegasus/pulls?q=label%3Acomponent%2Fbulk_load+. FYI

Cool, thanks for your input.

@jihuayu
Member

jihuayu commented Jun 24, 2023

I will first create the SST generation tool.
We have cluster and replication modes, where ingesting SSTs may work differently, so I think I can first support ingestion in standalone mode.

@git-hulk
Member Author

Yes, that's right. It's fine to NOT support replication for now.

@JackyYangPassion

Are there any updates here?

@jihuayu jihuayu removed their assignment Apr 10, 2024
@jihuayu
Member

jihuayu commented Apr 10, 2024

@JackyYangPassion No. Do you want to have a try?

@JackyYangPassion

JackyYangPassion commented Apr 12, 2024

> @JackyYangPassion No. Do you want to have a try?

OK,
I've been researching how to generate SST files recently.

I read through the discussion in #1628 carefully.

Will this feature initially support only the String type?

@git-hulk
Member Author

@JackyYangPassion Yes, we would like to support the string type first since it's the simplest one. It would definitely be great if it could cover other data types as well.

@jihuayu
Member

jihuayu commented Apr 12, 2024

@JackyYangPassion Thank you!
Supporting strings is the first step in our plan. We want to start with a basic version that users can try, so that we can gather feedback on the functionality as early as possible.
In later stages, we will support more types and functionality.

Projects
Status: In Progress
Development

No branches or pull requests

6 participants