bugfix, remove items_.reserve for batch write #248
Hi @1261385937, could you please share your performance test?
Right now it looks really odd that pre-allocating a vector would be slower than appending items to it one by one.
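@Enmk, this is the simplest code:
#include <clickhouse/client.h>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <vector>

int main() {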
int a = 1;
uint64_t b = 11;
std::string c = "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
std::vector<std::string> d = {
"22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222",
"33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
"44444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444" };
std::deque<std::vector<std::string>> e = {
{ "444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444"
,"5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555"
,"6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666" },
{ "777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777"
,"8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888"
,"9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999" },
{ "1010101010101010101010101010101010101010101010101011010101010101010101010101010101010110101010101010101010101010101010101101010101010101010101010101010101011010101010101010101010101010101"
,"12121212121212121212121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212"
,"13131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313" },
{ "1414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141" }
};
constexpr size_t total = 100000;
for (int i = 0; i < 10; i++) {
clickhouse::Block block;
auto a_ptr = std::make_shared<clickhouse::ColumnInt32>();
auto b_ptr = std::make_shared<clickhouse::ColumnUInt64>();
auto c_ptr = std::make_shared<clickhouse::ColumnString>();
auto d_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>());
auto e_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>()));
for (size_t j = 0; j < total; j++) {
a_ptr->Append(a);
b_ptr->Append(b);
c_ptr->Append(c);
auto ds_ptr = std::make_shared<clickhouse::ColumnString>();
for (auto& dd : d) {
ds_ptr->Append(dd);
}
d_ptr->AppendAsColumn(ds_ptr);
auto es_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>());
for (auto& ee : e) {
auto ees_ptr = std::make_shared<clickhouse::ColumnString>();
for (auto& eee : ee) {
ees_ptr->Append(eee);
}
es_ptr->AppendAsColumn(ees_ptr);
}
e_ptr->AppendAsColumn(es_ptr);
}
block.AppendColumn("a", a_ptr);
block.AppendColumn("b", b_ptr);
block.AppendColumn("c", c_ptr);
block.AppendColumn("d", d_ptr);
block.AppendColumn("e", e_ptr);
}
return 0;
}
I think the cause is that the buffer is always reallocated to only slightly more than its previous size, which results in too much memory copying.
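A minimal sketch of that effect, using a plain `std::vector` rather than the library's own column classes. The helper `count_reallocations` is hypothetical and written only for illustration, and the `reserve(size() + 1)` pattern is an assumption about the general shape of the problem, not a copy of the removed code. It counts how often the buffer's capacity changes while appending `n` items, with and without an exact-size `reserve` before every append:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n ints to a vector and counts how many times
// the capacity changes (each change means a reallocation and a copy/move of
// all existing elements).
// If reserve_each_time is true, reserve(size() + 1) is called before every
// push_back, which asks for "exactly one more slot" on each append.
std::size_t count_reallocations(std::size_t n, bool reserve_each_time) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();

    for (std::size_t i = 0; i < n; ++i) {
        if (reserve_each_time) {
            // Common standard library implementations allocate exactly the
            // requested capacity here, so growth is no longer geometric.
            v.reserve(v.size() + 1);
        }
        v.push_back(static_cast<int>(i));

        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Calling `count_reallocations(100000, true)` and `count_reallocations(100000, false)` should show far more reallocations for the first call on mainstream implementations, because a plain `push_back` grows the buffer geometrically while an exact-size `reserve` grows it one step at a time, so the total copying becomes roughly quadratic in the number of items.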
I recommend using You can use if in nested arrays, like
As for the performance issue, I need a bit of time to look at it properly.
LGTM, confirmed performance improvement.
Indeed, there is a HUGE difference in performance!
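Platform: Windows
Data: 1,000,000 rows, 100,000 rows per batch
This is a bug in batch writes: it costs too much CPU time.
Before the removal: 1 min had already passed and nothing had been inserted successfully. Top hotspots: [profiler screenshot]
After the removal: it took just 18.7 s and everything was inserted successfully. Top hotspots: [profiler screenshot]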