bugfix, remove items_.reserve for batch write #248

1261385937 · 2022-11-19T05:52:30Z

Platform: Windows
Data: 1,000,000 rows, 100,000 rows per batch

This fixes a performance bug in batch writes: the items_.reserve call costs too much CPU time.

Before the removal, the run had already taken 1 min and no insert had completed. Top hot spots:

After the removal, the run takes just 18.7 s and all inserts complete. Top hot spots:

Enmk

Hi @1261385937, could you please share your performance test?

RN it looks really weird that pre-allocating a vector is slower than appending items to it one-by-one.

1261385937 · 2022-11-21T12:50:50Z

@Enmk, this is the simplest code that reproduces it:

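#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <vector>

#include <clickhouse/client.h>

int main() {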
	int a = 1;
	uint64_t b = 11;
	std::string c = "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
	std::vector<std::string> d = {
		"22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222",
		"33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
		"44444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444" };
	std::deque<std::vector<std::string>> e = {
		{ "444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444"
		,"5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555"
		,"6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666" },

		{ "777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777"
		,"8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888"
		,"9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999" },

		{ "1010101010101010101010101010101010101010101010101011010101010101010101010101010101010110101010101010101010101010101010101101010101010101010101010101010101011010101010101010101010101010101"
		,"12121212121212121212121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212"
		,"13131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313" },

		{ "1414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141" }
	};


	constexpr size_t total = 100000;
	for (int i = 0; i < 10; i++) {
		clickhouse::Block block;
		auto a_ptr = std::make_shared<clickhouse::ColumnInt32>();
		auto b_ptr = std::make_shared<clickhouse::ColumnUInt64>();
		auto c_ptr = std::make_shared<clickhouse::ColumnString>();
		auto d_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>());
		auto e_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>()));

		for (size_t j = 0; j < total; j++) {
			a_ptr->Append(a);
			b_ptr->Append(b);
			c_ptr->Append(c);

			auto ds_ptr = std::make_shared<clickhouse::ColumnString>();
			for (auto& dd : d) {
				ds_ptr->Append(dd);
			}
			d_ptr->AppendAsColumn(ds_ptr);

			auto es_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>());
			for (auto& ee : e) {
				auto ees_ptr = std::make_shared<clickhouse::ColumnString>();
				for (auto& eee : ee) {
					ees_ptr->Append(eee);
				}
				es_ptr->AppendAsColumn(ees_ptr);
			}
			e_ptr->AppendAsColumn(es_ptr);
		}
		block.AppendColumn("a", a_ptr);
		block.AppendColumn("b", b_ptr);
		block.AppendColumn("c", c_ptr);
		block.AppendColumn("d", d_ptr);
		block.AppendColumn("e", e_ptr);
	}

	return 0;
}

1261385937 · 2022-11-21T12:54:01Z

I think each reserve call reallocates to a capacity only slightly larger than the previous size, so every append ends up copying the whole buffer again. That is too much memory copying.

Enmk · 2022-11-22T10:06:34Z

e_ptr->AppendAsColumn(es_ptr);

I recommend using ColumnArrayT; it provides a type-safe and fast API (no excessive slicing, copying, etc.).

You can also use it for nested arrays, like ColumnArrayT<ColumnArrayT<ColumnString>>.

As for the performance issue, I need a bit of time to look at it properly.

Enmk

LGTM, Confirmed performance improvement

Enmk · 2022-11-22T15:17:42Z

Indeed, there is a HUGE difference in performance!

remove items_.reserve for batch write

e76cb29

1261385937 changed the title ~~remove items_.reserve for batch write~~ bugfix, remove items_.reserve for batch write Nov 19, 2022

Enmk requested changes Nov 21, 2022

Minor comment

69062c9

Enmk approved these changes Nov 22, 2022

Enmk merged commit 51c62ce into ClickHouse:master Nov 22, 2022

1261385937 deleted the bugfix branch November 22, 2022 15:27
