Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix, remove items_.reserve for batch write #248

Merged
merged 2 commits into from
Nov 22, 2022
Merged

bugfix, remove items_.reserve for batch write #248

merged 2 commits into from
Nov 22, 2022

Conversation

1261385937
Copy link
Contributor

@1261385937 1261385937 commented Nov 19, 2022

Platform: windows
Data: 100w, 10w/per

This is a bug for batch write. It costs too much cpu time.

Before remove, already cost 1min and nothing insert ok. Top hot:
a34d24f7c98b5c8408407311dc6a876

After remove, just cost 18.7s and all insert ok. Top hot:
38e437dd519aa742d08d1bb8c7e26fa

@1261385937 1261385937 changed the title remove items_.reserve for batch write bugfix, remove items_.reserve for batch write Nov 19, 2022
Copy link
Collaborator

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @1261385937, could you please share your performance test?

RN it looks really weird that pre-allocating a vector is slower than appending items to it one-by-one.

@1261385937
Copy link
Contributor Author

@Enmk, This is the simplest code:

int main() {
	int a = 1;
	uint64_t b = 11;
	std::string c = "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
	std::vector<std::string> d = {
		"22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222",
		"33333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333",
		"44444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444" };
	std::deque<std::vector<std::string>> e = {
		{ "444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444"
		,"5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555"
		,"6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666" },

		{ "777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777"
		,"8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888"
		,"9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999" },

		{ "1010101010101010101010101010101010101010101010101011010101010101010101010101010101010110101010101010101010101010101010101101010101010101010101010101010101011010101010101010101010101010101"
		,"12121212121212121212121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212121212121212121212121121212"
		,"13131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313131313" },

		{ "1414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141414141" }
	};


	constexpr size_t total = 100000;
	for (int i = 0; i < 10; i++) {
		clickhouse::Block block;
		auto a_ptr = std::make_shared<clickhouse::ColumnInt32>();
		auto b_ptr = std::make_shared<clickhouse::ColumnUInt64>();
		auto c_ptr = std::make_shared<clickhouse::ColumnString>();
		auto d_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>());
		auto e_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>()));

		for (int j = 0; j < total; j++) {
			a_ptr->Append(a);
			b_ptr->Append(b);
			c_ptr->Append(c);

			auto ds_ptr = std::make_shared<clickhouse::ColumnString>();
			for (auto& dd : d) {
				ds_ptr->Append(dd);
			}
			d_ptr->AppendAsColumn(ds_ptr);

			auto es_ptr = std::make_shared<clickhouse::ColumnArray>(std::make_shared<clickhouse::ColumnString>());
			for (auto& ee : e) {
				auto ees_ptr = std::make_shared<clickhouse::ColumnString>();
				for (auto& eee : ee) {
					ees_ptr->Append(eee);
				}
				es_ptr->AppendAsColumn(ees_ptr);
			}
			e_ptr->AppendAsColumn(es_ptr);
		}
		block.AppendColumn("a", a_ptr);
		block.AppendColumn("b", b_ptr);
		block.AppendColumn("c", c_ptr);
		block.AppendColumn("d", d_ptr);
		block.AppendColumn("e", e_ptr);
	}

	return 0;

@1261385937
Copy link
Contributor Author

Always reallocate a little bit bigger than previous size, too much memory copy, I think

@Enmk
Copy link
Collaborator

Enmk commented Nov 22, 2022

e_ptr->AppendAsColumn(es_ptr);

I recommend using ColumnArrrayT, it provides type-safe and fast (no excessive slicing, copying, etc) API.

You can use if in nested arrays, like ColumnArrayT<ColumnArrayT<ColumnString>>:

As for the performance issue, I need a bit of time to look at it properly.

Copy link
Collaborator

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Confirmed performance improvement

@Enmk
Copy link
Collaborator

Enmk commented Nov 22, 2022

Indeed, there is a HUGE difference in performance!

@Enmk Enmk merged commit 51c62ce into ClickHouse:master Nov 22, 2022
@1261385937 1261385937 deleted the bugfix branch November 22, 2022 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants