Client doesn't send enable_http_compression parameter in some circumstances #157
Comments
Just to be clear, the query still works, it's just slower? What are the results with compress=False? I suspect the issue is that you have a ClickHouse server timezone set that is not UTC, and you have one or more columns with DateTime/DateTime64 types. This is the most likely reason for the performance problem, since applying timezones is very expensive. Otherwise I would need a query and sample data to reproduce the problem. I'm preparing a release that will not apply the server timezone when the client and server timezones match, to handle this issue in locations where only one timezone is common.
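One quick way to test the timezone hypothesis is to compare the server's timezone with the client's. A minimal sketch (the default local connection is an assumption; `timezone()` is the standard ClickHouse function, evaluated on the server):

```python
import datetime

import clickhouse_connect

# Sketch: compare server and client timezones, since DateTime columns are
# only expensive to read when the two differ and a conversion is applied.
client = clickhouse_connect.get_client()  # default localhost connection assumed
server_tz = client.command('SELECT timezone()')
local_tz = datetime.datetime.now().astimezone().tzinfo
print(f'server timezone: {server_tz}, client timezone: {local_tz}')
```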
---

I removed all JSON and time columns; only strings and arrays remain. It still works 26 times slower than in version 0.5.3.
---

I reproduced a similar performance problem with mismatched timezones and released a fix for that problem. Otherwise I don't see this performance issue in our client benchmarks or other tests, and no one else has reported it despite many new downloads of 0.5.17 and 0.5.18, so apparently something else is going on in your environment.

(1) That much of a degradation might be related to not having the C extensions enabled/working for some reason. Most likely that would show up in log messages when you run your application. What is the log output when you start your application with clickhouse-connect?

(2) If you are running against ClickHouse on the local host, it could be slower because your ClickHouse server is taking time to compress the data. Did you try compress=False?

Also, can you run the following script on both versions?

```python
#!/usr/bin/env python -u
import time

import clickhouse_connect
from clickhouse_connect.common import version

client = clickhouse_connect.get_client(compress=False, query_limit=0)
start = time.time()
result = client.query(
    """
    SELECT number, toDateTime(toUInt32(number + toInt32(now()))) AS date,
           randomPrintableASCII(256) AS string
    FROM numbers(5000000)
    """
)
rows = len(result.result_set)
total_time = time.time() - start
print(f'VERSION: {version()}')
print(f'\t\tTime: {total_time:.4f} sec rows: {rows} rows/sec {rows // total_time}')
```

Results on my Mac laptop are:
I'm happy to reopen this if you can provide a reproducible example.
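Regarding point (1) above, one way to surface those log messages is to turn on logging before the first import. A minimal sketch, assuming clickhouse-connect logs a message when its C/Cython optimizations fail to load:

```python
import logging

# Sketch: enable logging before importing clickhouse-connect so that any
# message about the C extensions failing to load is visible at import time.
logging.basicConfig(level=logging.DEBUG)

import clickhouse_connect  # noqa: E402  (import after logging setup is deliberate)
from clickhouse_connect.common import version  # noqa: E402

print(version())
```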
---

Here is a fast test: 0.648514986038208 (version 0.5.18).

```python
import time

import clickhouse_connect

clickhouse_client = clickhouse_connect.get_client(
    host="",       # redacted in the original
    port=,         # redacted in the original
    secure=False,
    database=None,
    user='',       # redacted in the original
    password="",   # redacted in the original
    query_limit=0,
    connect_timeout=20,
    compress=True,
    session_id="test_session_id",
)
clickhouse_client.database = None
query_create_temp_table = """
CREATE TEMPORARY TABLE temp_test_table
(
    Field1 String,
    Field2 String,
    Field3 Array(String),
    Field4 Nullable(String)
)"""
clickhouse_client.command(query_create_temp_table)
generate_values = ", ".join([f"('field1_{x}', 'field2_{x}', ['field3_1_{x}', 'field3_2_{x}'], 'field4_{x}')"
                             for x in range(10000)])
query_insert_temp_table = (
    f"""INSERT INTO temp_test_table (Field1, Field2, Field3, Field4) VALUES {generate_values}"""
)
clickhouse_client.command(query_insert_temp_table)
start = time.time()
query_read_temp_table = """SELECT * FROM temp_test_table"""
result = clickhouse_client.query(query_read_temp_table).result_rows
total_time = time.time() - start
print(total_time)
clickhouse_client.query("DROP TABLE temp_test_table")
```
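For comparison, the same read can be timed with compression disabled. A minimal sketch, assuming the same (redacted) connection details as above and that reusing the session_id lets this client see the session-scoped temporary table:

```python
import time

import clickhouse_connect

# Sketch: time the same SELECT with compression disabled, before the
# temporary table above is dropped. Connection details are elided as above.
no_comp_client = clickhouse_connect.get_client(
    compress=False,
    query_limit=0,
    session_id="test_session_id",
)
start = time.time()
rows = len(no_comp_client.query("SELECT * FROM temp_test_table").result_rows)
print(f"compress=False: {time.time() - start:.4f} sec, {rows} rows")
```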
---

With version 0.5.3 and ClickHouse on a remote server: 0.7565817832946777.
---

Thanks for the deeper investigation, it's really appreciated. Locally, with your query, I actually see that 0.5.18 is slightly faster in all cases, regardless of compression. (Note that there were improvements to Nullable columns in 0.5.10.) Also, your current query size of 10k records is probably too small to really measure just clickhouse-connect performance, as many things can add variance in such a short time frame. My slightly modified script:

```python
import time

import clickhouse_connect
from clickhouse_connect.common import version

compression = 'zstd'
clickhouse_client = clickhouse_connect.get_client(
    user='default',
    query_limit=0,
    connect_timeout=20,
    compress=compression,
    session_id="test_session_id",
)
query_create_temp_table = """
CREATE TABLE IF NOT EXISTS temp_test_table
(
    Field1 String,
    Field2 String,
    Field3 Array(String),
    Field4 Nullable(String)
) ENGINE MergeTree() ORDER BY Field1
"""
clickhouse_client.command(query_create_temp_table)
generate_values = [[f'field1_{x}', f'field2_{x}', [f'field3_1_{x}', f'field3_2_{x}'], f'field4_{x}']
                   for x in range(2000000)]
clickhouse_client.insert('temp_test_table', generate_values)
start = time.time()
result = clickhouse_client.query('SELECT * FROM temp_test_table')
rows = len(result.result_rows)
total_time = time.time() - start
print(f'VERSION: {version()} COMPRESSION:{compression}')
print(f'\t\tTime: {total_time:.4f} sec rows: {rows} rows/sec {rows // total_time}')
clickhouse_client.query("DROP TABLE temp_test_table")
```

Test results on 2 million rows (locally):
So maybe there is something strange in your environment, since you are connecting to a remote server? I'm really not finding any evidence that something changed between versions that would negatively affect performance.
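To compare codecs directly, the compression variable in the script above can be looped over. A minimal sketch, assuming compress accepts False as well as codec names like 'lz4' and 'zstd' (as used earlier in the thread), and that temp_test_table still exists:

```python
import time

import clickhouse_connect

# Sketch: time the same read under different compression settings, reusing
# the temp_test_table created by the script above.
for compression in (False, 'lz4', 'zstd'):
    client = clickhouse_connect.get_client(compress=compression, query_limit=0)
    start = time.time()
    rows = len(client.query('SELECT * FROM temp_test_table').result_rows)
    print(f'compress={compression}: {time.time() - start:.4f} sec, {rows} rows')
```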
---

Thanks! Here are my results with your code:

We'll look into what could be the reason.
---

Can I provide some settings of my ClickHouse for diagnostics? Please tell me which ones.
---

In our ClickHouse, we found the problem. In the old version, the client passed the enable_http_compression parameter; in the new version it does not, and even if we add the setting ourselves it doesn't help. With a local ClickHouse (this one), everything works fast by default.
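For reference, per-query settings can also be forced explicitly through the settings argument of query. A minimal sketch (connection details assumed; whether this works around the missing parameter is exactly what's under investigation here):

```python
import clickhouse_connect

# Sketch: pass enable_http_compression explicitly as a per-query setting.
client = clickhouse_connect.get_client(compress=True)  # connection details assumed
result = client.query(
    'SELECT * FROM temp_test_table',
    settings={'enable_http_compression': 1},
)
print(len(result.result_rows))
```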
---

The current version should pass the enable_http_compression setting. I'm trying to figure out why this works in 0.5.3 but doesn't work in 0.5.17. There are some code changes in that area, but as you say, it works fine locally. What version of ClickHouse are you using on the remote server? Ideally, could you step through this code in httpclient.py with a debugger and see why the setting isn't being sent:

```python
comp_setting = self._setting_status('enable_http_compression')
self._send_comp_setting = not comp_setting.is_set and comp_setting.is_writable
if comp_setting.is_set or comp_setting.is_writable:
    self.compression = compression
```
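It may also help to check what the server itself reports for this setting, which is presumably the server-side source for flags like is_set and is_writable. A minimal sketch:

```python
import clickhouse_connect

# Sketch: check how the server reports enable_http_compression.
# readonly != 0 would mean the client cannot change it for the session.
client = clickhouse_connect.get_client()  # connection details assumed
rows = client.query(
    "SELECT value, changed, readonly FROM system.settings "
    "WHERE name = 'enable_http_compression'"
).result_rows
print(rows)
```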
---

The problem is in this line; I think you need to change it.
---

Oh, nice catch! Thanks so much! I stared at that code many times and didn't see that problem. I'll release a fix.
---

Describe the bug

When I run client.query with:

- clickhouse_connect version 0.5.3, I get the result in 0:00:00.812322
- clickhouse_connect version 0.5.17, I get the result in 0:00:07.987024

Steps to reproduce

Expected behaviour

Code example

clickhouse-connect and/or ClickHouse server logs

Configuration

Environment

ClickHouse server

CREATE TABLE statements for tables involved: