Releases: ClibMouse/ClickHouse
v22.8.1.1-clib
Release v22.8.1.1-clib
Image is published at icr.io/clickhouse/clickhouse:22.8.1.1-1-clib-ibm
KQL implemented features.
August 1, 2022
-
strcmp (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcmpfunction)
print strcmp('abc','ABC')
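As a rough illustration of the three-way comparison that strcmp performs, the Python sketch below mimics it with plain code-point ordering. The Python helper is only a stand-in written for this note; the collation rules of the real KQL function may differ in detail.

```python
def strcmp(a: str, b: str) -> int:
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b.

    Assumption: ordering is by code point, which is what Python's
    built-in string comparison uses.
    """
    return (a > b) - (a < b)


if __name__ == "__main__":
    # Lowercase letters sort after uppercase ones in code-point order.
    print(strcmp("abc", "ABC"))
```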
-
parse_url (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlfunction)
print Result = parse_url('scheme://username:password@host:1234/this/is/a/path?k1=v1&k2=v2#fragment')
-
parse_urlquery (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlqueryfunction)
print Result = parse_urlquery('k1=v1&k2=v2&k3=v3')
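To give a feel for what parse_urlquery produces, here is a minimal Python sketch built on the standard library. The shape of the result (a single "Query Parameters" key holding the pairs) is an assumption about the KQL output, and the Python function is a hypothetical stand-in, not the implementation in this release.

```python
from urllib.parse import parse_qsl


def parse_urlquery(query: str) -> dict:
    """Split a URL query string into key/value pairs.

    Assumption: the result is wrapped under a "Query Parameters" key.
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    return {"Query Parameters": dict(pairs)}


if __name__ == "__main__":
    print(parse_urlquery("k1=v1&k2=v2&k3=v3"))
```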
-
print operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/printoperator)
print x=1, s=strcat('Hello', ', ', 'World!')
-
The following functions now support arbitrary expressions as their argument:
-
Aggregate Functions:
-
make_list()
Customers | summarize t = make_list(FirstName) by FirstName
Customers | summarize t = make_list(FirstName, 10) by FirstName
-
make_list_if()
Customers | summarize t = make_list_if(FirstName, Age > 10) by FirstName
Customers | summarize t = make_list_if(FirstName, Age > 10, 10) by FirstName
-
make_list_with_nulls()
Customers | summarize t = make_list_with_nulls(FirstName) by FirstName
-
make_set()
Customers | summarize t = make_set(FirstName) by FirstName
Customers | summarize t = make_set(FirstName, 10) by FirstName
-
make_set_if()
Customers | summarize t = make_set_if(FirstName, Age > 10) by FirstName
Customers | summarize t = make_set_if(FirstName, Age > 10, 10) by FirstName
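The grouping behavior of the make_list and make_set family can be sketched in Python as below. This is an illustration under assumptions, not the code in this release: the function name and its parameters are hypothetical, rows are plain dicts, and `expr` is a callable to mirror the fact that the argument may be an arbitrary expression.

```python
from collections import defaultdict


def summarize_make_list(rows, by, expr, predicate=None, max_size=None,
                        distinct=False, keep_nulls=False):
    """Collect expr(row) per group, roughly like `summarize make_list(...) by ...`.

    predicate  -> the *_if variants
    max_size   -> the optional size limit argument
    distinct   -> make_set instead of make_list
    keep_nulls -> make_list_with_nulls
    """
    groups = defaultdict(list)
    for row in rows:
        bucket = groups[by(row)]  # the group exists even if nothing is collected
        if predicate is not None and not predicate(row):
            continue
        value = expr(row)
        if value is None and not keep_nulls:
            continue
        if distinct and value in bucket:
            continue
        if max_size is not None and len(bucket) >= max_size:
            continue
        bucket.append(value)
    return dict(groups)
```

For example, `summarize_make_list(rows, by=lambda r: r["FirstName"], expr=lambda r: r["Age"], predicate=lambda r: r["Age"] > 10)` plays the role of `make_list_if(Age, Age > 10) by FirstName`.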
-
Default dialect config setting for session and user:
-
Set the dialect setting in the server configuration XML at user level (users.xml). This sets the dialect at server startup, and ClickHouse parses queries for all users with the default profile according to the dialect value. For example:
<profiles>
    <!-- Default settings. -->
    <default>
        <load_balancing>random</load_balancing>
        <dialect>kusto_auto</dialect>
    </default>
</profiles>
-
Once the dialect is set in users.xml, a query can be executed with an HTTP client as follows:
echo "KQL query" | curl -sS "http://localhost:8123/?" --data-binary @-
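The same request can be issued from a script. The sketch below uses only the Python standard library and assumes the server listens on http://localhost:8123/ as in the curl example; the helper names `build_kql_request` and `run_kql` are made up for this illustration.

```python
import urllib.request


def build_kql_request(query: str,
                      base_url: str = "http://localhost:8123/") -> urllib.request.Request:
    """Build a POST request whose body is the query text, like curl --data-binary."""
    return urllib.request.Request(base_url, data=query.encode("utf-8"), method="POST")


def run_kql(query: str, base_url: str = "http://localhost:8123/") -> str:
    """Send the query and return the response body as text.

    Requires a running server with the dialect already configured.
    """
    with urllib.request.urlopen(build_kql_request(query, base_url)) as response:
        return response.read().decode("utf-8")
```

Calling `run_kql("Customers | take 1")` would then return whatever the server sends back for that query.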
-
To execute the query using clickhouse-client, update clickhouse-client.xml as below and connect with the --config-file option (clickhouse-client --config-file=<config-file path>):
<config>
    <dialect>kusto_auto</dialect>
</config>
Or pass the dialect setting on the command line with '--'. For example:
clickhouse-client --dialect='kusto_auto' -q "KQL query"
v22.7.1.3-clib
Release v22.7.1.3-clib is a Full-Text Search specific release.
Image is published at icr.io/clickhouse/clickhouse:22.7.1.3-1-clib-ibm
Full-Text Search Release Build
Embedded Full-Text search is enabled by introducing a new inverted index feature into ClickHouse.
This inverted index feature is implemented as a new type of skipping index named GIN.
The implementation is well aligned with the ClickHouse secondary index (skipping index) architecture, including index creation syntax, block stream piping, expression (RPN) evaluation, etc.
The following statements are examples of defining a GIN index:
CREATE TABLE my_table1 (k UInt64,s String,INDEX my_gin_index(s) TYPE gin(0) GRANULARITY 1) Engine=MergeTree ORDER BY (k)
CREATE TABLE my_table2 (k UInt64,s String,INDEX my_gin_index(s) TYPE gin(3) GRANULARITY 1) Engine=MergeTree ORDER BY (k)
gin() or gin(0) sets the tokenizer to "tokens", while gin(n), with n between 2 and 8, uses "ngrams(n)" as the tokenizer.
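The difference between the two tokenizers can be pictured with a small Python sketch. It is only an approximation: the exact splitting rules of the real "tokens" tokenizer are assumed here to be "split on runs of non-alphanumeric characters", and the function names are hypothetical.

```python
import re


def tokens(text: str) -> list:
    """Word-like tokenizer: split on runs of non-alphanumeric characters (assumed rule)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def ngrams(text: str, n: int) -> list:
    """Sliding window of n characters over the whole string."""
    if not 2 <= n <= 8:
        raise ValueError("n must be between 2 and 8")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tokenize(text: str, gin_param: int = 0) -> list:
    """gin() / gin(0) -> tokens, gin(n) -> ngrams(n)."""
    return tokens(text) if gin_param == 0 else ngrams(text, gin_param)


if __name__ == "__main__":
    print(tokenize("I love clickhouse"))  # ['I', 'love', 'clickhouse']
    print(tokenize("click", 3))           # ['cli', 'lic', 'ick']
```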
The main techniques include single-pass index construction with segmentation, Roaring Bitmap (for postings lists), and FST (for term dictionaries); see the following references:
[1] Heinz, Steffen, and Justin Zobel. 2003. Efficient single-pass index construction for text databases. JASIST 54(8):713–729.
[2] Roaring Bitmap https://github.com/RoaringBitmap/RoaringBitmap
[3] FST (Finite State Transducer): Direct Construction of Minimal Acyclic Subsequential Transducers
There is also a MergeTree setting that controls the maximum size of data digested per segment during indexing: max_digestion_size_per_segment (default is 256M).
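To make the idea of single-pass construction with segmentation more concrete, here is a toy Python sketch. It is not the implementation in this release: plain dicts stand in for the FST term dictionary, Python sets stand in for Roaring Bitmap postings lists, whitespace splitting stands in for the tokenizer, and `max_digestion_size` is a hypothetical parameter playing the role of max_digestion_size_per_segment.

```python
def build_segments(rows, max_digestion_size):
    """Digest rows once, in order; start a new segment when the size limit is reached.

    Each segment maps term -> set of row ids (the postings list).
    """
    segments = []
    current = {}
    digested = 0
    for row_id, text in enumerate(rows):
        for term in text.lower().split():
            current.setdefault(term, set()).add(row_id)
        digested += len(text.encode("utf-8"))
        if digested >= max_digestion_size:
            segments.append(current)
            current, digested = {}, 0
    if current:
        segments.append(current)
    return segments


def lookup(segments, term):
    """Union the postings for a term across all segments."""
    result = set()
    for segment in segments:
        result |= segment.get(term, set())
    return result
```

A smaller limit yields more, smaller segments; a lookup then has to consult each segment and merge the postings.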
The following steps demonstrate the inverted index feature using the hackernews dataset.
- Create and load hackernews table
CREATE TABLE hackernews ENGINE = MergeTree ORDER BY id
AS SELECT * FROM url('https://datasets.clickhouse.com/hackernews.native.zst', Native,
$$
id UInt32,
deleted UInt8,
type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
by LowCardinality(String),
time DateTime,
text String,
dead UInt8,
parent UInt32,
poll UInt32,
kids Array(UInt32),
url String,
score Int32,
title String,
parts Array(UInt32),
descendants Int32
$$);
- Create the hackernews_gin3 table, which has the same column definitions as hackernews plus an inverted index.
CREATE TABLE hackernews_gin3
(
id UInt32,
deleted UInt8,
type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
by LowCardinality(String),
time DateTime,
text String,
dead UInt8,
parent UInt32,
poll UInt32,
kids Array(UInt32),
url String,
score Int32,
title String,
parts Array(UInt32),
descendants Int32,
INDEX gin_index(text) TYPE gin(3) GRANULARITY 1
) ENGINE = MergeTree ORDER BY id
SETTINGS index_granularity=1024;
- Populate hackernews_gin3 table by copying the same data from hackernews table
set max_insert_threads=6;
insert into hackernews_gin3 select * from hackernews;
- Run the same query against hackernews and hackernews_gin3 for comparison
SELECT * FROM hackernews WHERE text LIKE '%I love clickhouse%';
SELECT * FROM hackernews_gin3 WHERE text LIKE '%I love clickhouse%';
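Why the indexed table can answer such a LIKE query faster can be sketched as follows. This Python toy treats every row as its own granule and uses sets as postings, both of which are simplifying assumptions: the 3-grams of the search string select candidate rows, and only those candidates are checked with a real substring test.

```python
def ngrams(text, n):
    """Set of all n-character windows of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def like_substring(rows, needle, n=3):
    """Return ids of rows matching LIKE '%needle%', skipping rows via an n-gram index."""
    index = {}
    for row_id, text in enumerate(rows):
        for gram in ngrams(text, n):
            index.setdefault(gram, set()).add(row_id)

    required = ngrams(needle, n)
    if not required:
        # Needle shorter than n: the index cannot help, scan everything.
        candidates = set(range(len(rows)))
    else:
        # A matching row must contain every n-gram of the needle.
        candidates = set.intersection(*(index.get(g, set()) for g in required))

    # The index only prunes; the final check is still an exact substring test.
    return sorted(r for r in candidates if needle in rows[r])
```

In this picture, the un-indexed hackernews table corresponds to the "scan everything" branch.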
v22.7.1.2-clib
Release v22.7.1.2-clib
- Image is published at icr.io/clickhouse/clickhouse:22.7.1.2-1-clib-ibm
Renamed the dialect setting from sql_dialect to dialect
set dialect='clickhouse'
set dialect='kusto'
set dialect='kusto_auto'
IP functions
- parse_ipv4
"Customers | project parse_ipv4('127.0.0.1')"
- parse_ipv6
"Customers | project parse_ipv6('127.0.0.1')"
Please note that the functions listed below only take constant parameters for now; support for expressions is expected in a later release.
- ipv4_is_private
"Customers | project ipv4_is_private('192.168.1.6/24')"
"Customers | project ipv4_is_private('192.168.1.6')"
- ipv4_is_in_range
"Customers | project ipv4_is_in_range('127.0.0.1', '127.0.0.1')"
"Customers | project ipv4_is_in_range('192.168.1.6', '192.168.1.1/24')"
- ipv4_netmask_suffix
"Customers | project ipv4_netmask_suffix('192.168.1.1/24')"
"Customers | project ipv4_netmask_suffix('192.168.1.1')"
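The three constant-parameter functions above can be approximated with Python's ipaddress module. This is a sketch under assumptions, not the code in this release: "private" is taken to mean the three RFC 1918 ranges, and an address without a prefix is treated as /32.

```python
import ipaddress

# Assumption: "private" means the RFC 1918 ranges only.
_PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def ipv4_is_private(ip: str) -> bool:
    """True if the address part of ip (prefix ignored) lies in a private range."""
    address = ipaddress.ip_interface(ip).ip
    return any(address in network for network in _PRIVATE_NETWORKS)


def ipv4_is_in_range(ip: str, ip_range: str) -> bool:
    """True if ip falls inside ip_range; a bare address is treated as /32."""
    network = ipaddress.ip_network(ip_range, strict=False)
    return ipaddress.ip_address(ip) in network


def ipv4_netmask_suffix(ip: str) -> int:
    """Prefix length of ip, 32 when no prefix is given."""
    return ipaddress.ip_interface(ip).network.prefixlen
```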
String functions
-
support subquery for the in operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/in-cs-operator)
(the subquery needs to be wrapped in double parentheses)
Customers | where Age in ((Customers|project Age|where Age < 30))
Note: case-insensitive matching is not supported yet
-
has_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-all-operator)
Customers|where Occupation has_all ('Skilled','abcd')
note: subquery not supported yet
-
has_any (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-anyoperator)
Customers|where Occupation has_any ('Skilled','abcd')
note: subquery not supported yet
-
countof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/countoffunction)
Customers | project countof('The cat sat on the mat', 'at')
Customers | project countof('The cat sat on the mat', 'at', 'normal')
Customers | project countof('The cat sat on the mat', 'at', 'regex')
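The two counting modes of countof can be sketched in Python as follows. The sketch assumes that plain-string matches may overlap while regex matches do not, and that an empty search string yields 0; the Python function is illustrative only.

```python
import re


def countof(text: str, search: str, kind: str = "normal") -> int:
    """Count occurrences of search in text.

    kind="normal": plain substring matches, overlaps allowed (assumed).
    kind="regex":  non-overlapping regular-expression matches.
    """
    if kind == "regex":
        return sum(1 for _ in re.finditer(search, text))
    if not search:
        return 0  # assumption: nothing to count for an empty search string
    count = 0
    start = text.find(search)
    while start != -1:
        count += 1
        start = text.find(search, start + 1)  # step by one to allow overlaps
    return count


if __name__ == "__main__":
    print(countof("The cat sat on the mat", "at"))           # 3
    print(countof("The cat sat on the mat", "at", "regex"))  # 3
```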
-
extract (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractfunction)
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 0, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 1, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 3, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20', typeof(real))
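The role of the capture-group argument in the extract examples above can be illustrated with Python's re module. This is a sketch only: Python's regex engine is not the one used by KQL, a missing group is mapped to None here by assumption, and `as_type` is a hypothetical stand-in for the typeof(...) argument.

```python
import re


def extract(regex: str, capture_group: int, source: str, as_type=None):
    """Return the requested capture group of the first match, or None.

    Group 0 is the whole match; as_type optionally converts the result.
    """
    match = re.search(regex, source)
    if match is None or capture_group > match.re.groups:
        return None
    value = match.group(capture_group)
    if value is not None and as_type is not None:
        return as_type(value)
    return value


if __name__ == "__main__":
    pattern = r"(\b[A-Z]+\b).+(\b\d+)"
    text = "The price of PINEAPPLE ice cream is 20"
    for group in range(4):
        print(group, extract(pattern, group, text))
    print(extract(pattern, 2, text, float))
```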
-
extract_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractallfunction)
Customers | project extract_all('(\\w)(\\w+)(\\w)','The price of PINEAPPLE ice cream is 20')
note: captureGroups not supported yet
-
split (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/splitfunction)
Customers | project split('aa_bb', '_')
Customers | project split('aaa_bbb_ccc', '_', 1)
Customers | project split('', '_')
Customers | project split('a__b', '_')
Customers | project split('aabbcc', 'bb')
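The split examples above behave much like Python's str.split, which the following sketch uses to show the effect of the optional index argument. Returning an empty list for an out-of-range index is an assumption, and the Python function is only illustrative.

```python
def split(source: str, delimiter: str, requested_index=None) -> list:
    """Split source on delimiter; with requested_index, keep only that element."""
    parts = source.split(delimiter)
    if requested_index is None:
        return parts
    if 0 <= requested_index < len(parts):
        return [parts[requested_index]]
    return []  # assumption: out-of-range index yields an empty array


if __name__ == "__main__":
    print(split("aa_bb", "_"))           # ['aa', 'bb']
    print(split("aaa_bbb_ccc", "_", 1))  # ['bbb']
    print(split("a__b", "_"))            # ['a', '', 'b']
```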
-
strcat_delim (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcat-delimfunction)
Customers | project strcat_delim('-', '1', '2', 'A')
Customers | project strcat_delim('-', '1', '2', strcat('A','b'))
note: only string arguments are supported for now.
-
indexof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/indexoffunction)
Customers | project indexof('abcdefg','cde')
Customers | project indexof('abcdefg','cde',2)
Customers | project indexof('abcdefg','cde',6)
note: length and occurrence not supported yet
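The three indexof examples above map directly onto Python's str.find, as the sketch below shows. Only the start-index argument is modeled, in line with the note that length and occurrence are not supported yet.

```python
def indexof(source: str, lookup: str, start_index: int = 0) -> int:
    """Zero-based position of the first occurrence of lookup at or after start_index, else -1."""
    return source.find(lookup, start_index)


if __name__ == "__main__":
    print(indexof("abcdefg", "cde"))     # 2
    print(indexof("abcdefg", "cde", 2))  # 2
    print(indexof("abcdefg", "cde", 6))  # -1
```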
Previously released:
KQL() function
-
create table
CREATE TABLE kql_table4 ENGINE = Memory AS select *, now() as new_column From kql(Customers | project LastName,Age);
verify the content of kql_table4
select * from kql_table4
-
insert into table
create a tmp table:
CREATE TABLE temp ( FirstName Nullable(String), LastName String, Age Nullable(UInt8) ) ENGINE = Memory;
INSERT INTO temp select * from kql(Customers|project FirstName,LastName,Age);
verify the content of temp
select * from temp
-
Select from kql()
Select * from kql(Customers|project FirstName)
KQL operators:
- Tabular expression statements
Customers
- Select Column
Customers | project FirstName,LastName,Occupation
- Limit returned results
Customers | project FirstName,LastName,Occupation | take 1 | take 3
- sort, order
Customers | order by Age desc , FirstName asc
- Filter
Customers | where Occupation == 'Skilled Manual'
- summarize
Customers |summarize max(Age) by Occupation
KQL string operators and functions
-
contains
Customers |where Education contains 'degree'
-
!contains
Customers |where Education !contains 'degree'
-
contains_cs
Customers |where Education contains_cs 'Degree'
-
!contains_cs
Customers |where Education !contains_cs 'Degree'
-
endswith
Customers | where FirstName endswith 'RE'
-
!endswith
Customers | where FirstName !endswith 'RE'
-
endswith_cs
Customers | where FirstName endswith_cs 're'
-
!endswith_cs
Customers | where FirstName !endswith_cs 're'
-
==
Customers | where Occupation == 'Skilled Manual'
-
!=
Customers | where Occupation != 'Skilled Manual'
-
has
Customers | where Occupation has 'skilled'
-
!has
Customers | where Occupation !has 'skilled'
-
has_cs
Customers | where Occupation has_cs 'Skilled'
-
!has_cs
Customers | where Occupation !has_cs 'Skilled'
-
hasprefix
Customers | where Occupation hasprefix 'Ab'
-
!hasprefix
Customers | where Occupation !hasprefix 'Ab'
-
hasprefix_cs
Customers | where Occupation hasprefix_cs 'ab'
-
!hasprefix_cs
Customers | where Occupation !hasprefix_cs 'ab'
-
hassuffix
Customers | where Occupation hassuffix 'Ent'
-
!hassuffix
Customers | where Occupation !hassuffix 'Ent'
-
hassuffix_cs
Customers | where Occupation hassuffix_cs 'ent'
-
!hassuffix_cs
Customers | where Occupation !hassuffix_cs 'ent'
-
in
Customers |where Education in ('Bachelors','High School')
-
!in
Customers | where Education !in ('Bachelors','High School')
-
matches regex
Customers | where FirstName matches regex 'P.*r'
-
startswith
Customers | where FirstName startswith 'pet'
-
!startswith
Customers | where FirstName !startswith 'pet'
-
startswith_cs
Customers | where FirstName startswith_cs 'pet'
-
!startswith_cs
Customers | where FirstName !startswith_cs 'pet'
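The difference between the substring operators (contains) and the term operators (has, hasprefix, hassuffix) listed above is easy to miss, so here is a hedged Python sketch. It assumes that a term is a maximal run of ASCII letters and digits and that the plain operators are case-insensitive while the _cs variants are case-sensitive; the real term rules may be more involved.

```python
import re


def _terms(text: str) -> list:
    """Assumed term rule: maximal runs of ASCII letters and digits."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def _fold(text: str, needle: str, cs: bool):
    """Lowercase both sides unless the case-sensitive (_cs) variant is requested."""
    return (text, needle) if cs else (text.lower(), needle.lower())


def contains(text: str, needle: str, cs: bool = False) -> bool:
    text, needle = _fold(text, needle, cs)
    return needle in text  # any substring matches


def has(text: str, needle: str, cs: bool = False) -> bool:
    text, needle = _fold(text, needle, cs)
    return needle in _terms(text)  # must equal a whole term


def hasprefix(text: str, needle: str, cs: bool = False) -> bool:
    text, needle = _fold(text, needle, cs)
    return any(term.startswith(needle) for term in _terms(text))


def hassuffix(text: str, needle: str, cs: bool = False) -> bool:
    text, needle = _fold(text, needle, cs)
    return any(term.endswith(needle) for term in _terms(text))
```

Under these assumptions, 'skill' is contained in 'Skilled Manual' but is not a whole term of it, so contains matches while has does not.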
-
base64_encode_tostring()
Customers | project base64_encode_tostring('Kusto1') | take 1
-
base64_decode_tostring()
Customers | project base64_decode_tostring('S3VzdG8x') | take 1
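The two base64 examples above are inverses of each other, which can be checked with a short Python sketch. UTF-8 is assumed as the string encoding, and the Python functions merely mirror the KQL names.

```python
import base64


def base64_encode_tostring(text: str) -> str:
    """Encode a UTF-8 string as base64 text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def base64_decode_tostring(encoded: str) -> str:
    """Decode base64 text back into a UTF-8 string."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(base64_encode_tostring("Kusto1"))    # S3VzdG8x
    print(base64_decode_tostring("S3VzdG8x"))  # Kusto1
```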
-
isempty()
Customers | where isempty(LastName)
-
isnotempty()
Customers | where isnotempty(LastName)
-
isnotnull()
Customers | where isnotnull(FirstName)
-
isnull()
Customers | where isnull(FirstName)
-
url_decode()
Customers | project url_decode('https%3A%2F%2Fwww.test.com%2Fhello%20word') | take 1
-
url_encode()
Customers | project url_encode('https://www.test.com/hello word') | take 1
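For comparison, the url_decode and url_encode examples above round-trip in Python with urllib.parse. Encoding every reserved character, including '/' and ':', is an assumption about what url_encode does; the Python functions are only illustrative.

```python
from urllib.parse import quote, unquote


def url_encode(url: str) -> str:
    """Percent-encode everything except unreserved characters (assumed behavior)."""
    return quote(url, safe="")


def url_decode(encoded: str) -> str:
    """Reverse percent-encoding."""
    return unquote(encoded)


if __name__ == "__main__":
    print(url_encode("https://www.test.com/hello word"))
    print(url_decode("https%3A%2F%2Fwww.test.com%2Fhello%20word"))
```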
-
substring()
Customers | project name_abbr = strcat(substring(FirstName,0,3), ' ', substring(LastName,2))
-
strcat()
Customers | project name = strcat(FirstName, ' ', LastName)
-
strlen()
Customers | project FirstName, strlen(FirstName)
-
strrep()
Customers | project strrep(FirstName,2,'_')
-
toupper()
Customers | project toupper(FirstName)
-
tolower()
Customers | project tolower(FirstName)
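The argument order of substring and strrep in the examples above is sketched below in Python. The sketch assumes zero-based start positions, an optional length for substring, and (value, multiplier, delimiter) for strrep; the sample names are invented for the illustration.

```python
def substring(source: str, start: int, length=None) -> str:
    """Zero-based substring; without length, take everything from start onward."""
    return source[start:] if length is None else source[start:start + length]


def strrep(value: str, multiplier: int, delimiter: str = "") -> str:
    """Repeat value multiplier times, joined by delimiter."""
    return delimiter.join([value] * multiplier)


if __name__ == "__main__":
    first, last = "Theodore", "Diaz"  # hypothetical sample row
    print(substring(first, 0, 3) + " " + substring(last, 2))  # The az
    print(strrep(first, 2, "_"))                              # Theodore_Theodore
```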
Aggregate Functions
- avg()
- avgif()
- count()
- countif()
- max()
- maxif()
- min()
- minif()
- sum()
- sumif()
- dcount()
- dcountif()
- bin
v22.7.1.1-clib
Release v22.7.1.1-clib