Releases · ClibMouse/ClickHouse

04 Aug 00:53

ch-devops

v22.8.1.1-clib

68ba390

Release v22.8.1.1-clib
Image is published at icr.io/clickhouse/clickhouse:22.8.1.1-1-clib-ibm

Implemented KQL features

August 1, 2022

strcmp (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcmpfunction)
print strcmp('abc','ABC')
parse_url (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlfunction)
print Result = parse_url('scheme://username:password@host:1234/this/is/a/path?k1=v1&k2=v2#fragment')
parse_urlquery (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlqueryfunction)
print Result = parse_urlquery('k1=v1&k2=v2&k3=v3')
print operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/printoperator)
print x=1, s=strcat('Hello', ', ', 'World!')
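For readers checking results, the following Python sketch mimics the output of strcmp, parse_url and parse_urlquery as we read the linked Kusto documentation; it is not the ClickHouse implementation. Assumptions: strcmp is an ordinal (code point), case-sensitive comparison returning -1, 0 or 1, and the parse_url property names (Scheme, Host, Port, Path, Username, Password, Query Parameters, Fragment) follow the Kusto docs. Port is kept as a string here.

```python
from urllib.parse import parse_qsl, urlsplit


def strcmp(a: str, b: str) -> int:
    """Return -1, 0 or 1 by comparing code points left to right (case-sensitive)."""
    return (a > b) - (a < b)


def parse_urlquery(query: str) -> dict:
    """Turn 'k1=v1&k2=v2' into {"Query Parameters": {"k1": "v1", "k2": "v2"}}."""
    return {"Query Parameters": dict(parse_qsl(query))}


def parse_url(url: str) -> dict:
    """Split an absolute URL into the parts that Kusto's parse_url reports."""
    parts = urlsplit(url)
    return {
        "Scheme": parts.scheme,
        "Host": parts.hostname or "",
        "Port": "" if parts.port is None else str(parts.port),
        "Path": parts.path,
        "Username": parts.username or "",
        "Password": parts.password or "",
        "Query Parameters": parse_urlquery(parts.query)["Query Parameters"],
        "Fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(strcmp("abc", "ABC"))
    print(parse_url("scheme://username:password@host:1234/this/is/a/path?k1=v1&k2=v2#fragment"))
    print(parse_urlquery("k1=v1&k2=v2&k3=v3"))
```

With this model, strcmp('abc','ABC') is 1 because lowercase letters sort after uppercase ones in code-point order.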
The following functions now support arbitrary expressions as their arguments:
Aggregate Functions:
make_list()
Customers | summarize t = make_list(FirstName) by FirstName
Customers | summarize t = make_list(FirstName, 10) by FirstName
make_list_if()
Customers | summarize t = make_list_if(FirstName, Age > 10) by FirstName
Customers | summarize t = make_list_if(FirstName, Age > 10, 10) by FirstName
make_list_with_nulls()
Customers | summarize t = make_list_with_nulls(FirstName) by FirstName
make_set()
Customers | summarize t = make_set(FirstName) by FirstName
Customers | summarize t = make_set(FirstName, 10) by FirstName
make_set_if()
Customers | summarize t = make_set_if(FirstName, Age > 10) by FirstName
Customers | summarize t = make_set_if(FirstName, Age > 10, 10) by FirstName
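The semantics of this function family can be sketched in Python as below, assuming the behaviour described in the Kusto docs: the plain variants skip nulls, make_list_with_nulls keeps them, the *_if variants only collect rows where the predicate holds, make_set keeps each value once, and the optional last argument caps the array size. The helper names and the in-memory row format (a list of dicts) are hypothetical; Kusto does not guarantee element order for make_set, and this sketch simply keeps first-seen order.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]
Pred = Callable[[Row], bool]


def _collect(rows: Iterable[Row], by: str, col: str, pred: Pred, keep_nulls: bool,
             distinct: bool, max_size: Optional[int]) -> Dict[object, List[object]]:
    groups: Dict[object, List[object]] = defaultdict(list)
    for row in rows:
        bucket = groups[row[by]]  # every group key appears, even if nothing is collected
        v = row.get(col)
        if not pred(row):
            continue  # *_if variants: skip rows where the predicate is false
        if v is None and not keep_nulls:
            continue  # make_list / make_set ignore nulls; make_list_with_nulls keeps them
        if distinct and v in bucket:
            continue  # make_set keeps each distinct value once
        if max_size is None or len(bucket) < max_size:
            bucket.append(v)  # optional max size caps the array length
    return dict(groups)


def make_list(rows, by, col, max_size=None):
    return _collect(rows, by, col, lambda r: True, False, False, max_size)


def make_list_if(rows, by, col, pred, max_size=None):
    return _collect(rows, by, col, pred, False, False, max_size)


def make_list_with_nulls(rows, by, col):
    return _collect(rows, by, col, lambda r: True, True, False, None)


def make_set(rows, by, col, max_size=None):
    return _collect(rows, by, col, lambda r: True, False, True, max_size)


def make_set_if(rows, by, col, pred, max_size=None):
    return _collect(rows, by, col, pred, False, True, max_size)
```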
Default dialect configuration for sessions and users:
Set the dialect in the server configuration XML at the user level (users.xml). The dialect is then applied at server startup, and ClickHouse parses queries for all users with the default profile according to the dialect value.

For example:
<profiles>
    <default>
        <load_balancing>random</load_balancing>
        <dialect>kusto_auto</dialect>
    </default>
</profiles>
Once the dialect is set in users.xml, queries can be executed with an HTTP client as below:
echo "KQL query" | curl -sS "http://localhost:8123/?" --data-binary @-
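The same request can be issued from Python with only the standard library. This is a sketch assuming a local server on the default HTTP port 8123 and dialect already set in users.xml; build_kql_request and run_kql are hypothetical helper names, and the query text travels as the raw POST body, as with curl --data-binary.

```python
import urllib.request

CLICKHOUSE_URL = "http://localhost:8123/"  # same endpoint as the curl example


def build_kql_request(kql: str, url: str = CLICKHOUSE_URL) -> urllib.request.Request:
    """Wrap the KQL text as a POST body, mirroring `curl --data-binary @-`."""
    return urllib.request.Request(url, data=kql.encode("utf-8"), method="POST")


def run_kql(kql: str, url: str = CLICKHOUSE_URL, timeout: float = 30.0) -> str:
    """Send the query and return the response body (requires a running server)."""
    with urllib.request.urlopen(build_kql_request(kql, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Against a running server, print(run_kql("Customers | project FirstName | take 3")) prints the result rows in the server's default output format.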
To execute queries with clickhouse-client, update clickhouse-client.xml as below and start clickhouse-client with the --config-file option (clickhouse-client --config-file=<config-file path>):

<config> <dialect>kusto_auto</dialect> </config>

Alternatively, pass the dialect setting as a command-line option prefixed with '--', for example:
clickhouse-client --dialect='kusto_auto' -q "KQL query"

21 Jul 22:03

ch-devops

v22.7.1.3-clib

fe76556

Release v22.7.1.3-clib is a Full-Text Search-specific release.
Image is published at icr.io/clickhouse/clickhouse:22.7.1.3-1-clib-ibm

Full-Text Search Release Build

Embedded full-text search is enabled by a new inverted index feature in ClickHouse.
The inverted index is implemented as a new type of skipping index named GIN.
The implementation is well aligned with the ClickHouse secondary index (skipping index) architecture, including the index creation syntax, block stream piping, and expression (RPN) evaluation.

The following statements are examples defining GIN index:

CREATE TABLE my_table1 (k UInt64,s String,INDEX my_gin_index(s) TYPE gin(0) GRANULARITY 1)  Engine=MergeTree ORDER BY (k)
CREATE TABLE my_table2 (k UInt64,s String,INDEX my_gin_index(s) TYPE gin(3) GRANULARITY 1)  Engine=MergeTree ORDER BY (k)

gin() or gin(0) sets the tokenizer to "tokens", while gin(n), with n between 2 and 8, uses "ngrams(n)" as the tokenizer.
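To make the two tokenizer choices concrete, here is a Python sketch. The ngrams(n) part is simply every contiguous substring of length n; the tokens part is an approximation that treats maximal runs of ASCII letters and digits as tokens, which may differ from the real splitter (for example on non-ASCII text, or if the real index works on bytes). The function names are hypothetical.

```python
import re
from typing import Callable, List


def tokens(text: str) -> List[str]:
    """Approximate "tokens" tokenizer: maximal runs of ASCII letters and digits."""
    return re.findall(r"[0-9A-Za-z]+", text)


def ngrams(text: str, n: int) -> List[str]:
    """"ngrams(n)" tokenizer: every contiguous substring of length n."""
    if not 2 <= n <= 8:
        raise ValueError("gin(n) accepts n between 2 and 8")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tokenizer_for(gin_param: int) -> Callable[[str], List[str]]:
    """gin(0) selects tokens; gin(n) selects ngrams(n)."""
    if gin_param == 0:
        return tokens
    return lambda text: ngrams(text, gin_param)
```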

The main techniques are single-pass index construction with segmentation, Roaring Bitmaps (for postings lists), and FSTs (for term dictionaries); see the following references:
[1] Heinz, Steffen, and Justin Zobel. 2003. Efficient single-pass index construction for text databases. JASIST 54(8):713–729.
[2] Roaring Bitmap: https://github.com/RoaringBitmap/RoaringBitmap
[3] FST (Finite State Transducer): Direct Construction of Minimal Acyclic Subsequential Transducers.

A MergeTree setting, max_digestion_size_per_segment (default 256M), controls the maximum amount of data digested into one segment during indexing.
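A toy version of single-pass construction with segmentation, as a rough illustration under stated simplifications: Python sets stand in for Roaring Bitmaps, a dict stands in for the FST term dictionary, and max_digestion_bytes plays the role of max_digestion_size_per_segment. build_segments and lookup are hypothetical names; the real index lives in ClickHouse's C++ code and writes each segment to disk.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Segment = Dict[str, Set[int]]  # term -> postings (row ids); Roaring Bitmap in the real index


def build_segments(rows: Iterable[Tuple[int, str]], tokenize: Callable[[str], List[str]],
                   max_digestion_bytes: int) -> List[Segment]:
    """Read (row_id, text) pairs once; start a new segment when the budget is reached."""
    segments: List[Segment] = []
    current: Segment = {}
    digested = 0
    for row_id, text in rows:
        for term in tokenize(text):
            current.setdefault(term, set()).add(row_id)
        digested += len(text.encode("utf-8"))
        if digested >= max_digestion_bytes:
            segments.append(current)  # a real index would serialize dictionary + postings here
            current, digested = {}, 0
    if current:
        segments.append(current)
    return segments


def lookup(segments: List[Segment], term: str) -> Set[int]:
    """Union the postings for one term across all segments."""
    result: Set[int] = set()
    for seg in segments:
        result |= seg.get(term, set())
    return result
```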

The following steps demonstrate the inverted index feature using the hackernews dataset.

Create and load hackernews table

CREATE TABLE hackernews ENGINE = MergeTree ORDER BY id
          AS SELECT * FROM url('https://datasets.clickhouse.com/hackernews.native.zst', Native,
          $$
          id UInt32,
          deleted UInt8,
          type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
          by LowCardinality(String),
          time DateTime,
          text String,
          dead UInt8,
          parent UInt32,
          poll UInt32,
          kids Array(UInt32),
          url String,
          score Int32,
          title String,
          parts Array(UInt32),
          descendants Int32
          $$);

Create the hackernews_gin3 table, which has the same column definitions as hackernews plus an inverted index on the text column.

CREATE TABLE hackernews_gin3
(
          id UInt32,
          deleted UInt8,
          type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
          by LowCardinality(String),
          time DateTime,
          text String,
          dead UInt8,
          parent UInt32,
          poll UInt32,
          kids Array(UInt32),
          url String,
          score Int32,
          title String,
          parts Array(UInt32),
          descendants Int32,
          INDEX gin_index(text) TYPE gin(3) GRANULARITY 1
) ENGINE = MergeTree ORDER BY id
SETTINGS index_granularity=1024;

Populate the hackernews_gin3 table by copying the data from the hackernews table

set max_insert_threads=6;
insert into hackernews_gin3 select * from hackernews;

Run the same query against hackernews and hackernews_gin3 for comparison

SELECT * FROM hackernews WHERE text LIKE '%I love clickhouse%';
SELECT * FROM hackernews_gin3 WHERE text LIKE '%I love clickhouse%';
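Why the second query can be faster: with gin(3), each granule's index holds the 3-grams of its text, and a granule can only contain '%I love clickhouse%' if it contains every 3-gram of 'I love clickhouse'. The Python sketch below models that skipping test; it is a simplification (hypothetical names, in-memory sets, granules given as lists of strings), not the actual ClickHouse code path. Surviving granules are still filtered by LIKE, so false positives only cost extra reading, never wrong results.

```python
from typing import List, Set


def trigrams(text: str, n: int = 3) -> Set[str]:
    """All contiguous substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def granules_to_read(granule_texts: List[List[str]], needle: str, n: int = 3) -> List[int]:
    """Indices of granules that may contain `needle`.

    A granule is skipped when at least one n-gram of the needle is missing
    from it, because then no row in that granule can contain the needle.
    """
    needed = trigrams(needle, n)
    keep: List[int] = []
    for i, texts in enumerate(granule_texts):
        present: Set[str] = set()
        for t in texts:
            present |= trigrams(t, n)
        if needed <= present:  # all n-grams present: the granule must be read
            keep.append(i)
    return keep
```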

20 Jul 14:58

ch-devops

v22.7.1.2-clib

652ff7c

Release v22.7.1.2-clib

Image is published at icr.io/clickhouse/clickhouse:22.7.1.2-1-clib-ibm

The setting sql_dialect has been renamed to dialect:

set dialect='clickhouse'
set dialect='kusto'
set dialect='kusto_auto'

IP functions

parse_ipv4
"Customers | project parse_ipv4('127.0.0.1')"
parse_ipv6
"Customers | project parse_ipv6('127.0.0.1')"

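The Python sketch below shows what these two functions compute, using the standard ipaddress module and our reading of the Kusto docs: parse_ipv4 yields the address as a 32-bit integer, and parse_ipv6 yields the fully expanded IPv6 form, mapping IPv4 input to ::ffff:a.b.c.d. It ignores the optional /prefix form that Kusto's parse_ipv4 also accepts; the function names mirror KQL but are plain Python.

```python
import ipaddress
from typing import Optional


def parse_ipv4(s: str) -> Optional[int]:
    """IPv4 text -> 32-bit integer, or None when the input is not valid IPv4."""
    try:
        return int(ipaddress.IPv4Address(s))
    except ValueError:
        return None


def parse_ipv6(s: str) -> Optional[str]:
    """IPv4 or IPv6 text -> fully expanded IPv6 text; IPv4 becomes IPv4-mapped."""
    try:
        addr = ipaddress.ip_address(s)
    except ValueError:
        return None
    if isinstance(addr, ipaddress.IPv4Address):
        addr = ipaddress.IPv6Address(f"::ffff:{addr}")
    return addr.exploded
```

For '127.0.0.1' this gives 2130706433 and 0000:0000:0000:0000:0000:ffff:7f00:0001 respectively.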
Note: the functions listed below currently accept only constant arguments; support for expressions is planned.

ipv4_is_private
"Customers | project ipv4_is_private('192.168.1.6/24')"
"Customers | project ipv4_is_private('192.168.1.6')"
ipv4_is_in_range
"Customers | project ipv4_is_in_range('127.0.0.1', '127.0.0.1')"
"Customers | project ipv4_is_in_range('192.168.1.6', '192.168.1.1/24')"
ipv4_netmask_suffix
"Customers | project ipv4_netmask_suffix('192.168.1.1/24')"
"Customers | project ipv4_netmask_suffix('192.168.1.1')"

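A sketch of the three constant-argument IPv4 functions with Python's ipaddress module, under these assumptions: "private" means the RFC 1918 ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16; an address without a suffix is treated as /32; and host bits after the prefix are ignored. How the ClickHouse KQL layer handles edge cases (for example a prefix wider than a private range) may differ.

```python
import ipaddress

# RFC 1918 private address ranges
_PRIVATE = [ipaddress.IPv4Network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def _net(s: str) -> ipaddress.IPv4Network:
    """Accept 'a.b.c.d' (treated as /32) or 'a.b.c.d/n' with host bits set."""
    return ipaddress.IPv4Network(s, strict=False)


def ipv4_is_private(s: str) -> bool:
    net = _net(s)
    return any(net.subnet_of(p) for p in _PRIVATE)


def ipv4_is_in_range(ip: str, ip_range: str) -> bool:
    return _net(ip).subnet_of(_net(ip_range))


def ipv4_netmask_suffix(s: str) -> int:
    return _net(s).prefixlen
```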
String functions

Subquery support for the in operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/in-cs-operator)
(the subquery must be wrapped in an extra pair of parentheses)

Customers | where Age in ((Customers|project Age|where Age < 30))
Note: case-insensitive matching is not supported yet.
has_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-all-operator)
Customers|where Occupation has_all ('Skilled','abcd')
Note: subquery not supported yet
has_any (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-anyoperator)
Customers|where Occupation has_any ('Skilled','abcd')
Note: subquery not supported yet
countof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/countoffunction)
Customers | project countof('The cat sat on the mat', 'at')
Customers | project countof('The cat sat on the mat', 'at', 'normal')
Customers | project countof('The cat sat on the mat', 'at', 'regex')
extract (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractfunction)
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 0, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 1, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 3, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20', typeof(real))
extract_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractallfunction)

Customers | project extract_all('(\\w)(\\w+)(\\w)','The price of PINEAPPLE ice cream is 20')
Note: the captureGroups argument is not supported yet.
split (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/splitfunction)
Customers | project split('aa_bb', '_')
Customers | project split('aaa_bbb_ccc', '_', 1)
Customers | project split('', '_')
Customers | project split('a__b', '_')
Customers | project split('aabbcc', 'bb')
strcat_delim (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcat-delimfunction)
Customers | project strcat_delim('-', '1', '2', 'A')
Customers | project strcat_delim('-', '1', '2', strcat('A','b'))
Note: only string arguments are supported for now.
indexof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/indexoffunction)
Customers | project indexof('abcdefg','cde')
Customers | project indexof('abcdefg','cde',2)
Customers | project indexof('abcdefg','cde',6)
Note: the length and occurrence arguments are not supported yet.
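The following Python sketch approximates several of the string functions above, as a reading aid rather than a reference implementation. Assumptions: countof in 'normal' mode counts overlapping occurrences (our reading of the Kusto docs), while 'regex' mode counts non-overlapping matches; extract returns None (null) for a missing match or group, with the optional cast standing in for typeof(real); split with an out-of-range index returns an empty list. The KQL literal '\\b' is the regex \b, written here as a raw Python string.

```python
import re
from typing import Callable, List, Optional


def countof(text: str, search: str, kind: str = "normal") -> int:
    """'normal': overlapping plain-substring hits; 'regex': non-overlapping matches."""
    if kind == "regex":
        return len(re.findall(search, text))
    count, pos = 0, text.find(search)
    while pos != -1:
        count += 1
        pos = text.find(search, pos + 1)  # advance one char so overlaps are counted
    return count


def extract(regex: str, group: int, text: str, cast: Callable = str):
    """Capture group `group` (0 = whole match) of the first match, or None."""
    m = re.search(regex, text)
    if m is None or group > m.re.groups:
        return None
    value = m.group(group)
    return None if value is None else cast(value)


def split(text: str, delimiter: str, index: Optional[int] = None) -> List[str]:
    parts = text.split(delimiter)
    if index is None:
        return parts
    return [parts[index]] if 0 <= index < len(parts) else []


def indexof(text: str, lookup: str, start: int = 0) -> int:
    """Zero-based position of `lookup` at or after `start`, or -1."""
    return text.find(lookup, start)
```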

Previously released:

KQL() function

Create table
CREATE TABLE kql_table4 ENGINE = Memory AS select *, now() as new_column From kql(Customers | project LastName,Age);
Verify the content of kql_table4:
select * from kql_table4
Insert into table
Create a table named temp:
```
CREATE TABLE temp
(    
    FirstName Nullable(String),
    LastName String, 
    Age Nullable(UInt8)
) ENGINE = Memory;
```
INSERT INTO temp select * from kql(Customers|project FirstName,LastName,Age);
Verify the content of temp:
select * from temp
Select from kql()
Select * from kql(Customers|project FirstName)

KQL operators:

Tabular expression statements
Customers
Select Column
Customers | project FirstName,LastName,Occupation
Limit returned results
Customers | project FirstName,LastName,Occupation | take 1 | take 3
sort, order
Customers | order by Age desc , FirstName asc
Filter
Customers | where Occupation == 'Skilled Manual'
summarize
Customers |summarize max(Age) by Occupation
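A tabular expression is a pipeline: each operator after | consumes the table produced by the operator before it. The Python sketch below models that on a list of dict rows; the operator helpers (where, project, take, order_by, summarize_max, pipe) are hypothetical stand-ins, not how the ClickHouse KQL parser is built. The max_Age column name follows the Kusto convention for unnamed aggregates.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Op = Callable[[List[Row]], List[Row]]


def where(pred: Callable[[Row], bool]) -> Op:
    return lambda rows: [r for r in rows if pred(r)]


def project(*cols: str) -> Op:
    return lambda rows: [{c: r[c] for c in cols} for r in rows]


def take(n: int) -> Op:
    return lambda rows: rows[:n]


def order_by(*keys) -> Op:
    """keys are (column, 'asc' | 'desc'); sorting last key first keeps the first key dominant."""
    def run(rows: List[Row]) -> List[Row]:
        out = list(rows)
        for col, direction in reversed(keys):
            out.sort(key=lambda r: r[col], reverse=(direction == "desc"))  # stable sort
        return out
    return run


def summarize_max(value: str, by: str) -> Op:
    def run(rows: List[Row]) -> List[Row]:
        groups: Dict[object, object] = {}
        for r in rows:
            k = r[by]
            groups[k] = r[value] if k not in groups else max(groups[k], r[value])
        return [{by: k, f"max_{value}": v} for k, v in groups.items()]
    return run


def pipe(rows: List[Row], *ops: Op) -> List[Row]:
    """`T | op1 | op2` is op2(op1(T))."""
    for op in ops:
        rows = op(rows)
    return rows
```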

KQL string operators and functions

contains
Customers |where Education contains 'degree'
!contains
Customers |where Education !contains 'degree'
contains_cs
Customers |where Education contains_cs 'Degree'
!contains_cs
Customers |where Education !contains_cs 'Degree'
endswith
Customers | where FirstName endswith 'RE'
!endswith
Customers | where FirstName !endswith 'RE'
endswith_cs
Customers | where FirstName endswith_cs 're'
!endswith_cs
Customers | where FirstName !endswith_cs 're'
==
Customers | where Occupation == 'Skilled Manual'
!=
Customers | where Occupation != 'Skilled Manual'
has
Customers | where Occupation has 'skilled'
!has
Customers | where Occupation !has 'skilled'
has_cs
Customers | where Occupation has_cs 'Skilled'
!has_cs
Customers | where Occupation !has_cs 'Skilled'
hasprefix
Customers | where Occupation hasprefix 'Ab'
!hasprefix
Customers | where Occupation !hasprefix 'Ab'
hasprefix_cs
Customers | where Occupation hasprefix_cs 'ab'
!hasprefix_cs
Customers | where Occupation !hasprefix_cs 'ab'
hassuffix
Customers | where Occupation hassuffix 'Ent'
!hassuffix
Customers | where Occupation !hassuffix 'Ent'
hassuffix_cs
Customers | where Occupation hassuffix_cs 'ent'
!hassuffix_cs
Customers | where Occupation !hassuffix_cs 'ent'
in
Customers |where Education in ('Bachelors','High School')
!in
Customers | where Education !in ('Bachelors','High School')
matches regex
Customers | where FirstName matches regex 'P.*r'
startswith
Customers | where FirstName startswith 'pet'
!startswith
Customers | where FirstName !startswith 'pet'
startswith_cs
Customers | where FirstName startswith_cs 'pet'
!startswith_cs
Customers | where FirstName !startswith_cs 'pet'
base64_encode_tostring()
Customers | project base64_encode_tostring('Kusto1') | take 1
base64_decode_tostring()
Customers | project base64_decode_tostring('S3VzdG8x') | take 1
isempty()
Customers | where isempty(LastName)
isnotempty()
Customers | where isnotempty(LastName)
isnotnull()
Customers | where isnotnull(FirstName)
isnull()
Customers | where isnull(FirstName)
url_decode()
Customers | project url_decode('https%3A%2F%2Fwww.test.com%2Fhello%20word') | take 1
url_encode()
Customers | project url_encode('https://www.test.com/hello word') | take 1
substring()
Customers | project name_abbr = strcat(substring(FirstName,0,3), ' ', substring(LastName,2))
strcat()
Customers | project name = strcat(FirstName, ' ', LastName)
strlen()
Customers | project FirstName, strlen(FirstName)
strrep()
Customers | project strrep(FirstName,2,'_')
toupper()
Customers | project toupper(FirstName)
tolower()
Customers | project tolower(FirstName)
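For the operator pairs above, the case-insensitive form and its _cs counterpart differ only in whether case is folded, and each ! form is the plain negation. The Python sketch below illustrates that, plus base64 round-tripping; the has/hasprefix/hassuffix part is an approximation that treats maximal runs of ASCII letters and digits as terms, whereas Kusto's actual term rules are more detailed. All names are hypothetical Python stand-ins.

```python
import base64
import re
from typing import Callable, List


def _norm(cs: bool) -> Callable[[str], str]:
    """_cs operators compare as-is; the others lower-case both sides."""
    return (lambda s: s) if cs else str.lower


def contains(s: str, sub: str, cs: bool = False) -> bool:
    n = _norm(cs)
    return n(sub) in n(s)


def startswith(s: str, prefix: str, cs: bool = False) -> bool:
    n = _norm(cs)
    return n(s).startswith(n(prefix))


def endswith(s: str, suffix: str, cs: bool = False) -> bool:
    n = _norm(cs)
    return n(s).endswith(n(suffix))


def _terms(s: str) -> List[str]:
    """Approximation: a term is a maximal run of ASCII letters and digits."""
    return re.findall(r"[0-9A-Za-z]+", s)


def has(s: str, term: str, cs: bool = False) -> bool:
    n = _norm(cs)
    return any(n(t) == n(term) for t in _terms(s))


def hasprefix(s: str, prefix: str, cs: bool = False) -> bool:
    n = _norm(cs)
    return any(n(t).startswith(n(prefix)) for t in _terms(s))


def hassuffix(s: str, suffix: str, cs: bool = False) -> bool:
    n = _norm(cs)
    return any(n(t).endswith(n(suffix)) for t in _terms(s))


def base64_encode_tostring(s: str) -> str:
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def base64_decode_tostring(s: str) -> str:
    return base64.b64decode(s).decode("utf-8")
```

For example, 'Unskilled' contains 'skilled' but does not have it as a term, and base64_encode_tostring('Kusto1') returns 'S3VzdG8x', matching the decode example above.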

Aggregate Functions

avg()
avgif()
count()
countif()
max()
maxif()
min()
minif()
sum()
sumif()
dcount()
dcountif()
bin
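The plain and *if aggregates differ only in which rows they see, and bin() rounds a value down to a multiple of its bin size so that it can serve as a grouping key. The Python sketch below computes all of them for one column per group; summarize_by and bin_ are hypothetical names (bin is a Python built-in). dcount here is exact, whereas Kusto documents dcount as an estimate, and returning None from avgif/minif/maxif for a group with no matching rows is an assumption.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable

Row = Dict[str, float]


def bin_(value: float, round_to: float) -> float:
    """Round down to the nearest multiple of round_to."""
    return math.floor(value / round_to) * round_to


def summarize_by(rows: Iterable[Row], key: Callable[[Row], object], value: str,
                 pred: Callable[[Row], bool] = lambda r: True) -> Dict[object, dict]:
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    out = {}
    for k, members in groups.items():
        vals = [r[value] for r in members]
        hits = [r[value] for r in members if pred(r)]  # rows the *if variants see
        out[k] = {
            "count": len(vals), "countif": len(hits),
            "sum": sum(vals), "sumif": sum(hits),
            "avg": sum(vals) / len(vals),
            "avgif": sum(hits) / len(hits) if hits else None,
            "min": min(vals), "minif": min(hits) if hits else None,
            "max": max(vals), "maxif": max(hits) if hits else None,
            "dcount": len(set(vals)), "dcountif": len(set(hits)),
        }
    return out
```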

25 Jun 02:05

ch-devops

v22.7.1.1-clib

f65b5d2

Release v22.7.1.1-clib
