Releases: ClibMouse/ClickHouse

v22.8.1.1-clib

04 Aug 00:53

Release v22.8.1.1-clib
Image is published at icr.io/clickhouse/clickhouse:22.8.1.1-1-clib-ibm

KQL implemented features.


August 1, 2022

  • strcmp (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcmpfunction)
    print strcmp('abc','ABC')

  • parse_url (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlfunction)
    print Result = parse_url('scheme://username:[email protected]:1234/this/is/a/path?k1=v1&k2=v2#fragment')

  • parse_urlquery (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlqueryfunction)
    print Result = parse_urlquery('k1=v1&k2=v2&k3=v3')

  • print operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/printoperator)
    print x=1, s=strcat('Hello', ', ', 'World!')
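
For readers who want to cross-check results outside ClickHouse, the behaviour of strcmp and parse_urlquery can be approximated in Python. This is only a sketch under assumptions: the helper names kql_strcmp and kql_parse_urlquery are hypothetical, the comparison is a plain code-point comparison, and the "Query Parameters" result shape follows the Kusto documentation linked above rather than the ClickHouse source.

```python
from urllib.parse import parse_qsl


def kql_strcmp(left: str, right: str) -> int:
    """Return -1, 0 or 1 after comparing the two strings lexicographically."""
    return (left > right) - (left < right)


def kql_parse_urlquery(query: str) -> dict:
    """Split 'k1=v1&k2=v2' into key/value pairs under 'Query Parameters'."""
    return {"Query Parameters": dict(parse_qsl(query))}


# Same inputs as the KQL examples above.
print(kql_strcmp("abc", "ABC"))
print(kql_parse_urlquery("k1=v1&k2=v2&k3=v3"))
```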

  • The following functions now support arbitrary expressions as their argument:

  • Aggregate Functions:

  • make_list()
    Customers | summarize t = make_list(FirstName) by FirstName
    Customers | summarize t = make_list(FirstName, 10) by FirstName

  • make_list_if()
    Customers | summarize t = make_list_if(FirstName, Age > 10) by FirstName
    Customers | summarize t = make_list_if(FirstName, Age > 10, 10) by FirstName

  • make_list_with_nulls()
    Customers | summarize t = make_list_with_nulls(FirstName) by FirstName

  • make_set()
    Customers | summarize t = make_set(FirstName) by FirstName
    Customers | summarize t = make_set(FirstName, 10) by FirstName

  • make_set_if()
    Customers | summarize t = make_set_if(FirstName, Age > 10) by FirstName
    Customers | summarize t = make_set_if(FirstName, Age > 10, 10) by FirstName
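
The differences between these aggregates (null handling, de-duplication, the optional predicate and the optional maximum size) can be illustrated with a small Python model. It is a sketch, not the ClickHouse implementation: the sample rows are invented, grouping (the "by" clause) is left out, and keeping first-seen order in make_set is an assumption.

```python
def make_list(rows, column, predicate=None, max_size=None, keep_nulls=False):
    """Collect column values; skip rows failing the predicate and, by default, nulls."""
    out = []
    for row in rows:
        if predicate is not None and not predicate(row):
            continue
        value = row[column]
        if value is None and not keep_nulls:
            continue
        out.append(value)
    return out if max_size is None else out[:max_size]


def make_set(rows, column, predicate=None, max_size=None):
    """Like make_list, but each distinct value is kept only once."""
    distinct = []
    for value in make_list(rows, column, predicate):
        if value not in distinct:
            distinct.append(value)
    return distinct if max_size is None else distinct[:max_size]


# Hypothetical sample data standing in for the Customers table.
customers = [
    {"FirstName": "Peter", "Age": 12},
    {"FirstName": None, "Age": 30},
    {"FirstName": "Peter", "Age": 8},
    {"FirstName": "Latoya", "Age": 40},
]

print(make_list(customers, "FirstName"))                       # make_list()
print(make_list(customers, "FirstName", keep_nulls=True))      # make_list_with_nulls()
print(make_list(customers, "FirstName", lambda r: r["Age"] > 10, 1))  # make_list_if()
print(make_set(customers, "FirstName"))                        # make_set()
print(make_set(customers, "FirstName", lambda r: r["Age"] > 10))      # make_set_if()
```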

  • Default dialect config setting for session and user:

  • Set the dialect in the server configuration XML at the user level (users.xml). The dialect is applied at server startup, and ClickHouse parses queries from all users with the default profile according to the dialect value.

    For example:
    <profiles>
        <!-- Default settings. -->
        <default>
            <load_balancing>random</load_balancing>
            <dialect>kusto_auto</dialect>
        </default>
    </profiles>

  • Once the dialect is set in users.xml, queries can be executed with an HTTP client:
    echo "KQL query" | curl -sS "http://localhost:8123/?" --data-binary @-

  • To execute the query with clickhouse-client, update clickhouse-client.xml as below and start clickhouse-client with the --config-file option (clickhouse-client --config-file=<config-file path>):

    <config> <dialect>kusto_auto</dialect> </config>

    Alternatively, pass the dialect setting on the command line with '--'. For example:
    clickhouse-client --dialect='kusto_auto' -q "KQL query"
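
The HTTP call shown with curl above can also be built from a script. The sketch below only constructs the request, using the same localhost:8123 endpoint; the helper name build_kql_request is hypothetical. Passing the dialect as a URL parameter relies on the general ClickHouse convention that settings may be supplied in the query string, and whether this build accepts dialect that way is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_kql_request(query: str,
                      host: str = "http://localhost:8123",
                      dialect: Optional[str] = None) -> Request:
    """Build a POST request carrying the query text as the request body."""
    params = {"dialect": dialect} if dialect else {}
    url = f"{host}/?{urlencode(params)}"
    return Request(url, data=query.encode("utf-8"), method="POST")


req = build_kql_request("Customers | take 3", dialect="kusto_auto")
print(req.full_url)
# To send it against a running server: urllib.request.urlopen(req).read()
```

Without the dialect argument the URL is the same "http://localhost:8123/?" used in the curl example, so the server-side default from users.xml applies.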

v22.7.1.3-clib

21 Jul 22:03

Release v22.7.1.3-clib is a Full-Text Search specific release.
Image is published at icr.io/clickhouse/clickhouse:22.7.1.3-1-clib-ibm

Full-Text Search Release Build


Embedded full-text search is enabled by introducing a new inverted index feature into ClickHouse.
This inverted index feature is implemented as a new type of skipping index named GIN.
The implementation is well aligned with the ClickHouse secondary index (skipping index) architecture, including index creation syntax, block stream piping, and expression (RPN) evaluation.

The following statements are examples of defining a GIN index:

CREATE TABLE my_table1 (k UInt64,s String,INDEX my_gin_index(s) TYPE gin(0) GRANULARITY 1)  Engine=MergeTree ORDER BY (k)
CREATE TABLE my_table2 (k UInt64,s String,INDEX my_gin_index(s) TYPE gin(3) GRANULARITY 1)  Engine=MergeTree ORDER BY (k)

gin() or gin(0) sets the tokenizer to "tokens", while gin(n), where n is between 2 and 8, uses "ngrams(n)" as the tokenizer.
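
As a rough illustration of the two tokenizer modes, the Python sketch below splits text either into word-like tokens or into overlapping n-grams. The exact separator rules and any case handling used by the GIN index are not stated in these notes, so splitting on non-alphanumeric ASCII characters is an assumption, and the function names are hypothetical.

```python
import re


def tokens(text: str) -> list:
    """Split on runs of non-alphanumeric characters (assumed separator rule)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def ngrams(text: str, n: int) -> list:
    """Return every overlapping substring of length n."""
    if not 2 <= n <= 8:
        raise ValueError("n must be between 2 and 8")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tokenize(text: str, n: int = 0) -> list:
    """Mirror gin(0) -> tokens and gin(n) -> ngrams(n)."""
    return tokens(text) if n == 0 else ngrams(text, n)


print(tokenize("I love ClickHouse!"))
print(tokenize("click", 3))
```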

The main techniques include single-pass index construction with segmentation, Roaring Bitmap (for postings lists), and FST (for term dictionaries). See the following references:
[1] Heinz, Steffen, and Justin Zobel. 2003. Efficient single-pass index construction for text databases. JASIST 54(8):713–729.
[2] Roaring Bitmap. https://github.com/RoaringBitmap/RoaringBitmap
[3] FST (Finite State Transducer). Direct Construction of Minimal Acyclic Subsequential Transducers.

There is also a MergeTree setting that controls the maximum size of data digested per segment during indexing: max_digestion_size_per_segment (default is 256M).

The following steps demonstrate the inverted index feature using the hackernews dataset.

  1. Create and load the hackernews table.
CREATE TABLE hackernews ENGINE = MergeTree ORDER BY id
          AS SELECT * FROM url('https://datasets.clickhouse.com/hackernews.native.zst', Native,
          $$
          id UInt32,
          deleted UInt8,
          type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
          by LowCardinality(String),
          time DateTime,
          text String,
          dead UInt8,
          parent UInt32,
          poll UInt32,
          kids Array(UInt32),
          url String,
          score Int32,
          title String,
          parts Array(UInt32),
          descendants Int32
          $$);
  2. Create the hackernews_gin3 table, which has the same column definitions as hackernews plus an inverted index.
CREATE TABLE hackernews_gin3
(
          id UInt32,
          deleted UInt8,
          type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
          by LowCardinality(String),
          time DateTime,
          text String,
          dead UInt8,
          parent UInt32,
          poll UInt32,
          kids Array(UInt32),
          url String,
          score Int32,
          title String,
          parts Array(UInt32),
          descendants Int32,
          INDEX gin_index(text) TYPE gin(3) GRANULARITY 1
) ENGINE = MergeTree ORDER BY id
SETTINGS index_granularity=1024;
  3. Populate the hackernews_gin3 table by copying the data from the hackernews table.
SET max_insert_threads=6;
INSERT INTO hackernews_gin3 SELECT * FROM hackernews;
  4. Run the same query against hackernews and hackernews_gin3 for comparison.
SELECT * FROM hackernews WHERE text LIKE '%I love clickhouse%';
SELECT * FROM hackernews_gin3 WHERE text LIKE '%I love clickhouse%';
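
To see why the second query can read less data, the Python sketch below models an n-gram inverted index used as a skipping index: each n-gram maps to the set of granules that contain it, and a LIKE '%needle%' filter only needs to scan granules that contain every n-gram of the needle. This is a simplified model, not the GIN implementation: plain Python sets stand in for Roaring bitmaps, a dict stands in for the FST term dictionary, segmentation is ignored, and the sample granules are invented.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 3) -> list:
    """Return every overlapping substring of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(granules: list, n: int = 3) -> dict:
    """Map each n-gram to the set of granule ids whose rows contain it."""
    index = defaultdict(set)
    for granule_id, rows in enumerate(granules):
        for row in rows:
            for term in ngrams(row, n):
                index[term].add(granule_id)
    return index


def candidate_granules(index: dict, needle: str, total: int, n: int = 3) -> set:
    """Granules that may contain the needle; all others can be skipped."""
    result = set(range(total))
    for term in ngrams(needle, n):
        # Intersect postings: a matching granule must contain every n-gram.
        result &= index.get(term, set())
    return result


# Hypothetical data: three granules of text rows.
granules = [
    ["I love clickhouse", "hello"],
    ["postgres is fine"],
    ["we love click tracking", "my house"],
]
index = build_index(granules)
print(candidate_granules(index, "love clickhouse", len(granules)))
```

The candidate set can still contain false positives, so the original LIKE predicate is evaluated on the rows of the surviving granules. A needle shorter than n yields no n-grams, in which case this sketch falls back to scanning every granule.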

v22.7.1.2-clib

20 Jul 14:58

Release v22.7.1.2-clib

  • Image is published at icr.io/clickhouse/clickhouse:22.7.1.2-1-clib-ibm

Renamed the dialect setting from sql_dialect to dialect

set dialect='clickhouse'
set dialect='kusto'
set dialect='kusto_auto'

IP functions

  • parse_ipv4
    "Customers | project parse_ipv4('127.0.0.1')"
  • parse_ipv6
    "Customers | project parse_ipv6('127.0.0.1')"

Please note that the functions listed below only accept constant parameters for now. Support for expressions is planned as a further improvement.

  • ipv4_is_private
    "Customers | project ipv4_is_private('192.168.1.6/24')"
    "Customers | project ipv4_is_private('192.168.1.6')"
  • ipv4_is_in_range
    "Customers | project ipv4_is_in_range('127.0.0.1', '127.0.0.1')"
    "Customers | project ipv4_is_in_range('192.168.1.6', '192.168.1.1/24')"
  • ipv4_netmask_suffix
    "Customers | project ipv4_netmask_suffix('192.168.1.1/24')"
    "Customers | project ipv4_netmask_suffix('192.168.1.1')"

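The expected results of the three functions above can be approximated with Python's standard ipaddress module. This is a sketch under assumptions: "private" is taken to mean the three RFC 1918 ranges, an address without a prefix is treated as a /32, and the Python function names simply mirror the KQL ones.

```python
import ipaddress

# RFC 1918 private ranges (assumed definition of "private").
_PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def ipv4_is_private(ip: str) -> bool:
    """True if the address part of 'a.b.c.d' or 'a.b.c.d/n' is in a private range."""
    address = ipaddress.ip_interface(ip).ip
    return any(address in network for network in _PRIVATE_NETWORKS)


def ipv4_is_in_range(ip: str, ip_range: str) -> bool:
    """True if ip lies inside ip_range; a range without a prefix is a single host."""
    network = ipaddress.ip_network(ip_range, strict=False)
    return ipaddress.ip_address(ip) in network


def ipv4_netmask_suffix(ip: str) -> int:
    """Return the prefix length, defaulting to 32 when none is given."""
    return ipaddress.ip_interface(ip).network.prefixlen


print(ipv4_is_private("192.168.1.6/24"))
print(ipv4_is_in_range("192.168.1.6", "192.168.1.1/24"))
print(ipv4_netmask_suffix("192.168.1.1"))
```
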
string functions

Previously released:


KQL() function

  • create table
    CREATE TABLE kql_table4 ENGINE = Memory AS select *, now() as new_column from kql(Customers | project LastName,Age);
    verify the content of kql_table4
    select * from kql_table4

  • insert into table
    create a tmp table:

    CREATE TABLE temp
    (    
        FirstName Nullable(String),
        LastName String, 
        Age Nullable(UInt8)
    ) ENGINE = Memory;
    

    INSERT INTO temp select * from kql(Customers|project FirstName,LastName,Age);
    verify the content of temp
    select * from temp

  • Select from kql()
    Select * from kql(Customers|project FirstName)

KQL operators:

  • Tabular expression statements
    Customers
  • Select Column
    Customers | project FirstName,LastName,Occupation
  • Limit returned results
    Customers | project FirstName,LastName,Occupation | take 1 | take 3
  • sort, order
    Customers | order by Age desc , FirstName asc
  • Filter
    Customers | where Occupation == 'Skilled Manual'
  • summarize
    Customers |summarize max(Age) by Occupation

KQL string operators and functions

  • contains
    Customers |where Education contains 'degree'

  • !contains
    Customers |where Education !contains 'degree'

  • contains_cs
    Customers |where Education contains_cs 'Degree'

  • !contains_cs
    Customers |where Education !contains_cs 'Degree'

  • endswith
    Customers | where FirstName endswith 'RE'

  • !endswith
    Customers | where FirstName !endswith 'RE'

  • endswith_cs
    Customers | where FirstName endswith_cs 're'

  • !endswith_cs
    Customers | where FirstName !endswith_cs 're'

  • ==
    Customers | where Occupation == 'Skilled Manual'

  • !=
    Customers | where Occupation != 'Skilled Manual'

  • has
    Customers | where Occupation has 'skilled'

  • !has
    Customers | where Occupation !has 'skilled'

  • has_cs
    Customers | where Occupation has_cs 'Skilled'

  • !has_cs
    Customers | where Occupation !has_cs 'Skilled'

  • hasprefix
    Customers | where Occupation hasprefix 'Ab'

  • !hasprefix
    Customers | where Occupation !hasprefix 'Ab'

  • hasprefix_cs
    Customers | where Occupation hasprefix_cs 'ab'

  • !hasprefix_cs
    Customers | where Occupation !hasprefix_cs 'ab'

  • hassuffix
    Customers | where Occupation hassuffix 'Ent'

  • !hassuffix
    Customers | where Occupation !hassuffix 'Ent'

  • hassuffix_cs
    Customers | where Occupation hassuffix_cs 'ent'

  • !hassuffix_cs
    Customers | where Occupation !hassuffix_cs 'ent'

  • in
    Customers |where Education in ('Bachelors','High School')

  • !in
    Customers | where Education !in ('Bachelors','High School')

  • matches regex
    Customers | where FirstName matches regex 'P.*r'

  • startswith
    Customers | where FirstName startswith 'pet'

  • !startswith
    Customers | where FirstName !startswith 'pet'

  • startswith_cs
    Customers | where FirstName startswith_cs 'pet'

  • !startswith_cs
    Customers | where FirstName !startswith_cs 'pet'

  • base64_encode_tostring()
    Customers | project base64_encode_tostring('Kusto1') | take 1

  • base64_decode_tostring()
    Customers | project base64_decode_tostring('S3VzdG8x') | take 1

  • isempty()
    Customers | where isempty(LastName)

  • isnotempty()
    Customers | where isnotempty(LastName)

  • isnotnull()
    Customers | where isnotnull(FirstName)

  • isnull()
    Customers | where isnull(FirstName)

  • url_decode()
    Customers | project url_decode('https%3A%2F%2Fwww.test.com%2Fhello%20word') | take 1

  • url_encode()
    Customers | project url_encode('https://www.test.com/hello word') | take 1

  • substring()
    Customers | project name_abbr = strcat(substring(FirstName,0,3), ' ', substring(LastName,2))

  • strcat()
    Customers | project name = strcat(FirstName, ' ', LastName)

  • strlen()
    Customers | project FirstName, strlen(FirstName)

  • strrep()
    Customers | project strrep(FirstName,2,'_')

  • toupper()
    Customers | project toupper(FirstName)

  • tolower()
    Customers | project tolower(FirstName)
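
The literal values in the base64 and URL examples above can be checked with the Python standard library. The sketch below assumes UTF-8 text and that url_encode escapes every reserved character, including ':' and '/'; the Python function names simply mirror the KQL ones.

```python
import base64
from urllib.parse import quote, unquote


def base64_encode_tostring(text: str) -> str:
    """Encode UTF-8 text as a base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def base64_decode_tostring(encoded: str) -> str:
    """Decode a base64 string back to UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


def url_encode(text: str) -> str:
    """Percent-encode everything except unreserved characters."""
    return quote(text, safe="")


def url_decode(text: str) -> str:
    """Reverse percent-encoding."""
    return unquote(text)


print(base64_encode_tostring("Kusto1"))
print(base64_decode_tostring("S3VzdG8x"))
print(url_encode("https://www.test.com/hello word"))
print(url_decode("https%3A%2F%2Fwww.test.com%2Fhello%20word"))
```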

Aggregate Functions

  • avg()
  • avgif()
  • count()
  • countif()
  • max()
  • maxif()
  • min()
  • minif()
  • sum()
  • sumif()
  • dcount()
  • dcountif()
  • bin

v22.7.1.1-clib

25 Jun 02:05
f65b5d2

Release v22.7.1.1-clib