
bug: extractNormalizedInsertQueryAndColumns does not remove backticks inside a dot-delimited column name #1401

Open
fidiego opened this issue Sep 10, 2024 · 7 comments
fidiego commented Sep 10, 2024

Observed

extractNormalizedInsertQueryAndColumns strips the leading and trailing backticks from columns such as "events.attributes", but leaves the internal backticks untouched. The resulting column-name mismatch leads to a block cannot be sorted error.

This affects Nested type columns used via go-gorm.

Expected behaviour

Ideally, extractNormalizedInsertQueryAndColumns should handle this case.

Working `extractNormalizedInsertQueryAndColumns`

func extractNormalizedInsertQueryAndColumns(query string) (normalizedQuery string, tableName string, columns []string, err error) {
	query = truncateFormat.ReplaceAllString(query, "")
	query = truncateValues.ReplaceAllString(query, "")

	matches := normalizeInsertQueryMatch.FindStringSubmatch(query)
	if len(matches) == 0 {
		err = errors.Errorf("invalid INSERT query: %s", query)
		return
	}

	normalizedQuery = fmt.Sprintf("%s FORMAT Native", matches[1])
	tableName = strings.TrimSpace(matches[2])

	columns = make([]string, 0)
	matches = extractInsertColumnsMatch.FindStringSubmatch(matches[1])
	if len(matches) == 2 {
		columns = strings.Split(matches[1], ",")
		for i := range columns {
			// refers to https://clickhouse.com/docs/en/sql-reference/syntax#identifiers
			// we can use identifiers with double quotes or backticks, for example: "id", `id`, but not both, like `"id"`.
			col := strings.TrimSpace(columns[i])      // trim leading/trailing whitespace
			col = strings.Trim(col, "\"")             // trim leading/trailing double quotes
			col = strings.Trim(col, "`")              // trim leading/trailing backticks
			// columns like events.attributes arrive here as events`.`attributes after the outer trim
			col = strings.ReplaceAll(col, "`.`", ".") // replace the internal backtick-quoted dot with a plain dot
			columns[i] = col
		}
	}

	return
}
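The per-column cleanup above can be exercised in isolation. Below is a minimal sketch (normalizeColumn is my own name for the loop body, not a clickhouse-go function) showing that after trimming the outer backticks, a Nested column like `events`.`attributes` still needs the internal `.` collapsed:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeColumn reproduces the per-column cleanup from the proposed fix:
// trim whitespace and quoting, then collapse internal backtick-quoted dots.
func normalizeColumn(col string) string {
	col = strings.TrimSpace(col)  // trim leading/trailing whitespace
	col = strings.Trim(col, "\"") // trim leading/trailing double quotes
	col = strings.Trim(col, "`")  // trim leading/trailing backticks
	// `events`.`attributes` becomes events`.`attributes after the outer
	// trim above; collapse the remaining backtick-quoted dot.
	col = strings.ReplaceAll(col, "`.`", ".")
	return col
}

func main() {
	for _, c := range []string{" `span_name` ", "`events`.`attributes`", "\"trace_id\""} {
		fmt.Printf("%q -> %q\n", c, normalizeColumn(c))
	}
}
```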

`show create traces`

SHOW CREATE TABLE ctl_api.otel_traces

Query id: 69b8f8cf-71c2-4967-8cba-d80945a4be78

   ┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ CREATE TABLE ctl_api.otel_traces
(
    `id` String,
    `created_by_id` String,
    `created_at` DateTime64(3),
    `updated_at` DateTime64(3),
    `deleted_at` UInt64,
    `runner_id` String,
    `runner_job_id` String,
    `runner_job_execution_id` String,
    `timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `timestamp_date` Date DEFAULT toDate(timestamp),
    `timestamp_time` DateTime DEFAULT toDateTime(timestamp),
    `resource_attributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `resource_schema_url` LowCardinality(String) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` LowCardinality(String) CODEC(ZSTD(1)),
    `scope_attributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `scope_dropped_attr_count` Int64,
    `scope_schema_url` LowCardinality(String) CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `parent_span_id` String CODEC(ZSTD(1)),
    `trace_state` String CODEC(ZSTD(1)),
    `span_name` LowCardinality(String) CODEC(ZSTD(1)),
    `span_kind` LowCardinality(String) CODEC(ZSTD(1)),
    `service_name` LowCardinality(String) CODEC(ZSTD(1)),
    `span_attributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `duration` Int64 CODEC(ZSTD(1)),
    `status_code` LowCardinality(String) CODEC(ZSTD(1)),
    `status_message` String CODEC(ZSTD(1)),
    `events.timestamp` Array(DateTime64(9)),
    `events.name` Array(LowCardinality(String)),
    `events.attributes` Array(Map(LowCardinality(String), String)),
    `links.trace_id` Array(String),
    `links.span_id` Array(String),
    `links.span_state` Array(String),
    `links.attributes` Array(Map(LowCardinality(String), String)),
    INDEX idx_trace_id trace_id TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_span_attr_key mapKeys(resource_attributes) TYPE bloom_filter(0.1) GRANULARITY 1,
    INDEX idx_span_attr_value mapKeys(resource_attributes) TYPE bloom_filter(0.1) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(resource_attributes) TYPE bloom_filter(0.1) GRANULARITY 1,
    INDEX idx_res_attr_value mapKeys(resource_attributes) TYPE bloom_filter(0.1) GRANULARITY 1,
    INDEX idx_scope_attr_key mapKeys(resource_attributes) TYPE bloom_filter(0.1) GRANULARITY 1,
    INDEX idx_scope_attr_value mapKeys(resource_attributes) TYPE bloom_filter(0.1) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (service_name, span_name, toUnixTimestamp(timestamp), trace_id)
TTL toDateTime(timestamp) + toIntervalDay(180)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1 │
   └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.004 sec.

Code example

package code

// given an auto-migrated struct like this

type Trace struct {
	Events         []OtelTraceEvent  `gorm:"type:Nested(timestamp DateTime64(9), name LowCardinality(String), attributes Map(LowCardinality(String), String));"` // order matters, don't touch nested def
	Links          []OtelTraceLink   `gorm:"type:Nested(trace_id String, span_id String, span_state String, attributes Map(LowCardinality(String), String));"`   // order matters, don't touch nested def
}

// and a non-migrated struct like this
type TraceIngestion struct {
	EventsTimestamp  []time.Time         `gorm:"type:DateTime64(9);column:events.timestamp"`
	EventsName       []string            `gorm:"type:LowCardinality(String);column:events.name"`
	EventsAttributes []map[string]string `gorm:"type:Map(LowCardinality(String), String);column:events.attributes"`
	LinksTraceID     []string            `gorm:"type:LowCardinality(String);column:links.trace_id"`
	LinksSpanID      []string            `gorm:"type:LowCardinality(String);column:links.span_id"`
	LinksState       []string            `gorm:"type:LowCardinality(String);column:links.span_state"`
	LinksAttributes  []map[string]string `gorm:"type:Map(LowCardinality(String), String);column:links.attributes"`
}

Error log

block cannot be sorted - missing columns in requested order: [events.timestamp events.name events.attributes]

Details

Environment

  • clickhouse-go version: 2.28.2
  • Interface: go-gorm/clickhouse
  • Go version: 1.22.0
  • Operating system: OS X
  • ClickHouse version: 24.9.1.628
  • Is it a ClickHouse Cloud? No
  • ClickHouse Server non-default settings, if any: -
  • CREATE TABLE statements for tables involved:
  • Sample data for all these tables, use clickhouse-obfuscator if necessary

jkaflik commented Sep 12, 2024

@fidiego thanks for reporting this. Could you (if possible) post an INSERT statement generated by go-gorm?

@jkaflik jkaflik self-assigned this Sep 12, 2024

fidiego commented Sep 12, 2024

sure thing @jkaflik.

INSERT SQL

INSERT INTO
    `otel_traces` (
        `id`,
        `created_by_id`,
        `created_at`,
        `updated_at`,
        `deleted_at`,
        `runner_id`,
        `runner_job_id`,
        `runner_job_execution_id`,
        `timestamp`,
        `timestamp_date`,
        `timestamp_time`,
        `resource_attributes`,
        `resource_schema_url`,
        `scope_name`,
        `scope_version`,
        `scope_attributes`,
        `scope_dropped_attr_count`,
        `scope_schema_url`,
        `trace_id`,
        `span_id`,
        `parent_span_id`,
        `trace_state`,
        `span_name`,
        `span_kind`,
        `service_name`,
        `span_attributes`,
        `duration`,
        `status_code`,
        `status_message`,
        `events`.`timestamp`,
        `events`.`name`,
        `events`.`attributes`
    )
VALUES
    (
        '',
        '',
        '2024-09-09 18:49:12.765',
        '2024-09-09 18:49:12.765',
        0,
        'runccccccccccccccccccccccc',
        '',
        '',
        '2018-12-13 08:51:00',
        '0000-00-00 00:00:00',
        '0000-00-00 00:00:00',
        { 'job.id' :'jobidentfier',
        'service.name': 'runner-runxxxxxxxxxxxxxxxxxxxxxxxxxxx' },
        '',
        'my.library',
        '1.0.0',
        {'my.scope.attribute':'some scope attribute'},
        0,
        '',
        '5bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        'eee19b7ec3c1b174',
        'eee19b7ec3c1b173',
        '',
        'I''m a server span',
        'Server',
        'runner-runxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        {'my.span.attr':'some value'},
        1,
        'Unset',
        '',
        (
            '2018-12-13 14:51:01',
            '2020-02-11 20:26:13',
            '2020-02-11 20:26:13'
        ),
        ('ebento1', 'event-with-attr', 'event'),
        (
            [],
            { 'span-event-attr' :'span-event-attr-val' } ',
            []
        )
    )


fidiego commented Sep 20, 2024

hi @jkaflik, following up on this. if it is useful, i can set up a reproduction.


jkaflik commented Sep 27, 2024

@fidiego sorry for the delayed response. Yes, please.


fidiego commented Oct 1, 2024

@jkaflik no worries.

I've set up a PoC here. https://github.com/fidiego/ch-go-poc

In my actual code I define one struct for the migration w/ gorm:"type:Nested(...)" (like exemplars in the poc) and a second struct for reading with explicit columns (like events.* in the poc), because that's simpler than writing scanner/valuer code. Reads work great, but creating/migrating w/ the dot-delimited columns raises an error.

The dot-delimited fields work as event.column but not as events.column.


fidiego commented Oct 9, 2024

hi @jkaflik, i hope all is well. i wanted to ask if there was anything you needed from my end or if there were any updates on this issue.


jkaflik commented Oct 15, 2024

@fidiego, thanks for submitting a reproduction DDL. I will take a look this week. Apologies, my focus shifted to another project.

@jkaflik jkaflik assigned SpencerTorres and unassigned jkaflik Nov 4, 2024