Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial update to have attributes use the ordered map so the order is maintained into clickhouse #34598

Conversation

JustinMason
Copy link

@JustinMason JustinMason commented Aug 11, 2024

Description: Update to maintain attribute order for records created in ClickHouse

Otel Records where not consistently maintain order. Go does not maintain order for maps. This was causing records to be duplicated since ClickHouse considers different attribute order as unique records. This was also effecting compression because these records were considered different if the order was different.

Link to tracking Issue: #33634

Implemented the column.MapIterator and changed the use of attributes from map[string]string to column.MapIterator
ClickHouse/clickhouse-go#1152

Testing:
Updated Unit tests and added coverage for OrderMap type.

Basic integration tested was manually done using ClickHouse Server in a local environment.

Documentation:

Pre-existing use of the attribute columns were switched out to use the MapIterator. The Exporter explicitly sorts the attributes before inserting. The MapIterator maintains the original order so it is not lost on record creation.

@SpencerTorres
Copy link
Member

Awesome to see this. Will review thoroughly once marked as ready, but so far things look good. Especially interested in seeing this for log/trace attributes.

@JustinMason JustinMason force-pushed the feature/clickhouse-exporter-ordered-map-for-attributes branch from 6eca859 to 1277292 Compare August 13, 2024 20:09
@JustinMason JustinMason force-pushed the feature/clickhouse-exporter-ordered-map-for-attributes branch from 1277292 to 60c729b Compare August 13, 2024 20:23
@JustinMason JustinMason marked this pull request as ready for review August 13, 2024 20:26
@JustinMason JustinMason requested a review from a team August 13, 2024 20:26
Copy link
Member

@SpencerTorres SpencerTorres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Glad to see this applies to logs/traces now. Can you confirm the trace test file doesn't require any changes?

Copy link
Contributor

@Frapschen Frapschen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!, LGTM

@JustinMason
Copy link
Author

Excellent! Glad to see this applies to logs/traces now. Can you confirm the trace test file doesn't require any changes?

I added field mapping coverage for the attribute conversion to OrderedMap.

FYI, I did notice Tracing Scope().Attributes() does not have a column defined in the clickhouse schema, so those values are not persisted. Was this intentional?

For example: Logs has it https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/clickhouseexporter/exporter_logs.go#L159

Tracing does not: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/clickhouseexporter/exporter_traces.go#L176

I can create an issue for this, it should be independent of this PR.

@SpencerTorres
Copy link
Member

FYI, I did notice Tracing Scope().Attributes() does not have a column defined in the clickhouse schema, so those values are not persisted. Was this intentional?

Not sure, I'll have to check the OTel spec to see what's in there. Let me know your thoughts on adding it and I'll add it in my exisitng PR #34245

@JustinMason
Copy link
Author

FYI, I did notice Tracing Scope().Attributes() does not have a column defined in the clickhouse schema, so those values are not persisted. Was this intentional?

Not sure, I'll have to check the OTel spec to see what's in there. Let me know your thoughts on adding it and I'll add it in my exisitng PR #34245

Looking at the spec it looks like the attributes are optional for tracing, so its probably a good idea to add them in case they are being used in an up stream component.
https://opentelemetry.io/docs/specs/otel/trace/api/#get-a-tracer

[since 1.13.0] attributes (optional): Specifies the instrumentation scope attributes to associate with emitted telemetry.

@earwin
Copy link
Contributor

earwin commented Sep 11, 2024

I wrote a similar IterableOrderedMap some while ago, and came here to report the same problem to see it's already being taken care of : D Thanks!
I'll leave my code here for inspiration: https://gist.github.com/earwin/149740749c8aaccde5ea626169c32735
It uses a single in-place sorted slice and generics, so mayhaps is easier on GC.
All stuff using iter.Seq is go1.23 sugar and can be dropped.

@hanjm
Copy link
Member

hanjm commented Sep 11, 2024

I wrote a similar IterableOrderedMap some while ago, and came here to report the same problem to see it's already being taken care of : D Thanks!

I'll leave my code here for inspiration: https://gist.github.com/earwin/149740749c8aaccde5ea626169c32735

It uses a single in-place sorted slice and generics, so mayhaps is easier on GC.

All stuff using iter.Seq is go1.23 sugar and can be dropped.

@earwin so good ideal for go 1.23, can you put it to clickhouse go sdk?

@earwin
Copy link
Contributor

earwin commented Sep 11, 2024

@earwin so good ideal for go 1.23, can you put it to clickhouse go sdk?

I've been meaning to say it's perfectly usable for pre-1.23 as well, because everything that uses iter.Seq is non-core functionality which can be deleted.

I'll ask and see if they are open to having an impl out-of-the-box. ... there

Copy link
Contributor

github-actions bot commented Oct 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 3, 2024
@JustinMason
Copy link
Author

@dmitryax Any feedback on this pr? It will go stale soon.

@JustinMason JustinMason requested a review from a team as a code owner October 3, 2024 14:04
@github-actions github-actions bot removed the Stale label Oct 4, 2024
@earwin
Copy link
Contributor

earwin commented Oct 9, 2024

Here's my pull request: ClickHouse/clickhouse-go#1417, I modified my impl so it compiles on pre-1.23 go, while still being forward compatible with 1.23 iterators.

mapIterator := orderedMap.Iterator()

for mapIterator.Next() {
attrs.Put(mapIterator.Key(), mapIterator.Value())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're trying to copy orderedMap. Why?
But you're also reusing the attrs var defined outside of the loop, so exemplars will get attributes that are a union of all their attributes. Is this intended?

@earwin
Copy link
Contributor

earwin commented Oct 9, 2024

Heads up: I've been working on my own reimplementation of this pull request,
and we're both hitting the same (or very similar) clickhouse-go bug: ClickHouse/clickhouse-go#1365

To reproduce: reenable and run integration_test.go:TestIntegration

=== RUN   TestIntegration
=== RUN   TestIntegration/test_clickhouse_24-alpine
    logger.go:146: 2024-10-09T00:00:03.222+0000	DEBUG	insert logs	{"records": 1, "cost": "64.272625ms"}
    logger.go:146: 2024-10-09T00:00:03.254+0000	DEBUG	insert traces	{"records": 1, "cost": "2.285208ms"}
    exporter_traces_test.go:126: 
        	Error Trace:	/*/opentelemetry-collector-contrib/exporter/clickhouseexporter/exporter_traces_test.go:126
        	            				/*/opentelemetry-collector-contrib/exporter/clickhouseexporter/integration_test.go:147
        	            				/*/opentelemetry-collector-contrib/exporter/clickhouseexporter/integration_test.go:68
        	Error:      	Received unexpected error:
        	            	ExecContext:clickhouse [AppendRow]: Events.Attributes clickhouse [AppendRow]: converting internal.OrderedMap to Map(LowCardinality(String), String) is unsupported. try using map[string]string
        	Test:       	TestIntegration/test_clickhouse_24-alpine
--- FAIL: TestIntegration/test_clickhouse_24-alpine (1.18s)

--- FAIL: TestIntegration (1.18s)

FAIL

Seems to be triggered by arrays holding pointers to instances of OrderedMap/IterableOrderedMap. Somewhere in append() chain elements of the array get dereferenced, and then v.(column.IterableOrderedMap) stops recognizing the instance.

@JustinMason
Copy link
Author

Closing due to parallel effort, @earwin has a better take on the problem and is following up with prs against the Clickhouse repo. Thanks for helping address this.

@JustinMason JustinMason deleted the feature/clickhouse-exporter-ordered-map-for-attributes branch October 17, 2024 03:20
TylerHelmuth pushed a commit that referenced this pull request Dec 12, 2024
…3634 (#35725)

#### Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in #33634
leads to worse compression and duplicates in `group by` (unless
carefully accounted for).
This PR uses the `column.IterableOrderedMap` facility from clickhouse-go
to ensure fixed attribute key order.

It is a reimplementation of #34598 that uses less allocations and is
(arguably) somewhat more straightforward.

I'm **opening this as a draft**, because this PR (and #34598) are
blocked by ClickHouse/clickhouse-go#1365
(fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of
`column.IterableOrderedMap` used to clickhouse-go upstream:
ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

#### Link to tracking issue
Fixes #33634

#### Testing
The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my
production to it in the next few of days.
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…en-telemetry#33634 (open-telemetry#35725)

#### Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in open-telemetry#33634
leads to worse compression and duplicates in `group by` (unless
carefully accounted for).
This PR uses the `column.IterableOrderedMap` facility from clickhouse-go
to ensure fixed attribute key order.

It is a reimplementation of open-telemetry#34598 that uses less allocations and is
(arguably) somewhat more straightforward.

I'm **opening this as a draft**, because this PR (and open-telemetry#34598) are
blocked by ClickHouse/clickhouse-go#1365
(fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of
`column.IterableOrderedMap` used to clickhouse-go upstream:
ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

#### Link to tracking issue
Fixes open-telemetry#33634

#### Testing
The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my
production to it in the next few of days.
mterhar pushed a commit to mterhar/opentelemetry-collector-contrib that referenced this pull request Dec 19, 2024
…en-telemetry#33634 (open-telemetry#35725)

#### Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in open-telemetry#33634
leads to worse compression and duplicates in `group by` (unless
carefully accounted for).
This PR uses the `column.IterableOrderedMap` facility from clickhouse-go
to ensure fixed attribute key order.

It is a reimplementation of open-telemetry#34598 that uses less allocations and is
(arguably) somewhat more straightforward.

I'm **opening this as a draft**, because this PR (and open-telemetry#34598) are
blocked by ClickHouse/clickhouse-go#1365
(fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of
`column.IterableOrderedMap` used to clickhouse-go upstream:
ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

#### Link to tracking issue
Fixes open-telemetry#33634

#### Testing
The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my
production to it in the next few of days.
AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this pull request Jan 13, 2025
…en-telemetry#33634 (open-telemetry#35725)

#### Description
Our attributes are stored as Map(String, String) in CH.
By default the order of keys is undefined and as described in open-telemetry#33634
leads to worse compression and duplicates in `group by` (unless
carefully accounted for).
This PR uses the `column.IterableOrderedMap` facility from clickhouse-go
to ensure fixed attribute key order.

It is a reimplementation of open-telemetry#34598 that uses less allocations and is
(arguably) somewhat more straightforward.

I'm **opening this as a draft**, because this PR (and open-telemetry#34598) are
blocked by ClickHouse/clickhouse-go#1365
(fixed in ClickHouse/clickhouse-go#1418)

In addition, I'm trying to add the implementation of
`column.IterableOrderedMap` used to clickhouse-go upstream:
ClickHouse/clickhouse-go#1417
If it is accepted, I will amend this PR accordingly.

#### Link to tracking issue
Fixes open-telemetry#33634

#### Testing
The IOM implementation was used in production independently.
I'm planning to build otelcollector with this PR and cut over my
production to it in the next few of days.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants