Channels: Speed up clickhouse calculations #4789

Open. macobo wants to merge 3 commits into master.

macobo (Contributor) commented Nov 7, 2024

The previous `has` queries proved problematic, causing a lot of
CPU overhead. We can improve this by using dictionaries instead.

Benchmarked via this query:

```sql
SELECT
  channel,
  count(),
  countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() desc
```
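
For anyone wanting to reproduce numbers of this shape, here is a minimal sketch of one way to read them back. It assumes query logging is enabled and that the `*Ms` figures below are the corresponding `ProfileEvents` microsecond counters divided by 1000. That mapping is a guess based on the metric names, not something stated in this PR.

```sql
-- Sketch only: read timing counters for the most recent benchmark run.
-- Assumes system.query_log is enabled and ProfileEvents is a Map column.
SELECT
    query_duration_ms,
    ProfileEvents['DiskReadElapsedMicroseconds'] / 1000 AS DiskReadElapsedMs,
    ProfileEvents['RealTimeMicroseconds'] / 1000 AS RealTimeMs,
    ProfileEvents['UserTimeMicroseconds'] / 1000 AS UserTimeMs,
    ProfileEvents['SystemTimeMicroseconds'] / 1000 AS SystemTimeMs,
    ProfileEvents['OSCPUWaitMicroseconds'] / 1000 AS OSCPUWaitMs,
    ProfileEvents['OSCPUVirtualTimeMicroseconds'] / 1000 AS OSCPUVirtualTimeMs
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%acquisition_channel%'
  -- Exclude this lookup query itself from the results.
  AND query NOT ILIKE '%system.query_log%'
ORDER BY event_time DESC
LIMIT 1
FORMAT Vertical
```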

Before this fix:
```
query_duration_ms:                                                57960
DiskReadElapsedMs:                                                374.712
RealTimeMs:                                                       2891200.667
UserTimeMs:                                                       2704024.783
SystemTimeMs:                                                     1693.265
OSCPUWaitMs:                                                      90.253
OSCPUVirtualTimeMs:                                               2705709.58
```

After this fix:
```
query_duration_ms:                                                4367
DiskReadElapsedMs:                                                454.356
RealTimeMs:                                                       213892.207
UserTimeMs:                                                       199363.485
SystemTimeMs:                                                     1479.364
OSCPUWaitMs:                                                      13.739
OSCPUVirtualTimeMs:                                               200837.37
```
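
To put the two runs side by side, the ratios can be computed directly from the figures quoted above. This is plain arithmetic on the reported numbers, and both ratios come out at roughly 13x.

```sql
-- Ratios of the "before" and "after" figures reported in this PR.
SELECT
    round(57960 / 4367, 1) AS wall_clock_speedup,           -- query_duration_ms
    round(2704024.783 / 199363.485, 1) AS user_cpu_speedup  -- UserTimeMs
```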

Note that the new tables are not tracked in our schema as usual, since
they're effectively temporary tables used to create the dictionary without
needing to upload files to the ClickHouse servers.
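
As a rough illustration of the table-backed approach, a dictionary can read from a regular table on the same server instead of from an uploaded file. The sketch below uses hypothetical table, dictionary, and column names and made-up sample rows. It is not the schema introduced by this PR.

```sql
-- Hypothetical lookup table that only exists to feed the dictionary.
CREATE TABLE example_source_channels
(
    referrer_source String,
    channel String
)
ENGINE = MergeTree
ORDER BY referrer_source;

-- Illustrative rows, not real mapping data.
INSERT INTO example_source_channels VALUES
    ('Google', 'Organic Search'),
    ('Facebook', 'Organic Social');

-- Dictionary sourced from the table above; String keys need a complex-key layout.
CREATE DICTIONARY example_source_channels_dict
(
    referrer_source String,
    channel String
)
PRIMARY KEY referrer_source
SOURCE(CLICKHOUSE(TABLE 'example_source_channels'))
LIFETIME(0)
LAYOUT(COMPLEX_KEY_HASHED());

-- Keyed lookup with a fallback value for unknown sources.
SELECT dictGetOrDefault(
    'example_source_channels_dict', 'channel', tuple('Google'), 'Unknown'
) AS channel;
```

With this shape, the classification becomes a keyed lookup instead of a `has` membership check against an array.
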
macobo requested a review from ukutaht on November 7, 2024, 11:21