MIGRATION: add bytes_scanned to querylog #3360
Conversation
This PR has a migration; here is the generated SQL:

```sql
-- start migrations
-- forward migration querylog : 0004_add_bytes_scanned
-- Local operations:
ALTER TABLE querylog_local ADD COLUMN IF NOT EXISTS clickhouse_queries.bytes_scanned Array(UInt64) AFTER clickhouse_queries.array_join_columns;
-- Dist operations:
ALTER TABLE querylog_dist ADD COLUMN IF NOT EXISTS clickhouse_queries.bytes_scanned Array(UInt64) AFTER clickhouse_queries.array_join_columns;
-- end forward migration querylog : 0004_add_bytes_scanned

-- backward migration querylog : 0004_add_bytes_scanned
-- Local operations:
ALTER TABLE querylog_local DROP COLUMN IF EXISTS clickhouse_queries.bytes_scanned;
-- Dist operations:
ALTER TABLE querylog_dist DROP COLUMN IF EXISTS clickhouse_queries.bytes_scanned;
-- end backward migration querylog : 0004_add_bytes_scanned
```
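The forward/backward pair follows a mechanical pattern over the local and distributed tables. As a purely illustrative sketch (the helper names below are hypothetical, not Snuba's migration API, which generates these statements from Python operation objects):

```python
# Illustrative only: hypothetical helpers that render ALTER statements
# like the generated migration SQL shown above.

def add_column_sql(table: str, column: str, col_type: str, after: str) -> str:
    """Render a forward ALTER TABLE ... ADD COLUMN statement."""
    return (
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS "
        f"{column} {col_type} AFTER {after};"
    )

def drop_column_sql(table: str, column: str) -> str:
    """Render the matching backward DROP COLUMN statement."""
    return f"ALTER TABLE {table} DROP COLUMN IF EXISTS {column};"

# The migration touches both the local and the distributed table.
for table in ("querylog_local", "querylog_dist"):
    print(add_column_sql(
        table,
        "clickhouse_queries.bytes_scanned",
        "Array(UInt64)",
        "clickhouse_queries.array_join_columns",
    ))
    print(drop_column_sql(table, "clickhouse_queries.bytes_scanned"))
```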
Codecov Report: Base 90.75% // Head 21.87% // decreases project coverage by 68.88%.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #3360       +/-   ##
===========================================
- Coverage   90.75%   21.87%    -68.88%
===========================================
  Files         704      663        -41
  Lines       32335    31209      -1126
===========================================
- Hits        29345     6827     -22518
- Misses       2990    24382     +21392
```

View full report at Codecov.
```python
column=Column(
    "clickhouse_queries.bytes_scanned",
    Array(
        UInt(64),
        Modifiers(
            default="arrayResize([0], length(clickhouse_queries.sql))"
        ),
    ),
),
```
This will not be this simple. If I applied this migration as it is, the consumer would break: it would try to write rows whose arrays in the clickhouse_queries nested column are not all the same size, because the bytes_scanned array would not be passed. ClickHouse would then consider that array as empty.

The change has to happen in the consumer first: the consumer must start passing the column. You will have to set the input_format_skip_unknown_fields ClickHouse setting on your writes so ClickHouse ignores the unknown column. Then we can apply the migration.
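As a sketch of what setting input_format_skip_unknown_fields on a write could look like over ClickHouse's HTTP interface (the host, row contents, and helper structure are hypothetical; only the setting name comes from the comment above, and nothing is actually sent):

```python
import json
from urllib.parse import urlencode

# Hypothetical row the consumer writes *before* the migration runs: it
# already includes clickhouse_queries.bytes_scanned, a column that does
# not exist in the table yet.
row = {
    "clickhouse_queries.sql": ["SELECT 1"],
    "clickhouse_queries.bytes_scanned": [0],
}

params = urlencode({
    "query": "INSERT INTO querylog_local FORMAT JSONEachRow",
    # Tell ClickHouse to ignore fields it does not know about, so the
    # insert keeps working while the column is still missing.
    "input_format_skip_unknown_fields": 1,
})
url = f"http://clickhouse.example.invalid:8123/?{params}"
body = json.dumps(row)

# A real write would POST `body` to `url`; here we only construct them.
print(url)
```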
Here is an example: https://github.com/getsentry/snuba/pull/2232/files (compare bb7b41a to 7622e56).
Based on the previous PR to change how data is written, I think this should be OK.
```python
Modifiers(
    default="arrayResize([0], length(clickhouse_queries.sql))"
),
```
You should be able to remove this as you are now writing the actual empty array.
But not all rows have this column written. Doesn't it need to exist until that's no longer the case?
No, because the column does not exist in the DB yet. The critical part is that you deploy the change that writes the column for all rows before we apply the migration.

The default here proved to be useless right after we added it back in 2020. It was a desperate move to make a migration work without having to stop the consumer; we did not know about input_format_skip_unknown_fields back then.
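The ordering matters because ClickHouse requires all arrays under the same Nested column to have equal length in each row, so a row where bytes_scanned ends up empty while its sibling arrays are not would be rejected. A small self-contained simulation of that per-row check (the function and row names are hypothetical, not ClickHouse code):

```python
# Simulates ClickHouse's per-row consistency rule for a Nested column:
# every array under the same Nested prefix must have the same length.

def nested_lengths_consistent(row: dict, prefix: str) -> bool:
    lengths = {
        len(value)
        for key, value in row.items()
        if key.startswith(prefix + ".")
    }
    return len(lengths) <= 1

good_row = {
    "clickhouse_queries.sql": ["SELECT 1"],
    "clickhouse_queries.bytes_scanned": [123],
}
# If the consumer does not pass bytes_scanned, ClickHouse treats it as an
# empty array, which no longer matches its sibling arrays.
bad_row = {
    "clickhouse_queries.sql": ["SELECT 1"],
    "clickhouse_queries.bytes_scanned": [],
}

print(nested_lengths_consistent(good_row, "clickhouse_queries"))  # True
print(nested_lengths_consistent(bad_row, "clickhouse_queries"))   # False
```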
The attribution log contains attribution for internal customers but not for Sentry customers (there is no project_id in the attribution log). We want to be able to analyze bytes scanned by Sentry customers as well; hence we need to add this column.