feat(generic-metrics): Add Base64 decoding to Snuba processors #5761
Conversation
 }

-impl<T> EncodedSeries<T> {
+impl<T: Decodable> EncodedSeries<T> {
     fn into_vec(self) -> Vec<T> {
Long term I think we should make this fallible and return `Result<Vec<T>, SomeError>`, since the decoding has a few possible failures, e.g. a bad encoding (base64/zstd) or an invalid payload length (`length % T::SIZE != 0`).
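A minimal std-only sketch of what a fallible `into_vec` could look like (the `Decodable` trait and the `String` error type here are illustrative assumptions, not the PR's actual definitions; the real code also base64-decodes with the `data-encoding` crate first):

```rust
// Illustrative stand-in for the PR's Decodable trait (names assumed).
trait Decodable: Sized {
    const SIZE: usize;
    fn decode_bytes(bytes: &[u8]) -> Self;
}

impl Decodable for u32 {
    const SIZE: usize = 4;
    fn decode_bytes(bytes: &[u8]) -> Self {
        // chunks_exact guarantees `bytes` has exactly SIZE elements.
        u32::from_le_bytes(bytes.try_into().expect("chunk of SIZE bytes"))
    }
}

// Fallible decode: rejects payloads whose length is not a multiple of
// T::SIZE instead of panicking (the `length % T::SIZE != 0` case above).
fn into_vec<T: Decodable>(decoded: &[u8]) -> Result<Vec<T>, String> {
    if decoded.len() % T::SIZE != 0 {
        return Err(format!(
            "payload length {} is not a multiple of {}",
            decoded.len(),
            T::SIZE
        ));
    }
    Ok(decoded.chunks_exact(T::SIZE).map(T::decode_bytes).collect())
}
```

Propagating the `Err` out of the processor is what would let a malformed message be rejected rather than crash the consumer.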
Thanks for the suggestion. I added some error handling; do you think this is sufficient to cover the failure cases you outlined?
[RESOLVED] This is where we got to so far. The final error we ran into:

Ok, we fixed that particular compilation issue. Next up: error handling, then local testing. We need to make sure that the propagated error leads any problematic message to get DLQ'd.
@@ -76,7 +77,7 @@ enum MetricValue {
     #[serde(rename = "c")]
     Counter(f64),
     #[serde(rename = "s", deserialize_with = "encoded_series_compat_deserializer")]
-    Set(EncodedSeries<u64>),
+    Set(EncodedSeries<u32>),
The reason we're changing this is that Relay actually sends the sets as buckets of u32, not u64. ClickHouse expects u64, but we can test this out locally to make sure it's compatible and we can still write data.
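Since every u32 fits losslessly in a u64, widening after decoding is trivial; a hypothetical helper to illustrate (the name and where this happens are assumptions, not the PR's code):

```rust
// Hypothetical helper: widen decoded u32 set members to the u64 values
// the ClickHouse column expects. u64::from is lossless for any u32.
fn widen_set(values: Vec<u32>) -> Vec<u64> {
    values.into_iter().map(u64::from).collect()
}
```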
            .map(TryInto::try_into)
            .map(Result::unwrap)
            .map(T::decode_bytes)
            .collect(),
This might be a question of personal preference, but I would find a for-loop more readable in this case:

let decoded = BASE64.decode(data.as_bytes())?;
let mut res = vec![];
for chunk in decoded.chunks_exact(T::SIZE) {
    res.push(T::decode_bytes(chunk.try_into()?));
}
Ok(res)
Thanks, I actually found the iterators to be more idiomatic, but I don't have a strong preference. @ayirr7, thoughts on this?
No strong preference, but I found the iterator to be fine.
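For what it's worth, the iterator version can also propagate errors without `Result::unwrap` by collecting into a `Result`; a std-only sketch specialized to `u32` (the generic version in the PR would go through the `Decodable` trait instead):

```rust
use std::array::TryFromSliceError;

// Decode little-endian u32s from already-base64-decoded bytes. `collect`
// on an iterator of Results short-circuits on the first Err. With
// chunks_exact(4) every chunk has exactly 4 bytes, so the fallible
// conversion is defensive rather than expected to fail.
fn decode_u32s(decoded: &[u8]) -> Result<Vec<u32>, TryFromSliceError> {
    decoded
        .chunks_exact(4)
        .map(|chunk| chunk.try_into().map(u32::from_le_bytes))
        .collect()
}
```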
fn decode_bytes(bytes: [u8; Self::SIZE]) -> Self {
    Self::from_le_bytes(bytes)
}
The few bytes that make up an integer or a floating-point number need to be copied out of the slice at some point anyway. As for `try_into` itself, I do hope that the compiler optimizes out any unnecessary copies. You can check https://godbolt.org/ to be sure.
There still might be a possible optimization where we reinterpret the entire array of bytes as an array of numbers without copying anything (relying on the assumption that the platform represents numbers in little endian notation). But I wouldn't go there in the first iteration.
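As a side note, a plain `std::mem::transmute` between slice types would be unsound for this: it reinterprets the fat pointer but keeps the byte count as the element count, and it ignores alignment. A sketch of the zero-copy idea using `slice::align_to` instead (still unsafe, still assuming little-endian, and explicitly not part of this PR):

```rust
// Zero-copy view of a byte buffer as u32s. align_to splits off any
// misaligned prefix/suffix and fixes up the element count; the values
// are only meaningful on little-endian platforms.
fn reinterpret_u32s(bytes: &[u8]) -> Option<&[u32]> {
    // SAFETY: every bit pattern is a valid u32, so reinterpreting
    // initialized bytes is sound once align_to has handled alignment.
    let (prefix, body, suffix) = unsafe { bytes.align_to::<u32>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(body)
    } else {
        None
    }
}
```

A misaligned buffer or a length that is not a multiple of 4 yields `None`, which a real decoder would have to treat as the copying fallback path.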
> There still might be a possible optimization where we reinterpret the entire array of bytes as an array of numbers without copying anything (relying on the assumption that the platform represents numbers in little endian notation). But I wouldn't go there in the first iteration.

Would that be with `std::mem::transmute`?
Yes, for example
unsafe {
    std::mem::transmute::<&[u8], &[u32]>(&raw_bytes)
}
Tested this locally with sentry's
            .collect())
    } else {
        Err(anyhow!(
            "Decoded Base64 cannot be chunked into {}, but got {}",
nit on the error message here. Should it be something like "Decoded Base64 cannot be chunked into {}, got {}"?
Updated
@@ -50,11 +50,13 @@ json-schema-diff = "0.1.7"
 serde_path_to_error = "0.1.15"
 hyper = "1.2.0"
 tokio-stream = "0.1.15"
+data-encoding = "2.5.0"

 [patch.crates-io]
 rdkafka = { git = "https://github.com/fede1024/rust-rdkafka" }
I believe we want to also bump the Kafka schema in this file
did it
Adds Base64 decoding support to the generic metrics processors.
TODO: