perf: Improve performance of md5
#19568
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -33,7 +33,7 @@ use datafusion_common::{ | |
| use datafusion_expr::ColumnarValue; | ||
| use md5::Md5; | ||
| use sha2::{Sha224, Sha256, Sha384, Sha512}; | ||
| use std::fmt::{self, Write}; | ||
| use std::fmt; | ||
| use std::str::FromStr; | ||
| use std::sync::Arc; | ||
|
|
||
|
|
@@ -157,14 +157,18 @@ pub fn md5(args: &[ColumnarValue]) -> Result<ColumnarValue> { | |
| }) | ||
| } | ||
|
|
||
| /// this function exists so that we do not need to pull in the crate hex. it is only used by md5 | ||
| /// function below | ||
| /// Hex encoding lookup table for fast byte-to-hex conversion | ||
| const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef"; | ||
|
|
||
| /// Fast hex encoding using a lookup table instead of format strings. | ||
| /// This is significantly faster than using `write!("{:02x}")` for each byte. | ||
| #[inline] | ||
| fn hex_encode<T: AsRef<[u8]>>(data: T) -> String { | ||
|
Contributor
Great. I'm curious whether it would be more performant to create the string directly from bytes, skipping the UTF-8 checks (which are redundant here) and dropping the casts?
Member
Author
I tried this, but the difference is tiny, so it may not be worth it. |
||
| let mut s = String::with_capacity(data.as_ref().len() * 2); | ||
| for b in data.as_ref() { | ||
| // Writing to a string never errors, so we can unwrap here. | ||
| write!(&mut s, "{b:02x}").unwrap(); | ||
| let bytes = data.as_ref(); | ||
| let mut s = String::with_capacity(bytes.len() * 2); | ||
| for &b in bytes { | ||
| s.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char); | ||
| s.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char); | ||
| } | ||
| s | ||
| } | ||
|
|
||
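For readers following the diff, here is a minimal, self-contained sketch that places the table-based encoder next to the formatter-based one it replaces, so the two can be checked against each other. The names `hex_encode_table` and `hex_encode_fmt` are hypothetical labels for this illustration only; in the PR both versions are simply called `hex_encode`. No performance figures are implied.

```rust
use std::fmt::Write;

/// Lookup table: index `i` holds the lowercase hex digit for nibble `i`.
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Table-based encoder, mirroring the added lines in the diff.
/// (Hypothetical name; the PR calls this `hex_encode`.)
fn hex_encode_table<T: AsRef<[u8]>>(data: T) -> String {
    let bytes = data.as_ref();
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // High nibble first, then low nibble.
        s.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
        s.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
    }
    s
}

/// Formatter-based encoder, mirroring the removed lines in the diff.
/// (Hypothetical name; kept here only for comparison.)
fn hex_encode_fmt<T: AsRef<[u8]>>(data: T) -> String {
    let mut s = String::with_capacity(data.as_ref().len() * 2);
    for b in data.as_ref() {
        // Writing to a String never errors, so unwrap is fine here.
        write!(&mut s, "{b:02x}").unwrap();
    }
    s
}
```

Both functions should return identical strings for any input, so a quick equivalence check over all 256 byte values is a cheap way to validate the rewrite.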
It's a bit weird to put the comment here. Maybe put it on top of the `HEX_CHARS_LOWER` lookup code.
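For reference, the variant raised in the earlier review question (building the output as raw bytes and skipping the UTF-8 check and the `as char` casts) might look roughly like the sketch below. This is an assumption about what was meant, not code from the PR; the name `hex_encode_unchecked` is hypothetical, and the author reports that the measured difference was tiny.

```rust
/// Lookup table: index `i` holds the lowercase hex digit for nibble `i`.
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical variant: fill a byte buffer, then convert it to a String
/// without re-validating UTF-8.
fn hex_encode_unchecked<T: AsRef<[u8]>>(data: T) -> String {
    let bytes = data.as_ref();
    let mut buf: Vec<u8> = Vec::with_capacity(bytes.len() * 2);
    for &b in bytes {
        buf.push(HEX_CHARS_LOWER[(b >> 4) as usize]);
        buf.push(HEX_CHARS_LOWER[(b & 0x0f) as usize]);
    }
    // SAFETY: every byte pushed comes from HEX_CHARS_LOWER, which contains
    // only ASCII characters, so the buffer is always valid UTF-8.
    unsafe { String::from_utf8_unchecked(buf) }
}
```

The trade-off is an `unsafe` block in exchange for avoiding per-character pushes; given the reported negligible gain, keeping the safe version as merged in the diff is a reasonable choice.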