Merged
18 changes: 11 additions & 7 deletions datafusion/functions/src/crypto/basic.rs
@@ -33,7 +33,7 @@ use datafusion_common::{
use datafusion_expr::ColumnarValue;
use md5::Md5;
use sha2::{Sha224, Sha256, Sha384, Sha512};
-use std::fmt::{self, Write};
+use std::fmt;
use std::str::FromStr;
use std::sync::Arc;

@@ -157,14 +157,18 @@ pub fn md5(args: &[ColumnarValue]) -> Result<ColumnarValue> {
})
}

-/// this function exists so that we do not need to pull in the crate hex. it is only used by md5
-/// function below
+/// Hex encoding lookup table for fast byte-to-hex conversion
+const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";
+
+/// Fast hex encoding using a lookup table instead of format strings.
+/// This is significantly faster than using `write!("{:02x}")` for each byte.
Comment on lines +163 to +164
Member

It's a bit odd to put this comment here. Maybe move it above the HEX_CHARS_LOWER lookup table.

#[inline]
fn hex_encode<T: AsRef<[u8]>>(data: T) -> String {
Contributor

Great! I'm curious whether it would be more performant to create the string directly from bytes, without the UTF-8 checks (which are redundant here) and the extra casts?

let bytes = data.as_ref();
let mut out = Vec::with_capacity(bytes.len() * 2);

for &b in bytes {
    out.push(HEX_CHARS_LOWER[(b >> 4) as usize]);
    out.push(HEX_CHARS_LOWER[(b & 0x0f) as usize]);
}

unsafe { String::from_utf8_unchecked(out) }

Member Author

I tried this, but the difference is tiny, so it may not be worth it.
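
For readers who want to repeat that comparison, here is a minimal sketch that puts the two variants discussed in this thread side by side. The names `hex_encode_push`, `hex_encode_unchecked`, and `time_encoder` are hypothetical and exist only for this illustration; they are not part of the PR. The timing helper is a rough wall-clock loop, not a proper benchmark.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Lowercase hex digits, indexed by nibble value (0..=15).
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Safe variant: pushes each hex digit into a `String` as a `char`.
/// (Hypothetical name; mirrors the approach taken in the PR.)
fn hex_encode_push(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        s.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
        s.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
    }
    s
}

/// Unchecked variant: builds a `Vec<u8>` and skips UTF-8 validation.
/// (Hypothetical name; mirrors the reviewer's suggestion.)
fn hex_encode_unchecked(bytes: &[u8]) -> String {
    let mut out = Vec::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(HEX_CHARS_LOWER[(b >> 4) as usize]);
        out.push(HEX_CHARS_LOWER[(b & 0x0f) as usize]);
    }
    // SAFETY: every byte pushed above comes from HEX_CHARS_LOWER,
    // which contains only ASCII, so `out` is always valid UTF-8.
    unsafe { String::from_utf8_unchecked(out) }
}

/// Rough wall-clock timing of `iters` calls to `encode`.
/// `black_box` keeps the optimizer from discarding the work.
fn time_encoder(encode: fn(&[u8]) -> String, input: &[u8], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(encode(black_box(input)));
    }
    start.elapsed()
}
```

Calling `time_encoder(hex_encode_push, &input, n)` and `time_encoder(hex_encode_unchecked, &input, n)` on the same input gives a first impression of the gap. A dedicated harness such as Criterion would give more reliable numbers.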

-    let mut s = String::with_capacity(data.as_ref().len() * 2);
-    for b in data.as_ref() {
-        // Writing to a string never errors, so we can unwrap here.
-        write!(&mut s, "{b:02x}").unwrap();
+    let bytes = data.as_ref();
+    let mut s = String::with_capacity(bytes.len() * 2);
+    for &b in bytes {
+        s.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
+        s.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
     }
     s
 }
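
As a sanity check on this hunk, the following self-contained sketch places the table-based encoder next to the `write!`-based one it replaces, so their outputs can be compared outside DataFusion. The names `hex_encode_table` and `hex_encode_fmt` are hypothetical, chosen for this illustration; the bodies follow the two versions shown in the diff.

```rust
use std::fmt::Write;

/// Lowercase hex digits, indexed by nibble value (0..=15).
const HEX_CHARS_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Table-based variant, following the code this hunk adds.
fn hex_encode_table<T: AsRef<[u8]>>(data: T) -> String {
    let bytes = data.as_ref();
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // High nibble first, then low nibble.
        s.push(HEX_CHARS_LOWER[(b >> 4) as usize] as char);
        s.push(HEX_CHARS_LOWER[(b & 0x0f) as usize] as char);
    }
    s
}

/// Format-string variant, following the code this hunk removes.
fn hex_encode_fmt<T: AsRef<[u8]>>(data: T) -> String {
    let bytes = data.as_ref();
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing to a String never errors, so unwrap is fine here.
        write!(&mut s, "{b:02x}").unwrap();
    }
    s
}
```

Running both over every possible byte value (0 through 255) and comparing the resulting strings is a cheap way to confirm that the rewrite does not change behavior.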