I've wanted chunked UTF-8 decoding twice recently for different escaping routines, and have used simdutf8::compat::from_utf8 in a loop to achieve that. I would really like to be able to use an API like Utf8Chunks from #[feature(utf8_lossy)] or bstr::Utf8Chunks, but with the faster validation of this crate. Utf8Chunks avoids the disconnect between the length of the valid prefix and the prefix as a string. Additionally, I suspect an API for this could avoid some overhead from decoding in a loop.
I ended up writing something close to this:
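A minimal sketch of that kind of loop, not the original snippet: the `escape_invalid` name and the `\xNN` output format are hypothetical, and `std::str::from_utf8` stands in for `simdutf8::compat::from_utf8` so the block compiles without extra dependencies. The `compat` error type exposes the same `valid_up_to()` and `error_len()` methods, so the swap should be mechanical. Note the offset arithmetic and the unchecked conversion that a chunked API would hide.

```rust
use std::fmt::Write;
use std::str;

/// Hypothetical escaping routine: valid text is copied through,
/// each invalid byte is written as `\xNN`.
fn escape_invalid(mut input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    loop {
        // Assumption: simdutf8::compat::from_utf8 could replace this call;
        // std is used here only to keep the sketch self-contained.
        match str::from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // The error only reports a length, so the valid prefix has to
                // be re-sliced and converted to &str by hand.
                let (valid, rest) = input.split_at(err.valid_up_to());
                // SAFETY: bytes before `valid_up_to()` were just validated.
                out.push_str(unsafe { str::from_utf8_unchecked(valid) });

                // `error_len()` is None when the input ends in the middle of
                // a multi-byte sequence; treat the whole tail as invalid.
                let bad_len = err.error_len().unwrap_or(rest.len());
                for byte in &rest[..bad_len] {
                    write!(out, "\\x{:02X}", byte).unwrap();
                }
                input = &rest[bad_len..];
            }
        }
    }
}
```

Each iteration restarts validation on the remaining slice, which is where a dedicated chunk iterator might save some setup overhead.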
Compare to the stdlib implementation of String::from_utf8_lossy, which avoids any direct offset fiddling and unchecked conversions:
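A sketch in that style, not the stdlib source itself: it assumes Rust 1.79 or later, where the chunk iterator is stable as `<[u8]>::utf8_chunks`, and the `lossy` function name is made up for illustration. Each chunk hands back the valid prefix already typed as `&str`, plus the invalid bytes that followed it, so there is no offset bookkeeping and no `unsafe`.

```rust
/// Hypothetical lossy decoder built on the standard chunk iterator.
fn lossy(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len());
    for chunk in bytes.utf8_chunks() {
        // `valid()` is already a &str; no unchecked conversion needed.
        out.push_str(chunk.valid());
        // `invalid()` is empty only for the final chunk of well-formed input.
        if !chunk.invalid().is_empty() {
            out.push('\u{FFFD}');
        }
    }
    out
}
```

An equivalent iterator backed by this crate's validation would let the escaping loop take the same shape.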