Improve table search speed through lookups #112
Conversation
With the help of #113 this gives:
Full Report: https://gist.github.com/indutny/9f5f623c52d6e935225edfe09857ef99

Rebased over the latest master, but it looks like nothing changed in this PR.
Force-pushed from 0b2442c to a2f1b04.
src/tables.rs (outdated)

```diff
@@ -387,9 +387,81 @@ pub mod grapheme {
     }

     pub fn grapheme_category(c: char) -> (u32, u32, GraphemeCat) {
-        bsearch_range_value_table(c, grapheme_cat_table)
+        let idx = c as usize / 0x80;
```
Please add/generate comments in the code that explain what's going on here, and also on the const.
Sounds good! Force pushed with comments!
Force-pushed from f52636c to bb6b82a.
Let me run benchmarks on this first, it looks like I might have regressed with my latest refactor.
Prior to this change table search would have to do a binary search over about 1000 entries which resulted in around 10 memory loads on average. In this commit we reduce the search space by doing a pre-lookup in a generated table to get a smaller (often zero-length) slice of the full sorted range list. On average this gives us just one entry of the range list to perform binary search on, which reduces the average number of memory loads to 2.
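The two-level scheme described above can be sketched in a few lines of Rust. This is a simplified illustration with toy data, not the crate's actual generated tables: `RANGES`, `LOOKUP`, and `lookup_value` are hypothetical names, and the real generator emits the pre-lookup table at build time from the Unicode data files.

```rust
/// Sorted, non-overlapping (lo, hi, value) ranges with inclusive bounds.
/// Stands in for the real generated range table (~1000 entries).
const RANGES: &[(u32, u32, u8)] = &[
    (0x00, 0x40, 1),
    (0x41, 0x5A, 2),
    (0x5B, 0x7F, 1),
    (0x80, 0xFF, 3),
];

/// Pre-lookup table: for each 0x80-wide block of code points, the
/// (start, end) indices of the slice of RANGES that can contain a
/// character in that block. Written by hand here for the toy data.
const LOOKUP: &[(usize, usize)] = &[
    (0, 3), // block 0: 0x00..=0x7F falls within RANGES[0..3]
    (3, 4), // block 1: 0x80..=0xFF falls within RANGES[3..4]
];

fn lookup_value(c: char) -> u8 {
    // One load: find the (usually tiny) slice for this block.
    let block = (c as usize / 0x80).min(LOOKUP.len() - 1);
    let (start, end) = LOOKUP[block];
    let slice = &RANGES[start..end];
    // Binary search only within that slice instead of the full table.
    match slice.binary_search_by(|&(lo, hi, _)| {
        if hi < c as u32 {
            std::cmp::Ordering::Less
        } else if lo > c as u32 {
            std::cmp::Ordering::Greater
        } else {
            std::cmp::Ordering::Equal
        }
    }) {
        Ok(i) => slice[i].2,
        Err(_) => 0, // default category for unmapped code points
    }
}

fn main() {
    assert_eq!(lookup_value('A'), 2); // 0x41 is in (0x41, 0x5A, 2)
    assert_eq!(lookup_value('a'), 1); // 0x61 is in (0x5B, 0x7F, 1)
    assert_eq!(lookup_value('é'), 3); // 0xE9 is in (0x80, 0xFF, 3)
}
```

Because most text clusters within a few blocks, the pre-lookup usually narrows the search to zero or one range, which is where the drop from ~10 memory loads to ~2 comes from.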
Force-pushed from bb6b82a to b4c9ce1.
Alright, let me know when I should look at this!

@Manishearth Should be good now, sorry for the back and forth! I made some simplifications, and the benchmarks look even better than before: https://gist.github.com/indutny/c8d29a00680bfbaf19d22d02a7175c0d (with up to a 51% improvement on some word-bounds benches).

Thanks!

Wow! Thank you for merging it! Not to impose, but I was wondering when you planned to release a new version of the library?

I'd happily merge a PR bumping the library to 1.10.1.

You got it!

Wait, I have one more optimization for you. :-) (EDIT: I'm @indutny's coworker; we've been loosely collaborating and independently looking at optimization opportunities in a few of the unicode-rs crates.)

Ah, it looks like @indutny's optimization has made mine irrelevant—some texts get faster, others get slower. So it's probably not worth it.