String search kernel optimisations #6107

Open
6 tasks
samuelcolvin opened this issue Jul 24, 2024 · 4 comments

Comments

@samuelcolvin
Contributor

The main context for this is well described by BurntSushi/memchr#156.

I think (in rough order of impact) we should:

  • switch from str.contains to memchr
  • switch from str.starts_with to memchr if possible, otherwise quick_strings::starts_with - there's no "what if the haystack is very long" concern since we only look at the start of the string, so the difference between memchr and quick_strings won't be as big, and might even be negative
  • switch from using starts_with_ignore_ascii_case to quick_strings::istarts_with
  • same for *ends_with
  • switch from Regex to quick_strings::icontains (copying the code) for ILIKE - maybe we have to check it's actually faster for large haystacks? - this might have the biggest impact in some scenarios, but we should be careful
  • to use those improvements, switch from some direct use of str.contains etc in like.rs to use Predicate

(I'm not suggesting that we make quick_strings a dependency; it was just a scratch experiment. If we use any of that code, we should copy it.)
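
To make the list above a bit more concrete, here is a rough, std-only sketch of the general shape: classify a LIKE pattern once, then run a cheap string operation per row. Everything here is an assumption for illustration. The `Predicate` enum below is hypothetical and simplified (the real type in like.rs may look quite different), the helper names are made up, and the `contains` arm uses plain `str::contains` where a memchr-based searcher could be swapped in.

```rust
/// Hypothetical, simplified predicate; not the actual arrow-rs type.
#[derive(Debug, PartialEq)]
enum Predicate<'a> {
    Eq(&'a str),
    Contains(&'a str),
    StartsWith(&'a str),
    EndsWith(&'a str),
    /// Patterns this sketch does not handle (would fall back to a regex).
    Other(&'a str),
}

impl<'a> Predicate<'a> {
    /// Classify a LIKE pattern. Simplification: any `_`, backslash escape,
    /// or interior `%` is treated as `Other`.
    fn like(pattern: &'a str) -> Self {
        if pattern.contains('_') || pattern.contains('\\') {
            return Predicate::Other(pattern);
        }
        let leading = pattern.starts_with('%');
        let trailing = pattern.ends_with('%');
        let inner = pattern.strip_prefix('%').unwrap_or(pattern);
        let inner = inner.strip_suffix('%').unwrap_or(inner);
        if inner.contains('%') {
            return Predicate::Other(pattern);
        }
        match (leading, trailing) {
            (false, false) => Predicate::Eq(inner),
            (true, true) => Predicate::Contains(inner),
            (false, true) => Predicate::StartsWith(inner),
            (true, false) => Predicate::EndsWith(inner),
        }
    }

    /// Evaluate against one haystack. Returns `None` for `Other`, meaning
    /// the caller would need a slower fallback.
    fn evaluate(&self, haystack: &str) -> Option<bool> {
        match self {
            Predicate::Eq(v) => Some(haystack == *v),
            // A memchr-based substring searcher could replace this call.
            Predicate::Contains(v) => Some(haystack.contains(*v)),
            Predicate::StartsWith(v) => Some(haystack.starts_with(*v)),
            Predicate::EndsWith(v) => Some(haystack.ends_with(*v)),
            Predicate::Other(_) => None,
        }
    }
}

/// ASCII-only case-insensitive prefix check, comparing bytes directly.
fn starts_with_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    h.len() >= n.len() && h[..n.len()].eq_ignore_ascii_case(n)
}

/// ASCII-only case-insensitive suffix check, comparing bytes directly.
fn ends_with_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    h.len() >= n.len() && h[h.len() - n.len()..].eq_ignore_ascii_case(n)
}
```

For example, `Predicate::like("%foo%")` classifies as `Contains("foo")`, so the pattern is parsed once rather than once per row. Whether this shape is actually faster would still need to be shown by benchmarks.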

@samuelcolvin
Contributor Author

I'm keen to try and work on this.

@alamb
Contributor

alamb commented Jul 25, 2024

Thanks @samuelcolvin

I think in general the basic requirement for performance optimizations in this crate is benchmarks that show performance improvements to justify the additional code complexity / maintenance burden.

I think there are already several cargo bench style benchmarks for string operations -- maybe a good first step would be to review them and add any additional cases you think are not covered that would benefit from the optimizations described above
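
As a loose illustration of the kind of case that might be worth covering (long haystacks with the needle near the end), here is a minimal std-only timing sketch. It is not a replacement for the existing cargo bench benchmarks; the function names, haystack shape, and sizes are all assumptions chosen for the example.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Build `n` haystacks of `len` filler bytes; every other one has the
/// needle appended at the very end (the slow case for a naive scan).
fn make_haystacks(n: usize, len: usize, needle: &str) -> Vec<String> {
    (0..n)
        .map(|i| {
            let mut s = "a".repeat(len);
            if i % 2 == 0 {
                s.push_str(needle);
            }
            s
        })
        .collect()
}

/// Run `f` over all haystacks `iters` times. Returns the match count of
/// the last pass and the total elapsed wall-clock time.
fn time_search<F: Fn(&str) -> bool>(
    haystacks: &[String],
    iters: usize,
    f: F,
) -> (usize, Duration) {
    let start = Instant::now();
    let mut matches = 0;
    for _ in 0..iters {
        // black_box discourages the optimizer from hoisting the search.
        matches = haystacks
            .iter()
            .filter(|h| f(black_box(h.as_str())))
            .count();
    }
    (black_box(matches), start.elapsed())
}
```

Calling `time_search` with the same search closure over `make_haystacks(1024, 16, "needle")` and `make_haystacks(1024, 4096, "needle")` would give a first rough comparison of short versus long haystacks, before moving the case into a proper benchmark.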

@alamb
Contributor

alamb commented Jul 25, 2024

I think @Dandandan and @jhorstmann are especially excited by low level optimizations like this 😁

@samuelcolvin
Contributor Author

While working on this, I found #6145. We should merge that, then rebase and review the other PRs here.

@alamb alamb changed the title String search optimisations String search kernel optimisations Jul 29, 2024

2 participants