
Prefer inline representation over static? #276

Closed
overlookmotel opened this issue Mar 19, 2023 · 2 comments

Comments

@overlookmotel (Contributor) commented Mar 19, 2023

Currently, static representation takes priority over inline.

For example, Atom::from("bananas") performs the following operations:

  • Calculate hash of "bananas".
  • Calculate index in static set (several arithmetic operations).
  • String comparison of "bananas" and the string in the static set at that index.

If "bananas" is not in the static set, it falls back to inline representation.
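The steps above can be sketched roughly as follows. This is a toy illustration only: the real codegen emits a perfect-hash table, so the FNV-1a hash, modulo index, and example set here are stand-ins, not string-cache's actual scheme.

```rust
// Toy illustration of the static-path steps above (NOT string-cache's
// actual hash or table layout).
const STATIC_SET: &[&str] = &["div", "span", "bananas", "class"];

fn fnv1a(s: &str) -> u64 {
    // FNV-1a, a simple stand-in for the real hash function.
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in s.as_bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

fn static_lookup(s: &str) -> Option<usize> {
    // 1. Calculate the hash of the input.
    let h = fnv1a(s);
    // 2. Derive a candidate index into the static set.
    let i = (h as usize) % STATIC_SET.len();
    // 3. Compare the input against the string stored at that index.
    if STATIC_SET[i] == s {
        Some(i)
    } else {
        None // fall back to inline (or dynamic) representation
    }
}
```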

Is there a reason why strings of 7 bytes or less don't skip all of the above and go directly to the inline representation?

As far as I can see, inline representation is much cheaper, and which representation takes priority doesn't matter for the rest of Atom's implementation - as long as it's consistent.
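For comparison, the inline path is little more than a length check and a byte copy. A simplified sketch of packing a short string into a `u64` (this is not string-cache's exact bit layout, just an illustration of why the path is cheap):

```rust
// Simplified sketch: pack a string of <= 7 bytes into a u64, with a
// tag and the length in the low byte. NOT string-cache's real layout.
const INLINE_TAG: u64 = 0b01;

fn pack_inline(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.len() > 7 {
        return None; // too long for inline representation
    }
    // Low byte holds tag + length; the other 7 bytes hold string data.
    let mut data = INLINE_TAG | ((bytes.len() as u64) << 4);
    for (i, &b) in bytes.iter().enumerate() {
        data |= (b as u64) << (8 * (i + 1));
    }
    Some(data)
}

fn unpack_inline(data: u64) -> String {
    let len = ((data >> 4) & 0x7) as usize;
    let mut bytes = Vec::with_capacity(len);
    for i in 0..len {
        bytes.push(((data >> (8 * (i + 1))) & 0xFF) as u8);
    }
    String::from_utf8(bytes).unwrap()
}
```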

Short strings could still be input to the codegen so they can be used in atom!(), but the codegen would generate pack_inline() calls for them (instead of pack_static()) rather than adding them to the static set.

I can't see any downside to this, and imagine it could be significantly more performant for use cases where there are lots of short strings in use. But am I missing something?
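Concretely, the proposed ordering could look something like this (hypothetical names and a linear scan standing in for the crate's actual internals):

```rust
// Hypothetical sketch of the proposed lookup order: inline first,
// then the static set, then a dynamic heap entry.
#[derive(Debug, PartialEq)]
enum Repr {
    Inline,
    Static(usize),
    Dynamic,
}

// Under the proposal, codegen would exclude strings shorter than
// 8 bytes from this set.
const STATIC_STRINGS: &[&str] = &["background-color", "JavaScript"];

fn choose_repr(s: &str) -> Repr {
    // Cheap check first: anything 7 bytes or less is packed inline.
    if s.len() <= 7 {
        return Repr::Inline;
    }
    // Only longer strings pay for the static-set probe (hash + index
    // + comparison in the real code; a scan stands in here).
    if let Some(i) = STATIC_STRINGS.iter().position(|&e| e == s) {
        return Repr::Static(i);
    }
    Repr::Dynamic
}
```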

If you think this may be worth looking into, I'd be happy to implement and benchmark.

@overlookmotel (Contributor, Author) commented Mar 20, 2023

A very quick benchmark:

Before:

Atom::from("")
    time:   [18.124 ns 18.163 ns 18.209 ns]

Atom::from("i")
    time:   [23.256 ns 23.311 ns 23.373 ns]

Atom::from("bananas")
    time:   [25.756 ns 25.827 ns 25.902 ns]

After:

Atom::from("")
    time:   [5.3891 ns 5.4092 ns 5.4281 ns]
    change: [-70.539% -70.401% -70.263%] (p = 0.00 < 0.05)
    Performance has improved.

Atom::from("i")
    time:   [8.5015 ns 8.5451 ns 8.6017 ns]
    change: [-64.009% -63.692% -63.438%] (p = 0.00 < 0.05)
    Performance has improved.

Atom::from("bananas")
    time:   [10.098 ns 10.124 ns 10.149 ns]
    change: [-61.047% -60.838% -60.657%] (p = 0.00 < 0.05)
    Performance has improved.

from() is 2.5x-3.4x faster for short strings when inline representation takes priority over static.

I also wonder if reducing the size of the static set (because it no longer contains any strings under 8 bytes) might improve performance for static strings too - but I've not tested that.

The change is really simple: src/atom.rs
Benchmark code here: benches/inline.rs

If you don't see a downside to this, I'd be happy to make a PR (also including the necessary changes to the codegen).

overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 4, 2023
overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 5, 2023
overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 17, 2024
overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 17, 2024
overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 17, 2024
overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 17, 2024
overlookmotel added a commit to overlookmotel/string-cache that referenced this issue Jul 17, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 31, 2024
* Benchmarks use longer static strings

* Use inline for short strings

Closes #276.
@nicoburns commented:

Implemented in #278
