Modify benchmarks to compare against stdlib functions #133

Merged (1 commit) on Apr 16, 2024


Jonas-Heinrich
Contributor

This commit refactors and expands the microbenchmarks to measure the performance cost of handling full Unicode. `unicode-segmentation`'s functions are expected to be slower than the stdlib functions because they consider graphemes; the question is how much slower.

  • bump criterion dependency
  • rename benchmarks to remove unicode/grapheme relationship
  • move benchmarks into benchmark group
  • add scalar versions with stdlib "equivalents" (scalars)
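As a rough illustration of the scalar side of such a comparison, here is a stdlib-only sketch. It assumes `str::chars()` and `str::split_whitespace()` as the stdlib "equivalents", which may not match the functions the benchmarks actually call, and it uses `std::time::Instant` as a crude stand-in for criterion. The helper names (`count_scalars`, `count_whitespace_words`, `time_it`) are hypothetical. The grapheme side would call `UnicodeSegmentation::graphemes(text, true)` and `unicode_words()` from `unicode-segmentation` in the same positions.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Scalar stand-in for grapheme iteration: walks Unicode scalar values.
/// (Assumed stdlib counterpart; not taken from the PR's benchmark code.)
fn count_scalars(text: &str) -> usize {
    let mut n = 0;
    for c in black_box(text).chars() {
        black_box(c); // keep the optimizer from discarding the loop body
        n += 1;
    }
    n
}

/// Scalar stand-in for word segmentation: splits on Unicode whitespace only.
fn count_whitespace_words(text: &str) -> usize {
    let mut n = 0;
    for w in black_box(text).split_whitespace() {
        black_box(w);
        n += 1;
    }
    n
}

/// Hypothetical helper: average wall-clock time per call over `iters` runs.
/// `iters` must be non-zero. Criterion does far more (warm-up, statistics).
fn time_it<F: FnMut() -> usize>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // "e" followed by a combining acute accent: two scalars, one grapheme.
    let text = "e\u{301}tude for a benchmark";
    println!("scalars: {}", count_scalars(text));
    println!("words:   {}", count_whitespace_words(text));
    println!("chars avg: {:?}", time_it(1_000, || count_scalars(text)));
    println!("words avg: {:?}", time_it(1_000, || count_whitespace_words(text)));
}
```

The combining-accent input shows why the two approaches are not interchangeable: the scalar loop sees two items where a grapheme-aware iterator would see one.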

@Jonas-Heinrich
Contributor Author

Results on M1 Pro:

cargo criterion
chars/grapheme/arabic   time:   [225.81 µs 227.21 µs 228.99 µs]
chars/grapheme/english  time:   [321.12 µs 324.93 µs 332.17 µs]                                   
chars/grapheme/hindi    time:   [310.65 µs 313.42 µs 317.46 µs]                                 
chars/grapheme/japanese time:   [263.73 µs 264.32 µs 265.04 µs]                                    
chars/grapheme/korean   time:   [374.51 µs 375.28 µs 376.23 µs]                                  
chars/grapheme/mandarin time:   [181.40 µs 181.92 µs 182.43 µs]                                    
chars/grapheme/russian  time:   [223.38 µs 225.97 µs 230.94 µs]                                   
chars/grapheme/source_code                                                                            
                        time:   [331.74 µs 339.59 µs 350.17 µs]
chars/scalar/arabic     time:   [34.403 µs 34.629 µs 34.872 µs]                                 
chars/scalar/english    time:   [29.143 µs 29.238 µs 29.333 µs]                                  
chars/scalar/hindi      time:   [32.569 µs 32.903 µs 33.253 µs]                                
chars/scalar/japanese   time:   [19.473 µs 19.578 µs 19.705 µs]                                   
chars/scalar/korean     time:   [28.406 µs 28.835 µs 29.526 µs]                                 
chars/scalar/mandarin   time:   [18.407 µs 18.524 µs 18.688 µs]                                   
chars/scalar/russian    time:   [33.282 µs 33.840 µs 34.721 µs]                                  
chars/scalar/source_code                                                                             
                        time:   [29.295 µs 29.410 µs 29.545 µs]

Gnuplot not found, using plotters backend
word_bounds/grapheme/arabic                                                                            
                        time:   [307.01 µs 307.80 µs 308.64 µs]
word_bounds/grapheme/english                                                                            
                        time:   [546.69 µs 548.37 µs 550.20 µs]
word_bounds/grapheme/hindi                                                                            
                        time:   [258.34 µs 259.83 µs 261.33 µs]
word_bounds/grapheme/japanese                                                                            
                        time:   [451.61 µs 452.79 µs 454.02 µs]
word_bounds/grapheme/korean                                                                            
                        time:   [186.72 µs 187.40 µs 188.27 µs]
word_bounds/grapheme/mandarin                                                                            
                        time:   [302.78 µs 303.41 µs 304.11 µs]
word_bounds/grapheme/russian                                                                            
                        time:   [213.85 µs 214.64 µs 215.40 µs]
word_bounds/grapheme/source_code                                                                            
                        time:   [645.49 µs 647.82 µs 650.39 µs]

Gnuplot not found, using plotters backend
words/grapheme/arabic   time:   [408.06 µs 409.05 µs 410.07 µs]                                  
words/grapheme/english  time:   [565.94 µs 570.32 µs 576.88 µs]                                   
words/grapheme/hindi    time:   [288.32 µs 289.24 µs 290.26 µs]                                 
words/grapheme/japanese time:   [769.22 µs 773.32 µs 781.58 µs]                                    
words/grapheme/korean   time:   [239.53 µs 240.74 µs 241.96 µs]                                  
words/grapheme/mandarin time:   [637.44 µs 638.90 µs 640.41 µs]                                    
words/grapheme/russian  time:   [238.54 µs 239.48 µs 240.84 µs]                                   
words/grapheme/source_code                                                                            
                        time:   [672.63 µs 674.83 µs 677.05 µs]
words/scalar/arabic     time:   [75.142 µs 75.378 µs 75.636 µs]                                
words/scalar/english    time:   [91.580 µs 92.256 µs 93.210 µs]                                 
words/scalar/hindi      time:   [46.629 µs 46.863 µs 47.107 µs]                                
words/scalar/japanese   time:   [64.907 µs 65.176 µs 65.509 µs]                                  
words/scalar/korean     time:   [48.730 µs 49.012 µs 49.296 µs]                                 
words/scalar/mandarin   time:   [35.407 µs 35.436 µs 35.469 µs]                                   
words/scalar/russian    time:   [71.672 µs 71.774 µs 71.885 µs]                                 
words/scalar/source_code                                                                            
                        time:   [100.26 µs 100.49 µs 100.73 µs]
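
To put a number on "how much", the medians for the `chars` group can be turned into slowdown factors. The sketch below only divides the grapheme median by the scalar median for each input; the values are copied from the output above, and the `slowdown` helper and `CHARS` table are hypothetical names introduced for illustration. The same arithmetic applies to the `words` group.

```rust
/// (input, grapheme median in µs, scalar median in µs), copied from the
/// `chars/*` lines of the criterion output above.
const CHARS: [(&str, f64, f64); 8] = [
    ("arabic", 227.21, 34.629),
    ("english", 324.93, 29.238),
    ("hindi", 313.42, 32.903),
    ("japanese", 264.32, 19.578),
    ("korean", 375.28, 28.835),
    ("mandarin", 181.92, 18.524),
    ("russian", 225.97, 33.840),
    ("source_code", 339.59, 29.410),
];

/// Ratio of grapheme time to scalar time; values above 1.0 mean the
/// grapheme-aware version is slower.
fn slowdown(grapheme_us: f64, scalar_us: f64) -> f64 {
    grapheme_us / scalar_us
}

fn main() {
    for (name, grapheme_us, scalar_us) in CHARS.iter() {
        println!("chars/{}: {:.1}x", name, slowdown(*grapheme_us, *scalar_us));
    }
}
```

These ratios describe a single run on one machine (M1 Pro), so they are best read as an order-of-magnitude indication rather than a stable figure.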

@Manishearth Manishearth merged commit 3ff9de6 into unicode-rs:master Apr 16, 2024
2 checks passed