Enhance matching words #483
e330917 to
d3cb16e
Compare
d3cb16e to
fa7d3a3
Compare
I am aware that this PR is incomplete and must be benchmarked, especially the essential parts: the create_matching_words function and the matches/mod.rs module.
My only remark is that it seems quite complex, and I have no real idea how to simplify it. We should probably hold it back and benchmark it before releasing it. Some parts could be rewritten with more idiomatic code, though I know the algorithm's complexity doesn't make that easy.
It is much better now, thank you for that!
b572590 to
7c5355d
Compare
http-ui/src/main.rs
Outdated
    let analyzed: Vec<_> = analyzed.tokens().collect();
    let mut matcher = matcher_builder.build(&analyzed[..], &old_string);

    Value::String(matcher.format(true, true).to_string())
I don't really like this format(true, true) call, as it is not clear what those two booleans mean, but we can open an issue and address this later.
I can move them into the builder with a specific method for each 🤔
I chose to use the FormatOption struct coming from Meilisearch, which will ease the future integration.
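To illustrate why an options struct reads better than two positional booleans, here is a minimal sketch. The `FormatOptions` fields (`highlight`, `crop`) and the `format_text` function are assumptions made for this example; they are not the actual milli or Meilisearch definitions.

```rust
/// Hypothetical options struct; the field names are assumptions for illustration.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct FormatOptions {
    /// Wrap matched words in `<em>` markers.
    pub highlight: bool,
    /// Keep at most this many words, if set.
    pub crop: Option<usize>,
}

/// Toy formatter: optionally crops `text` to its first words and
/// optionally highlights every occurrence of `word`.
pub fn format_text(text: &str, word: &str, opts: FormatOptions) -> String {
    let limit = opts.crop.unwrap_or(usize::MAX);
    text.split_whitespace()
        .take(limit)
        .map(|w| {
            if opts.highlight && w.eq_ignore_ascii_case(word) {
                format!("<em>{}</em>", w)
            } else {
                w.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

A call such as `format_text(text, "the", FormatOptions { highlight: true, crop: Some(3) })` names each setting at the call site, which is what `format(true, true)` lacks.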
7c5355d to
011f821
Compare
This is, indeed, better with this struct, thank you!
I like what I see, merging!
bors merge
    let matching_words =
        create_matching_words(self, self.authorize_typos, &primitive_query)?;
    Ok(Some((qt, primitive_query, matching_words)))
This addition is probably the source of the performance issue.
Try replacing the create_matching_words call with:
    fn matching_words() -> MatchingWords {
        let matching_words = vec![
            (vec![MatchingWord::new("split".to_string(), 0, false)], vec![0]),
            (vec![MatchingWord::new("the".to_string(), 0, false)], vec![1]),
            (vec![MatchingWord::new("world".to_string(), 1, true)], vec![2]),
        ];
        MatchingWords::new(matching_words)
    }
MatchingWords::new(vec![])
Ok, so you are running the benchmarks from #537!
Everything got faster by a good margin.
Your fix against this PR:
group search_songs_improve-bench_3824ee55 search_songs_main_ea4bb940
----- ----------------------------------- --------------------------
smol-songs.csv: asc + default/Notstandskomitee 1.00 3.5±0.02ms ? ?/sec 1.03 3.6±0.02ms ? ?/sec
smol-songs.csv: asc + default/charles 1.00 2.5±0.01ms ? ?/sec 1.04 2.5±0.01ms ? ?/sec
smol-songs.csv: asc + default/charles mingus 1.00 3.6±0.02ms ? ?/sec 1.08 3.9±0.02ms ? ?/sec
smol-songs.csv: asc + default/david 1.00 3.3±0.01ms ? ?/sec 1.02 3.3±0.04ms ? ?/sec
smol-songs.csv: asc + default/david bowie 1.00 5.1±0.02ms ? ?/sec 1.05 5.4±0.02ms ? ?/sec
smol-songs.csv: asc + default/john 1.00 3.6±0.01ms ? ?/sec 1.03 3.7±0.02ms ? ?/sec
smol-songs.csv: asc + default/marcus miller 1.00 5.6±0.02ms ? ?/sec 1.04 5.9±0.02ms ? ?/sec
smol-songs.csv: asc + default/michael jackson 1.00 5.2±0.02ms ? ?/sec 1.07 5.5±0.02ms ? ?/sec
smol-songs.csv: asc + default/tamo 1.00 1636.1±9.22µs ? ?/sec 1.05 1713.9±11.57µs ? ?/sec
smol-songs.csv: asc + default/thelonious monk 1.00 5.6±0.02ms ? ?/sec 1.05 5.9±0.29ms ? ?/sec
smol-songs.csv: asc/Notstandskomitee 1.00 3.1±0.01ms ? ?/sec 1.10 3.4±0.71ms ? ?/sec
smol-songs.csv: asc/charles 1.00 577.6±4.37µs ? ?/sec 1.16 668.3±3.17µs ? ?/sec
smol-songs.csv: asc/charles mingus 1.00 1101.9±9.81µs ? ?/sec 1.28 1408.8±149.50µs ? ?/sec
smol-songs.csv: asc/david 1.00 865.5±3.22µs ? ?/sec 1.11 957.9±5.74µs ? ?/sec
smol-songs.csv: asc/david bowie 1.00 1456.6±11.51µs ? ?/sec 1.19 1739.9±176.35µs ? ?/sec
smol-songs.csv: asc/john 1.00 761.2±3.43µs ? ?/sec 1.11 845.6±4.98µs ? ?/sec
smol-songs.csv: asc/marcus miller 1.00 1356.3±11.01µs ? ?/sec 1.20 1629.5±111.53µs ? ?/sec
smol-songs.csv: asc/michael jackson 1.00 1374.7±7.89µs ? ?/sec 1.22 1675.3±10.50µs ? ?/sec
smol-songs.csv: asc/tamo 1.00 143.9±1.73µs ? ?/sec 1.59 228.2±2.47µs ? ?/sec
smol-songs.csv: asc/thelonious monk 1.00 3.7±0.02ms ? ?/sec 1.14 4.2±0.88ms ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 105.8±0.75µs ? ?/sec 2.69 284.2±2.94µs ? ?/sec
smol-songs.csv: basic filter: <=/charles 1.00 97.8±0.98µs ? ?/sec 2.00 195.8±1.23µs ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus 1.00 321.4±8.88µs ? ?/sec 1.89 608.4±6.02µs ? ?/sec
smol-songs.csv: basic filter: <=/david 1.00 97.1±1.81µs ? ?/sec 1.95 189.4±2.81µs ? ?/sec
smol-songs.csv: basic filter: <=/david bowie 1.00 313.5±8.98µs ? ?/sec 1.85 579.8±2.84µs ? ?/sec
smol-songs.csv: basic filter: <=/john 1.00 97.3±0.55µs ? ?/sec 1.87 182.0±1.52µs ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller 1.00 314.8±1.76µs ? ?/sec 1.90 596.8±3.63µs ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson 1.00 322.6±9.09µs ? ?/sec 1.88 608.1±3.91µs ? ?/sec
smol-songs.csv: basic filter: <=/tamo 1.00 95.8±0.76µs ? ?/sec 1.98 190.1±14.42µs ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk 1.00 325.1±4.19µs ? ?/sec 1.96 635.9±3.40µs ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 106.8±1.10µs ? ?/sec 2.90 309.3±16.35µs ? ?/sec
smol-songs.csv: basic filter: TO/charles 1.00 99.1±0.92µs ? ?/sec 1.99 197.2±6.85µs ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus 1.00 318.5±9.75µs ? ?/sec 1.87 597.1±3.11µs ? ?/sec
smol-songs.csv: basic filter: TO/david 1.00 97.9±3.72µs ? ?/sec 1.92 187.5±2.82µs ? ?/sec
smol-songs.csv: basic filter: TO/david bowie 1.00 315.0±9.05µs ? ?/sec 1.87 590.6±128.06µs ? ?/sec
smol-songs.csv: basic filter: TO/john 1.00 96.9±0.88µs ? ?/sec 1.87 181.3±1.52µs ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller 1.00 316.1±8.40µs ? ?/sec 1.89 596.4±3.81µs ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson 1.00 323.5±8.45µs ? ?/sec 1.87 606.3±3.84µs ? ?/sec
smol-songs.csv: basic filter: TO/tamo 1.00 95.9±0.62µs ? ?/sec 1.92 183.9±1.79µs ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk 1.00 321.7±8.53µs ? ?/sec 1.97 632.8±4.33µs ? ?/sec
smol-songs.csv: basic placeholder/ 1.01 58.9±0.40µs ? ?/sec 1.00 58.5±0.60µs ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 193.8±1.30µs ? ?/sec 1.08 209.4±1.60µs ? ?/sec
smol-songs.csv: basic with quote/"charles" 1.00 167.6±1.41µs ? ?/sec 1.05 175.7±14.53µs ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus" 1.00 1264.3±8.88µs ? ?/sec 1.02 1286.7±13.80µs ? ?/sec
smol-songs.csv: basic with quote/"david" 1.00 239.4±1.99µs ? ?/sec 1.04 248.2±2.43µs ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie" 1.00 1412.1±9.58µs ? ?/sec 1.02 1441.8±8.48µs ? ?/sec
smol-songs.csv: basic with quote/"john" 1.00 355.4±3.21µs ? ?/sec 1.03 365.3±2.58µs ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller" 1.00 295.6±2.12µs ? ?/sec 1.04 306.1±1.62µs ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson" 1.00 1415.9±7.05µs ? ?/sec 1.01 1427.9±10.72µs ? ?/sec
smol-songs.csv: basic with quote/"tamo" 1.00 545.1±4.96µs ? ?/sec 1.01 549.5±5.61µs ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1756.4±9.23µs ? ?/sec 1.00 1764.4±11.15µs ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee 1.00 3.4±0.02ms ? ?/sec 1.04 3.5±0.03ms ? ?/sec
smol-songs.csv: basic without quote/charles 1.00 342.2±11.65µs ? ?/sec 1.27 434.8±3.86µs ? ?/sec
smol-songs.csv: basic without quote/charles mingus 1.00 3.3±0.01ms ? ?/sec 1.09 3.6±0.03ms ? ?/sec
smol-songs.csv: basic without quote/david 1.00 499.9±3.72µs ? ?/sec 1.17 586.6±3.54µs ? ?/sec
smol-songs.csv: basic without quote/david bowie 1.00 6.6±0.02ms ? ?/sec 1.10 7.2±1.46ms ? ?/sec
smol-songs.csv: basic without quote/john 1.00 1344.7±8.76µs ? ?/sec 1.07 1437.8±10.74µs ? ?/sec
smol-songs.csv: basic without quote/marcus miller 1.00 2.8±0.01ms ? ?/sec 1.10 3.1±0.01ms ? ?/sec
smol-songs.csv: basic without quote/michael jackson 1.00 4.3±0.02ms ? ?/sec 1.08 4.6±0.02ms ? ?/sec
smol-songs.csv: basic without quote/tamo 1.00 993.0±9.34µs ? ?/sec 1.08 1068.3±11.75µs ? ?/sec
smol-songs.csv: basic without quote/thelonious monk 1.00 5.5±0.02ms ? ?/sec 1.09 6.0±1.39ms ? ?/sec
smol-songs.csv: big filter/Notstandskomitee 1.00 3.3±0.02ms ? ?/sec 1.04 3.5±0.02ms ? ?/sec
smol-songs.csv: big filter/charles 1.00 364.3±2.73µs ? ?/sec 1.26 457.4±4.21µs ? ?/sec
smol-songs.csv: big filter/charles mingus 1.00 983.2±17.50µs ? ?/sec 1.27 1253.0±12.26µs ? ?/sec
smol-songs.csv: big filter/david 1.00 989.9±17.46µs ? ?/sec 1.17 1153.3±292.60µs ? ?/sec
smol-songs.csv: big filter/david bowie 1.00 2.0±0.01ms ? ?/sec 1.13 2.3±0.01ms ? ?/sec
smol-songs.csv: big filter/john 1.00 1143.7±10.60µs ? ?/sec 1.08 1234.0±14.98µs ? ?/sec
smol-songs.csv: big filter/marcus miller 1.00 1048.0±9.52µs ? ?/sec 1.28 1338.6±8.39µs ? ?/sec
smol-songs.csv: big filter/michael jackson 1.00 1966.1±12.75µs ? ?/sec 1.16 2.3±0.01ms ? ?/sec
smol-songs.csv: big filter/tamo 1.00 218.2±1.75µs ? ?/sec 1.45 315.4±69.07µs ? ?/sec
smol-songs.csv: big filter/thelonious monk 1.00 3.8±0.01ms ? ?/sec 1.10 4.2±0.02ms ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee 1.00 3.5±0.02ms ? ?/sec 1.04 3.7±0.02ms ? ?/sec
smol-songs.csv: desc + default/charles 1.00 1787.2±9.68µs ? ?/sec 1.07 1906.7±224.75µs ? ?/sec
smol-songs.csv: desc + default/charles mingus 1.00 2.8±0.01ms ? ?/sec 1.11 3.1±0.02ms ? ?/sec
smol-songs.csv: desc + default/david 1.00 6.1±0.05ms ? ?/sec 1.02 6.2±0.02ms ? ?/sec
smol-songs.csv: desc + default/david bowie 1.00 9.4±0.03ms ? ?/sec 1.05 9.9±0.04ms ? ?/sec
smol-songs.csv: desc + default/john 1.00 4.8±0.02ms ? ?/sec 1.04 5.0±0.08ms ? ?/sec
smol-songs.csv: desc + default/marcus miller 1.00 4.4±0.05ms ? ?/sec 1.06 4.7±0.13ms ? ?/sec
smol-songs.csv: desc + default/michael jackson 1.00 7.0±0.03ms ? ?/sec 1.04 7.4±0.03ms ? ?/sec
smol-songs.csv: desc + default/tamo 1.00 1756.3±10.72µs ? ?/sec 1.05 1842.7±8.64µs ? ?/sec
smol-songs.csv: desc + default/thelonious monk 1.00 5.6±0.05ms ? ?/sec 1.02 5.7±0.02ms ? ?/sec
smol-songs.csv: desc/Notstandskomitee 1.00 3.1±0.01ms ? ?/sec 1.06 3.3±0.02ms ? ?/sec
smol-songs.csv: desc/charles 1.00 582.3±6.58µs ? ?/sec 1.16 675.0±4.55µs ? ?/sec
smol-songs.csv: desc/charles mingus 1.00 1110.5±9.88µs ? ?/sec 1.25 1382.8±15.19µs ? ?/sec
smol-songs.csv: desc/david 1.00 860.4±3.70µs ? ?/sec 1.14 978.8±114.06µs ? ?/sec
smol-songs.csv: desc/david bowie 1.00 1454.3±16.35µs ? ?/sec 1.18 1722.9±17.64µs ? ?/sec
smol-songs.csv: desc/john 1.00 765.6±4.85µs ? ?/sec 1.11 853.0±6.45µs ? ?/sec
smol-songs.csv: desc/marcus miller 1.00 1355.7±24.10µs ? ?/sec 1.22 1647.6±70.51µs ? ?/sec
smol-songs.csv: desc/michael jackson 1.00 1374.6±9.85µs ? ?/sec 1.21 1658.3±8.39µs ? ?/sec
smol-songs.csv: desc/tamo 1.00 143.9±0.85µs ? ?/sec 1.61 231.1±1.48µs ? ?/sec
smol-songs.csv: desc/thelonious monk 1.00 3.7±0.02ms ? ?/sec 1.09 4.0±0.02ms ? ?/sec
smol-songs.csv: prefix search/a 1.00 1291.9±9.99µs ? ?/sec 1.06 1371.2±14.75µs ? ?/sec
smol-songs.csv: prefix search/b 1.00 1152.7±7.74µs ? ?/sec 1.07 1232.6±10.69µs ? ?/sec
smol-songs.csv: prefix search/i 1.00 1434.3±34.29µs ? ?/sec 1.04 1495.2±10.30µs ? ?/sec
smol-songs.csv: prefix search/s 1.00 907.9±3.77µs ? ?/sec 1.09 988.3±11.32µs ? ?/sec
smol-songs.csv: prefix search/x 1.00 359.5±12.13µs ? ?/sec 1.20 431.9±4.86µs ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 1.00 9.4±0.04ms ? ?/sec 2.00 18.7±1.53ms ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 1.00 10.1±0.05ms ? ?/sec 1.26 12.7±0.06ms ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights 1.00 3.8±0.02ms ? ?/sec 1.35 5.1±0.02ms ? ?/sec
smol-songs.csv: proximity/black saint sinner lady 1.00 4.9±0.02ms ? ?/sec 1.26 6.1±0.03ms ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960 1.00 5.4±0.02ms ? ?/sec 1.16 6.2±0.03ms ? ?/sec
smol-songs.csv: typo/Arethla Franklin 1.00 614.2±3.93µs ? ?/sec 1.45 893.1±4.32µs ? ?/sec
smol-songs.csv: typo/Disnaylande 1.00 3.3±0.01ms ? ?/sec 1.05 3.4±0.01ms ? ?/sec
smol-songs.csv: typo/dire straights 1.00 3.9±0.02ms ? ?/sec 1.07 4.2±0.02ms ? ?/sec
smol-songs.csv: typo/fear of the duck 1.00 1788.2±10.08µs ? ?/sec 1.68 3.0±0.01ms ? ?/sec
smol-songs.csv: typo/indochie 1.00 177.9±1.25µs ? ?/sec 1.57 278.5±2.03µs ? ?/sec
smol-songs.csv: typo/indochien 1.00 182.3±1.55µs ? ?/sec 1.72 314.1±1.76µs ? ?/sec
smol-songs.csv: typo/klub des loopers 1.00 993.2±15.53µs ? ?/sec 1.62 1612.6±10.66µs ? ?/sec
smol-songs.csv: typo/michel depech 1.00 664.9±4.01µs ? ?/sec 1.43 951.3±5.31µs ? ?/sec
smol-songs.csv: typo/mongus 1.00 211.6±1.53µs ? ?/sec 1.42 300.9±3.48µs ? ?/sec
smol-songs.csv: typo/stromal 1.00 218.1±1.48µs ? ?/sec 1.43 312.3±2.77µs ? ?/sec
smol-songs.csv: typo/the white striper 1.00 1155.0±10.78µs ? ?/sec 1.57 1810.3±19.64µs ? ?/sec
smol-songs.csv: typo/thelonius monk 1.00 447.8±2.53µs ? ?/sec 1.68 752.5±3.37µs ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 1.00 64.2±0.61ms ? ?/sec 1.48 94.8±2.75ms ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 1.00 128.3±0.49ms ? ?/sec 1.48 189.5±6.95ms ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song 1.00 23.4±0.10ms ? ?/sec 1.43 33.4±0.18ms ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793 1.00 5.7±0.02ms ? ?/sec 1.26 7.2±0.04ms ? ?/sec
smol-songs.csv: words/seven nation mummy 1.00 1888.9±9.06µs ? ?/sec 1.41 2.7±0.59ms ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo 1.00 143.2±0.48ms ? ?/sec 1.42 202.9±6.68ms ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 1.00 147.1±0.48ms ? ?/sec 1.35 199.2±3.75ms ? ?/sec
And your fix against the main branch:
group search_songs_improve-bench_3824ee55 search_songs_main_19dac01c
----- ----------------------------------- --------------------------
smol-songs.csv: asc + default/Notstandskomitee 1.00 3.5±0.02ms ? ?/sec 1.04 3.7±0.02ms ? ?/sec
smol-songs.csv: asc + default/charles 1.00 2.5±0.01ms ? ?/sec 1.05 2.6±0.01ms ? ?/sec
smol-songs.csv: asc + default/charles mingus 1.00 3.6±0.02ms ? ?/sec 1.12 4.0±0.05ms ? ?/sec
smol-songs.csv: asc + default/david 1.00 3.3±0.01ms ? ?/sec 1.04 3.4±0.01ms ? ?/sec
smol-songs.csv: asc + default/david bowie 1.00 5.1±0.02ms ? ?/sec 1.07 5.5±0.02ms ? ?/sec
smol-songs.csv: asc + default/john 1.00 3.6±0.01ms ? ?/sec 1.04 3.7±0.02ms ? ?/sec
smol-songs.csv: asc + default/marcus miller 1.00 5.6±0.02ms ? ?/sec 1.09 6.1±0.03ms ? ?/sec
smol-songs.csv: asc + default/michael jackson 1.00 5.2±0.02ms ? ?/sec 1.07 5.5±0.02ms ? ?/sec
smol-songs.csv: asc + default/tamo 1.00 1636.1±9.22µs ? ?/sec 1.04 1708.4±16.69µs ? ?/sec
smol-songs.csv: asc + default/thelonious monk 1.00 5.6±0.02ms ? ?/sec 1.04 5.8±0.02ms ? ?/sec
smol-songs.csv: asc/Notstandskomitee 1.00 3.1±0.01ms ? ?/sec 1.05 3.3±0.02ms ? ?/sec
smol-songs.csv: asc/charles 1.00 577.6±4.37µs ? ?/sec 1.18 678.9±8.35µs ? ?/sec
smol-songs.csv: asc/charles mingus 1.00 1101.9±9.81µs ? ?/sec 1.28 1409.4±9.82µs ? ?/sec
smol-songs.csv: asc/david 1.00 865.5±3.22µs ? ?/sec 1.11 960.4±4.86µs ? ?/sec
smol-songs.csv: asc/david bowie 1.00 1456.6±11.51µs ? ?/sec 1.20 1743.8±11.87µs ? ?/sec
smol-songs.csv: asc/john 1.00 761.2±3.43µs ? ?/sec 1.12 852.3±5.11µs ? ?/sec
smol-songs.csv: asc/marcus miller 1.00 1356.3±11.01µs ? ?/sec 1.20 1626.2±11.56µs ? ?/sec
smol-songs.csv: asc/michael jackson 1.00 1374.7±7.89µs ? ?/sec 1.22 1683.3±16.59µs ? ?/sec
smol-songs.csv: asc/tamo 1.00 143.9±1.73µs ? ?/sec 1.59 229.0±2.94µs ? ?/sec
smol-songs.csv: asc/thelonious monk 1.00 3.7±0.02ms ? ?/sec 1.11 4.1±0.02ms ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 105.8±0.75µs ? ?/sec 2.71 286.2±4.35µs ? ?/sec
smol-songs.csv: basic filter: <=/charles 1.00 97.8±0.98µs ? ?/sec 1.99 194.3±1.34µs ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus 1.00 321.4±8.88µs ? ?/sec 1.85 593.6±4.56µs ? ?/sec
smol-songs.csv: basic filter: <=/david 1.00 97.1±1.81µs ? ?/sec 1.91 185.1±1.23µs ? ?/sec
smol-songs.csv: basic filter: <=/david bowie 1.00 313.5±8.98µs ? ?/sec 1.83 573.4±4.85µs ? ?/sec
smol-songs.csv: basic filter: <=/john 1.00 97.3±0.55µs ? ?/sec 1.85 180.2±1.33µs ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller 1.00 314.8±1.76µs ? ?/sec 1.87 587.2±4.71µs ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson 1.00 322.6±9.09µs ? ?/sec 1.88 606.4±4.28µs ? ?/sec
smol-songs.csv: basic filter: <=/tamo 1.00 95.8±0.76µs ? ?/sec 1.89 181.3±1.12µs ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk 1.00 325.1±4.19µs ? ?/sec 1.97 640.7±5.21µs ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 106.8±1.10µs ? ?/sec 2.63 280.8±1.63µs ? ?/sec
smol-songs.csv: basic filter: TO/charles 1.00 99.1±0.92µs ? ?/sec 1.98 196.3±1.40µs ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus 1.00 318.5±9.75µs ? ?/sec 1.86 593.9±3.52µs ? ?/sec
smol-songs.csv: basic filter: TO/david 1.00 97.9±3.72µs ? ?/sec 1.92 188.2±1.64µs ? ?/sec
smol-songs.csv: basic filter: TO/david bowie 1.00 315.0±9.05µs ? ?/sec 1.85 583.2±4.61µs ? ?/sec
smol-songs.csv: basic filter: TO/john 1.00 96.9±0.88µs ? ?/sec 1.89 183.5±1.20µs ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller 1.00 316.1±8.40µs ? ?/sec 1.85 585.6±3.80µs ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson 1.00 323.5±8.45µs ? ?/sec 1.90 615.6±5.14µs ? ?/sec
smol-songs.csv: basic filter: TO/tamo 1.00 95.9±0.62µs ? ?/sec 1.87 179.2±1.83µs ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk 1.00 321.7±8.53µs ? ?/sec 2.00 643.2±5.75µs ? ?/sec
smol-songs.csv: basic placeholder/ 1.01 58.9±0.40µs ? ?/sec 1.00 58.0±0.37µs ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 193.8±1.30µs ? ?/sec 1.07 206.4±4.17µs ? ?/sec
smol-songs.csv: basic with quote/"charles" 1.00 167.6±1.41µs ? ?/sec 1.05 176.4±1.88µs ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus" 1.00 1264.3±8.88µs ? ?/sec 1.02 1293.3±8.48µs ? ?/sec
smol-songs.csv: basic with quote/"david" 1.00 239.4±1.99µs ? ?/sec 1.04 248.3±1.82µs ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie" 1.00 1412.1±9.58µs ? ?/sec 1.02 1440.7±10.46µs ? ?/sec
smol-songs.csv: basic with quote/"john" 1.00 355.4±3.21µs ? ?/sec 1.03 366.4±2.60µs ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller" 1.00 295.6±2.12µs ? ?/sec 1.05 311.3±2.06µs ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson" 1.00 1415.9±7.05µs ? ?/sec 1.03 1452.6±28.00µs ? ?/sec
smol-songs.csv: basic with quote/"tamo" 1.00 545.1±4.96µs ? ?/sec 1.03 558.8±3.72µs ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1756.4±9.23µs ? ?/sec 1.00 1762.9±10.53µs ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee 1.00 3.4±0.02ms ? ?/sec 1.06 3.5±0.01ms ? ?/sec
smol-songs.csv: basic without quote/charles 1.00 342.2±11.65µs ? ?/sec 1.29 442.6±2.91µs ? ?/sec
smol-songs.csv: basic without quote/charles mingus 1.00 3.3±0.01ms ? ?/sec 1.09 3.6±0.02ms ? ?/sec
smol-songs.csv: basic without quote/david 1.00 499.9±3.72µs ? ?/sec 1.19 597.4±4.67µs ? ?/sec
smol-songs.csv: basic without quote/david bowie 1.00 6.6±0.02ms ? ?/sec 1.06 7.0±0.03ms ? ?/sec
smol-songs.csv: basic without quote/john 1.00 1344.7±8.76µs ? ?/sec 1.07 1441.9±9.42µs ? ?/sec
smol-songs.csv: basic without quote/marcus miller 1.00 2.8±0.01ms ? ?/sec 1.11 3.1±0.02ms ? ?/sec
smol-songs.csv: basic without quote/michael jackson 1.00 4.3±0.02ms ? ?/sec 1.08 4.6±0.02ms ? ?/sec
smol-songs.csv: basic without quote/tamo 1.00 993.0±9.34µs ? ?/sec 1.09 1082.7±7.38µs ? ?/sec
smol-songs.csv: basic without quote/thelonious monk 1.00 5.5±0.02ms ? ?/sec 1.00 5.5±0.02ms ? ?/sec
smol-songs.csv: big filter/Notstandskomitee 1.00 3.3±0.02ms ? ?/sec 1.06 3.5±0.02ms ? ?/sec
smol-songs.csv: big filter/charles 1.00 364.3±2.73µs ? ?/sec 1.28 465.0±2.60µs ? ?/sec
smol-songs.csv: big filter/charles mingus 1.00 983.2±17.50µs ? ?/sec 1.28 1259.6±7.82µs ? ?/sec
smol-songs.csv: big filter/david 1.00 989.9±17.46µs ? ?/sec 1.11 1100.8±10.32µs ? ?/sec
smol-songs.csv: big filter/david bowie 1.00 2.0±0.01ms ? ?/sec 1.14 2.3±0.01ms ? ?/sec
smol-songs.csv: big filter/john 1.00 1143.7±10.60µs ? ?/sec 1.09 1245.1±9.30µs ? ?/sec
smol-songs.csv: big filter/marcus miller 1.00 1048.0±9.52µs ? ?/sec 1.28 1341.8±10.14µs ? ?/sec
smol-songs.csv: big filter/michael jackson 1.00 1966.1±12.75µs ? ?/sec 1.17 2.3±0.01ms ? ?/sec
smol-songs.csv: big filter/tamo 1.00 218.2±1.75µs ? ?/sec 1.41 307.6±2.22µs ? ?/sec
smol-songs.csv: big filter/thelonious monk 1.00 3.8±0.01ms ? ?/sec 1.10 4.2±0.02ms ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee 1.00 3.5±0.02ms ? ?/sec 1.04 3.7±0.01ms ? ?/sec
smol-songs.csv: desc + default/charles 1.00 1787.2±9.68µs ? ?/sec 1.07 1903.8±25.41µs ? ?/sec
smol-songs.csv: desc + default/charles mingus 1.00 2.8±0.01ms ? ?/sec 1.11 3.1±0.02ms ? ?/sec
smol-songs.csv: desc + default/david 1.00 6.1±0.05ms ? ?/sec 1.05 6.4±0.02ms ? ?/sec
smol-songs.csv: desc + default/david bowie 1.00 9.4±0.03ms ? ?/sec 1.06 10.0±0.03ms ? ?/sec
smol-songs.csv: desc + default/john 1.00 4.8±0.02ms ? ?/sec 1.04 5.0±0.07ms ? ?/sec
smol-songs.csv: desc + default/marcus miller 1.00 4.4±0.05ms ? ?/sec 1.07 4.8±0.02ms ? ?/sec
smol-songs.csv: desc + default/michael jackson 1.00 7.0±0.03ms ? ?/sec 1.05 7.4±0.03ms ? ?/sec
smol-songs.csv: desc + default/tamo 1.03 1756.3±10.72µs ? ?/sec 1.00 1700.8±11.51µs ? ?/sec
smol-songs.csv: desc + default/thelonious monk 1.00 5.6±0.05ms ? ?/sec 1.01 5.7±0.02ms ? ?/sec
smol-songs.csv: desc/Notstandskomitee 1.00 3.1±0.01ms ? ?/sec 1.06 3.3±0.01ms ? ?/sec
smol-songs.csv: desc/charles 1.00 582.3±6.58µs ? ?/sec 1.17 682.5±4.57µs ? ?/sec
smol-songs.csv: desc/charles mingus 1.00 1110.5±9.88µs ? ?/sec 1.25 1388.2±14.81µs ? ?/sec
smol-songs.csv: desc/david 1.00 860.4±3.70µs ? ?/sec 1.12 960.4±5.04µs ? ?/sec
smol-songs.csv: desc/david bowie 1.00 1454.3±16.35µs ? ?/sec 1.19 1726.6±9.71µs ? ?/sec
smol-songs.csv: desc/john 1.00 765.6±4.85µs ? ?/sec 1.12 855.4±4.86µs ? ?/sec
smol-songs.csv: desc/marcus miller 1.00 1355.7±24.10µs ? ?/sec 1.20 1627.4±11.02µs ? ?/sec
smol-songs.csv: desc/michael jackson 1.00 1374.6±9.85µs ? ?/sec 1.21 1667.9±11.14µs ? ?/sec
smol-songs.csv: desc/tamo 1.00 143.9±0.85µs ? ?/sec 1.60 230.8±1.40µs ? ?/sec
smol-songs.csv: desc/thelonious monk 1.00 3.7±0.02ms ? ?/sec 1.11 4.1±0.02ms ? ?/sec
smol-songs.csv: prefix search/a 1.00 1291.9±9.99µs ? ?/sec 1.06 1366.0±7.99µs ? ?/sec
smol-songs.csv: prefix search/b 1.00 1152.7±7.74µs ? ?/sec 1.07 1236.9±12.65µs ? ?/sec
smol-songs.csv: prefix search/i 1.00 1434.3±34.29µs ? ?/sec 1.05 1508.4±12.93µs ? ?/sec
smol-songs.csv: prefix search/s 1.00 907.9±3.77µs ? ?/sec 1.08 983.2±11.53µs ? ?/sec
smol-songs.csv: prefix search/x 1.00 359.5±12.13µs ? ?/sec 1.21 434.4±2.45µs ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 1.00 9.4±0.04ms ? ?/sec 1.98 18.6±0.22ms ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 1.00 10.1±0.05ms ? ?/sec 1.26 12.7±0.06ms ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights 1.00 3.8±0.02ms ? ?/sec 1.35 5.1±0.02ms ? ?/sec
smol-songs.csv: proximity/black saint sinner lady 1.00 4.9±0.02ms ? ?/sec 1.25 6.1±0.03ms ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960 1.00 5.4±0.02ms ? ?/sec 1.13 6.1±0.02ms ? ?/sec
smol-songs.csv: typo/Arethla Franklin 1.00 614.2±3.93µs ? ?/sec 1.45 888.0±4.59µs ? ?/sec
smol-songs.csv: typo/Disnaylande 1.00 3.3±0.01ms ? ?/sec 1.02 3.3±0.01ms ? ?/sec
smol-songs.csv: typo/dire straights 1.00 3.9±0.02ms ? ?/sec 1.09 4.2±0.02ms ? ?/sec
smol-songs.csv: typo/fear of the duck 1.00 1788.2±10.08µs ? ?/sec 1.65 3.0±0.02ms ? ?/sec
smol-songs.csv: typo/indochie 1.00 177.9±1.25µs ? ?/sec 1.53 272.8±2.03µs ? ?/sec
smol-songs.csv: typo/indochien 1.00 182.3±1.55µs ? ?/sec 1.71 310.8±3.03µs ? ?/sec
smol-songs.csv: typo/klub des loopers 1.00 993.2±15.53µs ? ?/sec 1.63 1623.7±12.38µs ? ?/sec
smol-songs.csv: typo/michel depech 1.00 664.9±4.01µs ? ?/sec 1.43 950.3±5.38µs ? ?/sec
smol-songs.csv: typo/mongus 1.00 211.6±1.53µs ? ?/sec 1.42 301.3±3.11µs ? ?/sec
smol-songs.csv: typo/stromal 1.00 218.1±1.48µs ? ?/sec 1.43 312.0±1.91µs ? ?/sec
smol-songs.csv: typo/the white striper 1.00 1155.0±10.78µs ? ?/sec 1.58 1825.9±17.28µs ? ?/sec
smol-songs.csv: typo/thelonius monk 1.00 447.8±2.53µs ? ?/sec 1.69 758.3±6.81µs ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 1.00 64.2±0.61ms ? ?/sec 1.45 93.4±2.05ms ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 1.00 128.3±0.49ms ? ?/sec 1.46 187.5±5.90ms ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song 1.00 23.4±0.10ms ? ?/sec 1.42 33.1±0.33ms ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793 1.00 5.7±0.02ms ? ?/sec 1.28 7.3±0.12ms ? ?/sec
smol-songs.csv: words/seven nation mummy 1.00 1888.9±9.06µs ? ?/sec 1.34 2.5±0.01ms ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo 1.00 143.2±0.48ms ? ?/sec 1.40 200.4±6.51ms ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 1.00 147.1±0.48ms ? ?/sec 1.36 200.6±0.91ms ? ?/sec
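When reading the two tables above, it helps to know how the number in front of each timing relates to the timings themselves. The sketch below assumes the usual convention for this kind of comparison output: each row's ratio is the measured time divided by the fastest time in that row, rounded to two decimals. The function name `relative_ratio` is hypothetical.

```rust
/// Ratio of `time` to the fastest of the two measurements in a row,
/// rounded to two decimals. Both values must use the same unit.
pub fn relative_ratio(time: f64, other: f64) -> f64 {
    let fastest = time.min(other);
    ((time / fastest) * 100.0).round() / 100.0
}
```

For the row `basic filter: <=/Notstandskomitee` in the first table, 284.2µs against 105.8µs gives 284.2 / 105.8 ≈ 2.686, which rounds to the 2.69 shown in that row.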
Summary
Enhance the milli word matcher so that it handles match computing and cropping.
Implementation
Computing best matches for cropping
Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but it missed the target when more than one word was searched.
Now we search for the best interval of matches to crop around; the chosen interval is the one:
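The selection criteria are not listed in this text. As an illustration only, the following minimal sketch assumes two criteria: prefer the interval covering the most distinct query words, then the one whose matches are closest together. The `Match` struct, its fields, and `best_interval` are hypothetical names, not the milli API.

```rust
use std::collections::HashSet;

/// A match: the position of the word in the text and the id of the
/// query word it matched. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Match {
    pub word_position: usize,
    pub query_id: usize,
}

/// Score an interval: more distinct query words first, then a smaller
/// total gap between consecutive matches. Both criteria are assumptions.
fn score(interval: &[Match]) -> (usize, isize) {
    let unique: HashSet<usize> = interval.iter().map(|m| m.query_id).collect();
    let gaps: usize = interval
        .windows(2)
        .map(|w| w[1].word_position - w[0].word_position)
        .sum();
    (unique.len(), -(gaps as isize))
}

/// Return the best run of consecutive matches that fits in `crop_size`
/// words. `matches` must be sorted by `word_position`.
pub fn best_interval(matches: &[Match], crop_size: usize) -> &[Match] {
    let mut best: &[Match] = &[];
    let mut best_score = (0, isize::MIN);
    for start in 0..matches.len() {
        for end in start..matches.len() {
            let span = matches[end].word_position - matches[start].word_position;
            if span >= crop_size {
                break; // the interval no longer fits in the crop window
            }
            let candidate = &matches[start..=end];
            let candidate_score = score(candidate);
            if candidate_score > best_score {
                best_score = candidate_score;
                best = candidate;
            }
        }
    }
    best
}
```

With matches at word positions 0, 10, 11 and 30 and a crop size of 5, this picks the pair at positions 10 and 11 rather than the first match, which is the situation where the previous "first match is best" rule missed the target.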
Cropping around the best matches interval
Previously, we cropped around the interval without checking the context.
Now we crop around words that are in the same context as the matching words.
This means that we prefer to keep words that are farther from the matching words but in the same phrase over words that are nearer but separated by a dot.
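The preference described above can be sketched as a two-pass window expansion. This is a simplified illustration under stated assumptions, not the PR's implementation: each word carries the index of the sentence it belongs to (`sentence_of`), and `crop_bounds` is a hypothetical helper that grows the window from the match interval `first..=last`.

```rust
/// Compute inclusive word bounds `(lo, hi)` of a crop of at most
/// `crop_size` words around the match interval `first..=last`.
/// `sentence_of[i]` is the sentence index of word `i`.
/// Requires `first <= last < sentence_of.len()`.
pub fn crop_bounds(
    sentence_of: &[usize],
    first: usize,
    last: usize,
    crop_size: usize,
) -> (usize, usize) {
    let (mut lo, mut hi) = (first, last);
    let mut remaining = crop_size.saturating_sub(last - first + 1);
    // First pass: only take words from the same sentences as the matches.
    // Second pass: spend whatever budget is left on any neighbouring word.
    for same_context_only in [true, false] {
        loop {
            let mut grew = false;
            if remaining > 0
                && lo > 0
                && (!same_context_only || sentence_of[lo - 1] == sentence_of[first])
            {
                lo -= 1;
                remaining -= 1;
                grew = true;
            }
            if remaining > 0
                && hi + 1 < sentence_of.len()
                && (!same_context_only || sentence_of[hi + 1] == sentence_of[last])
            {
                hi += 1;
                remaining -= 1;
                grew = true;
            }
            if !grew {
                break;
            }
        }
    }
    (lo, hi)
}
```

For the text "Hello there. The quick brown fox jumps over. Bye now." with a match on "over" and a crop size of 4, the window becomes "brown fox jumps over": the three words to the left in the same sentence are kept, and the nearer word "Bye" after the dot is not.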