This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Rewrite scoring algorithm to support run of consecutive character, fix acronyms and add optimal selection of character. #22

Closed. Wants to merge 77 commits.
9511715
Updated scoring algorithm, and merged @plrenaudin filter spec
jeancroy Jul 9, 2015
bc509dd
Removed SlahCount from baseNameScore (Test if this is source of diffe…
jeancroy Jul 9, 2015
b3d47c5
Increased BasePath bonus
jeancroy Jul 9, 2015
82dfcc3
Increase exact match bonus if happens right after a separator
jeancroy Jul 9, 2015
f779333
Introduce concept of optional characters, for example space can be ma…
jeancroy Jul 9, 2015
7539792
Added test for issue
jeancroy Jul 9, 2015
7c7f5d1
Added back preference for shallower path. (got removed after first bu…
jeancroy Jul 10, 2015
0e2c1a4
Compute lowercase once (thanks to @walles)
jeancroy Jul 10, 2015
6dd4eaa
Last refactoring permit to drop -Infinity value and remove one test c…
jeancroy Jul 10, 2015
d2eb476
re-use precomputed lowercase
jeancroy Jul 10, 2015
09aa694
corrected math for lpos, in the suffix bonus case.
jeancroy Jul 10, 2015
6b44306
case sensitive exact match is now a bypass
jeancroy Jul 10, 2015
957da9b
Improved accuracy for CamelCase
jeancroy Jul 10, 2015
dec3278
account for case in CamelCase vs Substring matches
jeancroy Jul 10, 2015
67a0a0e
CamelCase Matches now prefer smaller haystack, matches that happens s…
jeancroy Jul 10, 2015
4290552
More robust basePath score
jeancroy Jul 10, 2015
8d1d713
Added more real-life test cases
jeancroy Jul 11, 2015
25d7b3c
Adjusted some weight so close call test have more legroom
jeancroy Jul 11, 2015
ab040aa
Query that have slashes also get bonus for matching base file.
jeancroy Jul 11, 2015
9ab128c
fix some comments
jeancroy Jul 11, 2015
e0ff69b
Merge remote-tracking branch 'origin/master'
jeancroy Jul 11, 2015
28f2248
Now using improved scorer to do alignments (matches)
jeancroy Jul 11, 2015
2c011a9
Option to allow error in query, disabled by default.
jeancroy Jul 11, 2015
8f42f93
Updated match to mirror behavior of filter
jeancroy Jul 12, 2015
99d3237
rework sequence merging (basePath, completePath) in match method
jeancroy Jul 12, 2015
8196b5e
added some mid difficulty highlight test case
jeancroy Jul 12, 2015
e811fa4
Introduce knowledge of consecutive characters in both match and the c…
jeancroy Jul 12, 2015
d6c2882
Implemented forward search for the number of consecutive chars.
jeancroy Jul 12, 2015
fa66d51
Improve Match:
jeancroy Jul 13, 2015
f0c492f
corrected exit condition of perfect camelCase match to account for ha…
jeancroy Jul 13, 2015
a862b89
added test for case-sensitive exact matches.
jeancroy Jul 13, 2015
e3c0fff
added test case to ensure correct result even in the non-exact subst…
jeancroy Jul 13, 2015
ba5d196
with the introduction of bonus relative to the number of consecutive …
jeancroy Jul 13, 2015
616346e
Remove un-needed (since last change) gap tracking variables
jeancroy Jul 13, 2015
03727bf
Merge branch 'after22'
jeancroy Jul 13, 2015
4b9550e
Merge remote-tracking branch 'origin/master'
jeancroy Jul 13, 2015
f9197a3
Merge highlight (match) feature using improved scoring algorithm.
jeancroy Jul 13, 2015
329e7c1
make score() follow filter() behavior more closely
jeancroy Jul 13, 2015
beee187
clean-up, fix comments
jeancroy Jul 14, 2015
c06cc8f
added more non-exact (fuzzy) test.
jeancroy Jul 14, 2015
cf8a43d
Replace test by an easier one for now.
jeancroy Jul 14, 2015
da12540
unified handling of snake_case and CamelCase.
jeancroy Jul 14, 2015
be99411
faster exit condition abbrPrefix
jeancroy Jul 14, 2015
1908a56
Faster abbrPrefix
jeancroy Jul 14, 2015
39e92f7
Bring back string_score as an optional faster but less accurate algor…
jeancroy Jul 14, 2015
e163e38
Speed: Math.max with 3 args is slow - use ternary, ternary
jeancroy Jul 15, 2015
1376514
Speed: use Object for multiple return value of abbrPrefix a bit faste…
jeancroy Jul 15, 2015
8993e06
Cleaner definition of sepmap
jeancroy Jul 15, 2015
93d283c
Show worst case scenario in benchmark
jeancroy Jul 15, 2015
0987c51
One extra test case needed to explain speed behavior
jeancroy Jul 15, 2015
8ac9178
show acronym quick exit in benchmark
jeancroy Jul 15, 2015
f129adc
Propose Worst case mitigation strategy
jeancroy Jul 15, 2015
b7c2422
Added a setting to control of the length subject considered for the s…
jeancroy Jul 16, 2015
7a063b4
Improve scoring of a match neighborhood quality.
jeancroy Jul 16, 2015
cb15663
remove the +1 offset between score and string position
jeancroy Jul 16, 2015
f47fc76
More uniform naming, use less memory
jeancroy Jul 16, 2015
0fcc59d
Merge pull request #2 from jeancroy/structural_change
jeancroy Jul 16, 2015
af723ee
Code review, missed one switch i<->j
jeancroy Jul 16, 2015
8bf3f60
"Worser" worst case scenario in benchmark
jeancroy Jul 16, 2015
99a8a3b
Strip space and space like character in score() and match() to mirror…
jeancroy Jul 17, 2015
887a577
test case for https://github.com/substantial/atomfiles/issues/43
jeancroy Jul 18, 2015
753c6c7
simplification:
jeancroy Jul 18, 2015
e6f103b
Faster test for Case-Sensitive Exact match
jeancroy Jul 18, 2015
a14bb8d
compute lowercase() upfront, enable mitigation by default
jeancroy Jul 19, 2015
da72fa9
when queryHasSlashes, basepath contain as many folder as query, start…
jeancroy Jul 20, 2015
98af35f
Clarify behavior of filter() on empty string or empty array
jeancroy Jul 21, 2015
a0099df
Added test for Suffix feature of exact match
jeancroy Jul 21, 2015
1cb9ce8
Allow to interchange forward and backward slashes in query.
jeancroy Aug 29, 2015
6cd2b3f
Prepare for review / V3
jeancroy Sep 17, 2015
46c8d2b
Corrected a bug with isMatch, reworked acronym weight
jeancroy Sep 18, 2015
2f56dcb
Scoring improvement for single character query.
jeancroy Sep 19, 2015
8086648
Scoring improvement for single character query.
jeancroy Sep 19, 2015
2ba093a
Speed: Hit Count Optimisation & various improvements
jeancroy Sep 22, 2015
00ccd02
Delay computing candidate lowercase until IsMatch confirmed.
jeancroy Sep 23, 2015
f7cd989
faster end condition for scoreAcronyms
jeancroy Sep 23, 2015
a67fe76
clean up around scoreAcronyms added a new benchmark case
jeancroy Sep 23, 2015
149d937
fix indentation
jeancroy Sep 23, 2015
101 changes: 95 additions & 6 deletions benchmark/benchmark.coffee
@@ -1,18 +1,107 @@
 fs = require 'fs'
 path = require 'path'

-{filter, match} = require '../src/fuzzaldrin'
+{filter, match, prepQuery} = require '../src/fuzzaldrin'

 lines = fs.readFileSync(path.join(__dirname, 'data.txt'), 'utf8').trim().split('\n')

-startTime = Date.now()
-results = filter(lines, 'index')
-console.log("Filtering #{lines.length} entries for 'index' took #{Date.now() - startTime}ms for #{results.length} results")
+forceAllMatch = {maxInners:-1}
+legacy = {legacy:true}
+mitigation = {maxInners:Math.floor(0.2*lines.length)}
+
+#warmup + compile
+filter(lines, 'index', forceAllMatch)
+filter(lines, 'index', legacy)
+
+
+console.log("======")

 startTime = Date.now()
-match(line, 'index') for line in lines
-console.log("Matching #{lines.length} entries for 'index' took #{Date.now() - startTime}ms for #{results.length} results")
+results = filter(lines, 'index')
+console.log("Filtering #{lines.length} entries for 'index' took #{Date.now() - startTime}ms for #{results.length} results (~10% of results are positive, mix exact & fuzzy)")

 if results.length isnt 6168
   console.error("Results count changed! #{results.length} instead of 6168")
   process.exit(1)
+
+startTime = Date.now()
+results = filter(lines, 'index', legacy)
+console.log("Filtering #{lines.length} entries for 'index' took #{Date.now() - startTime}ms for #{results.length} results (~10% of results are positive, Legacy method)")
+
+
+console.log("======")
+
+startTime = Date.now()
+results = filter(lines, 'indx')
+console.log("Filtering #{lines.length} entries for 'indx' took #{Date.now() - startTime}ms for #{results.length} results (~10% of results are positive, Fuzzy match)")
+
+startTime = Date.now()
+results = filter(lines, 'indx', legacy)
+console.log("Filtering #{lines.length} entries for 'indx' took #{Date.now() - startTime}ms for #{results.length} results (~10% of results are positive, Fuzzy match, Legacy)")
+
+console.log("======")
+
+startTime = Date.now()
+results = filter(lines, 'walkdr')
+console.log("Filtering #{lines.length} entries for 'walkdr' took #{Date.now() - startTime}ms for #{results.length} results (~1% of results are positive, fuzzy)")
+
+startTime = Date.now()
+results = filter(lines, 'walkdr', legacy)
+console.log("Filtering #{lines.length} entries for 'walkdr' took #{Date.now() - startTime}ms for #{results.length} results (~1% of results are positive, Legacy method)")
+
+
+console.log("======")
+
+startTime = Date.now()
+results = filter(lines, 'node', forceAllMatch)
+console.log("Filtering #{lines.length} entries for 'node' took #{Date.now() - startTime}ms for #{results.length} results (~98% of results are positive, mostly Exact match)")
+
+startTime = Date.now()
+results = filter(lines, 'node', legacy)
+console.log("Filtering #{lines.length} entries for 'node' took #{Date.now() - startTime}ms for #{results.length} results (~98% of results are positive, mostly Exact match, Legacy method)")
+
+
+console.log("======")
+
+startTime = Date.now()
+results = filter(lines, 'nm', forceAllMatch)
+console.log("Filtering #{lines.length} entries for 'nm' took #{Date.now() - startTime}ms for #{results.length} results (~98% of results are positive, Acronym match)")
+
+startTime = Date.now()
+results = filter(lines, 'nm', legacy)
+console.log("Filtering #{lines.length} entries for 'nm' took #{Date.now() - startTime}ms for #{results.length} results (~98% of results are positive, Acronym match, Legacy method)")
+
+
+console.log("======")
+
+startTime = Date.now()
+results = filter(lines, 'nodemodules', forceAllMatch)
+console.log("Filtering #{lines.length} entries for 'nodemodules' took #{Date.now() - startTime}ms for #{results.length} results (~98% positive + Fuzzy match, [Worst case scenario])")
+
+startTime = Date.now()
+results = filter(lines, 'nodemodules', mitigation)
+console.log("Filtering #{lines.length} entries for 'nodemodules' took #{Date.now() - startTime}ms for #{results.length} results (~98% positive + Fuzzy match, [Mitigation])")
+
+startTime = Date.now()
+results = filter(lines, 'nodemodules', legacy)
+console.log("Filtering #{lines.length} entries for 'nodemodules' took #{Date.now() - startTime}ms for #{results.length} results (Legacy)")
+
+console.log("======")
+
+startTime = Date.now()
+results = filter(lines, 'ndem', forceAllMatch)
+console.log("Filtering #{lines.length} entries for 'ndem' took #{Date.now() - startTime}ms for #{results.length} results (~98% positive + Fuzzy match, [Worst case but shorter string])")
+
+startTime = Date.now()
+results = filter(lines, 'ndem', legacy)
+console.log("Filtering #{lines.length} entries for 'ndem' took #{Date.now() - startTime}ms for #{results.length} results (Legacy)")
+
+
+console.log("======")
+
+startTime = Date.now()
+query = 'index'
+prepared = prepQuery(query)
+match(line, query, prepared) for line in lines
+console.log("Matching #{lines.length} entries for 'index' took #{Date.now() - startTime}ms")
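The benchmark compares `forceAllMatch` (`maxInners: -1`) against `mitigation` (`maxInners` set to 20% of the entries) on the ~98%-positive 'nodemodules' query. This page does not define `maxInners`; assuming it caps how many positive matches `filter()` collects before it stops scanning, the speed-up can be sketched as below. `cappedFilter` and `isSubsequence` are hypothetical names, not fuzzaldrin's API, and the plain subsequence test stands in for the real scorer.

```coffeescript
# Hypothetical stand-in for fuzzaldrin's scorer: true when every query
# character appears in the candidate, in order (case-insensitive).
isSubsequence = (candidate, query) ->
  matched = 0
  for ch in candidate.toLowerCase()
    matched += 1 if matched < query.length and ch is query[matched]
  matched is query.length

# Hypothetical capped filter, ASSUMING maxInners means "stop scanning after
# this many positive matches". Values <= 0 (e.g. -1) disable the cap.
# Unlike the real filter(), this sketch keeps input order and does not sort.
cappedFilter = (candidates, query, maxInners = -1) ->
  query = query.toLowerCase()
  spotLeft = if maxInners > 0 then maxInners else candidates.length + 1
  kept = []
  for candidate in candidates
    continue unless isSubsequence(candidate, query)
    kept.push(candidate)
    spotLeft -= 1
    break if spotLeft is 0   # cap reached: remaining candidates are never scored
  kept

lines = ['node_modules/a', 'src/index', 'node_modules/b', 'node_modules/c']
console.log cappedFilter(lines, 'nm', 2)   # capped: stops after 2 positives
console.log cappedFilter(lines, 'nm')      # uncapped: all 3 matches
```

Under this reading, the cap trades ranking quality for speed: when most entries match, scanning stops early, but a better-scoring candidate later in the list may never be considered.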

Loading