Commit c2ca949

wip: triagehistory: Try some optimisation
Signed-off-by: Mathieu Dubois-Briand <[email protected]>
1 parent f11cfd2 commit c2ca949

1 file changed: +12 -2 lines changed

swattool/triagehistory.py

@@ -63,8 +63,18 @@ def get_similarity_score(self, log_fingerprint: Collection[str]) -> float:
                                   flags=re.IGNORECASE | re.MULTILINE)
 
         # Compute scores for all fingerprint fragment combinations
-        scores = [[jellyfish.jaro_similarity(f1, f2) for f2 in log_fingerprint]
-                  for f1 in self.log_fingerprint]
+        # Only consider combinations with similar positions in the files:
+        # reduce both false positives and computation time.
+        scores = [[0 for f2 in log_fingerprint] for f1 in self.log_fingerprint]
+        lendiff = len(self.log_fingerprint) - len(log_fingerprint)
+        for i, f1 in enumerate(self.log_fingerprint):
+            for j, f2 in enumerate(log_fingerprint):
+                maxdist = 2
+                startdist = i - j
+                enddist = lendiff - startdist
+                if min(abs(startdist), abs(enddist)) > maxdist:
+                    continue
+                scores[i][j] = jellyfish.jaro_similarity(f1, f2)
 
         # Compute the final score as 2 half-scores: fingerprint A to B, then B
         # to A, so the similarity score is commutative.
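
Note on the position filter: the change only scores a pair of fragments when their indices differ by at most `maxdist`, measured either from the start or from the end of the two fingerprints. The sketch below isolates that test so it can be checked without `jellyfish`. The helper names `positions_match` and `compared_pairs` are hypothetical and do not appear in `swattool`; only the arithmetic is taken from the diff.

```python
from __future__ import annotations


def positions_match(i: int, j: int, len_a: int, len_b: int,
                    maxdist: int = 2) -> bool:
    """Hypothetical helper mirroring the filter in the diff.

    Return True when fragment i of fingerprint A and fragment j of
    fingerprint B sit at similar offsets from the start or from the end.
    """
    startdist = i - j                      # index gap counted from the start
    enddist = (len_a - len_b) - startdist  # index gap counted from the end
    return min(abs(startdist), abs(enddist)) <= maxdist


def compared_pairs(len_a: int, len_b: int,
                   maxdist: int = 2) -> list[tuple[int, int]]:
    """List the (i, j) pairs that would still get a similarity score."""
    return [(i, j)
            for i in range(len_a)
            for j in range(len_b)
            if positions_match(i, j, len_a, len_b, maxdist)]


if __name__ == "__main__":
    # Two fingerprints of 10 fragments each: 44 of the 100 pairs are kept.
    print(len(compared_pairs(10, 10)))
    # Last fragments of a 10-fragment and a 4-fragment fingerprint still match.
    print(positions_match(9, 3, 10, 4))
```

With equal-length fingerprints the filter reduces to a band of width `maxdist` around the diagonal of the score matrix. When the lengths differ, a second band aligned on the last fragments is kept as well, so the saving is smaller for short fingerprints.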
