2.5x faster #139

Closed
wants to merge 2 commits into from
14 changes: 7 additions & 7 deletions sumy/summarizers/text_rank.py
@@ -64,9 +64,11 @@ def _create_matrix(self, document):
sentences_count = len(sentences_as_words)
weights = numpy.zeros((sentences_count, sentences_count))

for i, words_i in enumerate(sentences_as_words):
for j, words_j in enumerate(sentences_as_words):
weights[i, j] = self._rate_sentences_edge(words_i, words_j)
for i in range(0, sentences_count-1):
Owner

@miso-belica miso-belica Feb 21, 2020

I believe you should use range(0, sentences_count) here, because list(range(0, 3)) == [0, 1, 2]. However, I would prefer that you go back to enumerate(sentences_as_words). I think the last sentence being skipped may be a reason why the test fails on CI. With those changes the code would be:

for i, words_i in enumerate(sentences_as_words):
  for j in range(i+1, sentences_count):
    rating = self._rate_sentences_edge(words_i, sentences_as_words[j])
    weights[i, j] = rating
    weights[j, i] = rating

EDIT: I now realize this is not the case: by the time you reach the end, the last row has already been filled by weights[j, i] = rating. Still, the test fails, so please check why.
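To make the point in the EDIT concrete, here is a minimal sketch in plain Python (no numpy) that compares the original full double loop with the mirrored upper-triangle fill. The helper `rate_edge` is a hypothetical stand-in for `_rate_sentences_edge` and is assumed to be symmetric; the function names and sample sentences are made up for illustration. It shows that stopping the outer loop at `sentences_count - 1` changes nothing, and that the mirrored fill never touches the diagonal. Whether the diagonal difference is related to the failing test is not established here.

```python
def rate_edge(words1, words2):
    # Hypothetical stand-in for _rate_sentences_edge: counts matching word pairs.
    return sum(words2.count(w) for w in words1)


def full_fill(sentences):
    """Original approach: rate every (i, j) pair, including i == j."""
    n = len(sentences)
    weights = [[0] * n for _ in range(n)]
    for i, words_i in enumerate(sentences):
        for j, words_j in enumerate(sentences):
            weights[i][j] = rate_edge(words_i, words_j)
    return weights


def symmetric_fill(sentences, outer_stop_early=False):
    """Proposed approach: rate each pair with j > i once, then mirror it."""
    n = len(sentences)
    weights = [[0] * n for _ in range(n)]
    # outer_stop_early=True mimics range(0, sentences_count - 1) from the PR.
    outer = range(n - 1) if outer_stop_early else range(n)
    for i in outer:
        for j in range(i + 1, n):
            rating = rate_edge(sentences[i], sentences[j])
            weights[i][j] = rating
            weights[j][i] = rating
    return weights


sentences = [["a", "b"], ["b", "c"], ["a", "c", "c"]]
full = full_fill(sentences)
sym = symmetric_fill(sentences)
sym_short = symmetric_fill(sentences, outer_stop_early=True)

# Both outer ranges give the same matrix: for i == n - 1 the inner loop is empty.
print(sym == sym_short)
# Off-diagonal entries agree with the full loop; only the diagonal differs.
print([full[i][i] for i in range(3)], [sym[i][i] for i in range(3)])
```

In this sketch the full loop stores each sentence's similarity with itself on the diagonal, while the mirrored fill leaves the diagonal at zero, so the row sums used for normalisation can differ between the two versions.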

for j in range(i+1, sentences_count):
miso-belica marked this conversation as resolved.
weights[i, j] = self._rate_sentences_edge(sentences_as_words[i], sentences_as_words[j])
weights[j, i] = weights[i, j]

weights /= (weights.sum(axis=1)[:, numpy.newaxis]+self._delta) # delta added to prevent zero-division error
#(see issue https://github.com/miso-belica/sumy/issues/112 )

@@ -84,10 +86,8 @@ def _to_words_set(self, sentence):

@staticmethod
def _rate_sentences_edge(words1, words2):
rank = 0
for w1 in words1:
for w2 in words2:
rank += int(w1 == w2)

rank=sum([words2.count(el) for el in words1])
miso-belica marked this conversation as resolved.

if rank == 0:
return 0.0
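The change to the rank computation in `_rate_sentences_edge` can be checked in isolation. The sketch below, assuming the word collections are lists (or any sequence with a `.count` method), compares the removed nested loop with the `count`-based sum from this PR. The third variant using `collections.Counter` is not part of the PR; it is included only as a possible alternative, and all function names here are hypothetical.

```python
from collections import Counter


def rank_nested(words1, words2):
    # Removed version: compare every word pair explicitly.
    rank = 0
    for w1 in words1:
        for w2 in words2:
            rank += int(w1 == w2)
    return rank


def rank_count(words1, words2):
    # Version from the PR: let list.count do the inner loop.
    return sum(words2.count(el) for el in words1)


def rank_counter(words1, words2):
    # Alternative (not in the PR): multiply per-word frequencies.
    counts1, counts2 = Counter(words1), Counter(words2)
    return sum(c * counts2[w] for w, c in counts1.items())


words1 = ["a", "b", "a"]
words2 = ["a", "a", "c"]
print(rank_nested(words1, words2), rank_count(words1, words2), rank_counter(words1, words2))
```

All three count the number of equal `(w1, w2)` pairs. The first two still do work proportional to `len(words1) * len(words2)`; the `Counter` variant is roughly linear in the total number of words.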