Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.5x faster #139

Closed
wants to merge 2 commits into from
Closed

2.5x faster #139

wants to merge 2 commits into from

Conversation

vprelovac
Copy link

@vprelovac vprelovac commented Feb 20, 2020

Performance optimization, about 2.5x faster

Copy link
Owner

@miso-belica miso-belica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. Good job 👍

I added some comments so please try to react on them (by code or another comment :) and if it would be possible could you maybe create some test for method _create_matrix so we are sure it's the same as before?

Oh, and I almost forgot. If you have any measurements/benchmark to show how the changes result in the 2.5x speedup it would be great if you can provide that too.

sumy/summarizers/text_rank.py Show resolved Hide resolved
for i, words_i in enumerate(sentences_as_words):
for j, words_j in enumerate(sentences_as_words):
weights[i, j] = self._rate_sentences_edge(words_i, words_j)
for i in range(0, sentences_count-1):
Copy link
Owner

@miso-belica miso-belica Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you should do range(0, sentences_count) here because list(range(0, 3)) == [0, 1, 2]. But I would prefer you return enumerate(sentences_as_words). I think the one last missing word may a reason why the test fails on CI. So the code after the changes will be:

for i, words_i in enumerate(sentences_as_words):
  for j in range(i+1, sentences_count):
    rating = self._rate_sentences_edge(words_i, sentences_as_words[j])
    weights[i, j] = rating
    weights[j, i] = rating

EDIT: now I realized, this is not the case when you reach the end you fill the last row by weights[j, i] = rating. But still, the test fails so please check it why.

sumy/summarizers/text_rank.py Show resolved Hide resolved
@miso-belica
Copy link
Owner

Don't have time to do these, feel free to reject and implement changes yourself.

@vprelovac Thanks for the inspiration. I implemented the optimizations in the related PR but was not so aggressive so the tests pass and it is working the same as before.

But still, the problem was in the diagonal of the matrix which should not affect the score so I will dig deeper into it to find out why we can't omit the computing of the diagonal.

Have a nice day :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants