Return the summary size by custom callable object #161

Merged: 2 commits, Nov 23, 2021
47 changes: 47 additions & 0 deletions .github/workflows/run-tests.yml
@@ -0,0 +1,47 @@
+name: Run tests
+on:
+  workflow_dispatch:
+  pull_request:
+    branches:
+      - "main"
+
+jobs:
+  tests:
+    timeout-minutes: 10
+    strategy:
+      fail-fast: false
+      matrix:
+        os: ["ubuntu-latest"]
+        python-version: ["3.5", "3.6", "3.7", "3.8", "3.9", "3.10"]
+        include:
+          # - os: "windows-2022"
+          #   python-version: "3.10"
+          - os: "macos-10.15"
+            python-version: "3.10"
+          - os: "macos-11"
+            python-version: "3.10"
+
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: "Install dependencies"
+        run: |
+          python -m pip install --upgrade pip wheel setuptools
+          pip install .
+          # https://stackoverflow.com/a/69439779/2988107
+          pip install --upgrade numpy tinysegmenter jieba konlpy hebrew_tokenizer "tweepy<4.0.0"
+          python -c "import nltk; nltk.download('punkt')"
+          pip install --upgrade pytest codecov pytest-cov
+
+      - run: pytest tests
+        env:
+          CI: 1
+          PYTHONDONTWRITEBYTECODE: 1
+
+      - run: codecov
2 changes: 1 addition & 1 deletion .gitignore
@@ -7,5 +7,5 @@ __pycache__/

 # IDE files, .coverage, .pytest_cache, ...
 .*
-!.travis.yml
+!/.github/
 !.gitignore
61 changes: 0 additions & 61 deletions .travis.yml

This file was deleted.

3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
 # Changelog

+## 0.9.1 (unpublished)
+**FEATURE:** Return the summary size by custom callable object
+
 ## 0.9.0 (2021-10-21)
 - **INCOMPATIBILITY** Dropped official support for Python 2.7. It should still work if you install Python 2 compatible dependencies.
 - **FEATURE:** Add basic Korean support by @kimbyungnam in https://github.com/miso-belica/sumy/pull/129
3 changes: 2 additions & 1 deletion README.md
@@ -1,6 +1,7 @@
 # Automatic text summarizer

-[![image](https://api.travis-ci.org/miso-belica/sumy.png?branch=master)](https://travis-ci.org/miso-belica/sumy)
+
+[![image](https://github.com/miso-belica/sumy/actions/workflows/run-tests.yml/badge.svg)](https://github.com/miso-belica/sumy/actions/workflows/run-tests.yml)
 [![Gitpod Ready-to-Code](https://img.shields.io/badge/Gitpod-Ready--to--Code-blue?logo=gitpod)](https://gitpod.io/#https://github.com/miso-belica/sumy)

 Simple library and command line utility for extracting summary from HTML
5 changes: 5 additions & 0 deletions sumy/_compat.py
@@ -23,6 +23,11 @@
 except ImportError:
     from itertools import filterfalse as ffilter

+try:
+    from collections.abc import Sequence
+except ImportError:
+    from collections import Sequence
+


 def unicode_compatible(cls):
     """
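The `_compat.py` change above matters for the Python 3.10 entries in the test matrix: the ABC aliases in `collections` (such as `collections.Sequence`) were removed in Python 3.10, so the import has to try `collections.abc` first. A minimal sketch of the same pattern follows; the helper name `is_sequence` is hypothetical and not part of sumy.

```python
# Prefer the modern location; fall back only on interpreters without collections.abc.
try:
    from collections.abc import Sequence
except ImportError:  # Python 2 only
    from collections import Sequence


def is_sequence(value):
    """Hypothetical helper: True for list, tuple, str and other Sequence types."""
    return isinstance(value, Sequence)


# Lists and strings are sequences; sets are not (they have no ordering or indexing).
checks = (is_sequence([1, 2]), is_sequence("ab"), is_sequence({1, 2}))
```

Modules that import `Sequence` from one shared compat module, as `sumy/models/tf.py` does after this change, keep the version check in a single place.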
3 changes: 1 addition & 2 deletions sumy/models/tf.py
@@ -7,8 +7,7 @@

 from collections import Counter
 from pprint import pformat
-from collections import Sequence
-from .._compat import to_unicode, unicode, string_types
+from .._compat import to_unicode, unicode, string_types, Sequence


 class TfDocumentModel(object):
2 changes: 1 addition & 1 deletion sumy/summarizers/_summarizer.py
@@ -44,7 +44,7 @@ def _get_best_sentences(sentences, count, rating, *args, **kwargs):
         # sort sentences by rating in descending order
         infos = sorted(infos, key=attrgetter("rating"), reverse=True)
         # get `count` first best rated sentences
-        if not isinstance(count, ItemsCount):
+        if not callable(count):
             count = ItemsCount(count)
         infos = count(infos)
         # sort sentences by their order in document
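The one-line change from `isinstance(count, ItemsCount)` to `callable(count)` is what lets callers pass their own selection function. The sketch below illustrates the dispatch in isolation. It does not import sumy: `SentenceInfo`, `FirstN` and `get_best_sentences` are simplified, hypothetical stand-ins, and the real `ItemsCount` may accept more input forms than this.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the per-sentence record used by the summarizer.
SentenceInfo = namedtuple("SentenceInfo", ("sentence", "order", "rating"))


class FirstN(object):
    """Simplified stand-in for ItemsCount: keep the first `n` items."""

    def __init__(self, n):
        self._n = int(n)

    def __call__(self, infos):
        return infos[:self._n]


def get_best_sentences(sentences, count, ratings):
    infos = [SentenceInfo(s, i, ratings[s]) for i, s in enumerate(sentences)]
    # best rated sentences first
    infos = sorted(infos, key=attrgetter("rating"), reverse=True)
    # a plain number is wrapped; any callable is used exactly as given
    if not callable(count):
        count = FirstN(count)
    infos = count(infos)
    # restore the original document order
    infos = sorted(infos, key=attrgetter("order"))
    return tuple(info.sentence for info in infos)


ratings = {"a": 0.1, "b": 0.9, "c": 0.5}
top_two = get_best_sentences(["a", "b", "c"], 2, ratings)
custom = get_best_sentences(["a", "b", "c"], lambda infos: infos[:1], ratings)
```

Because the wrapper class is itself callable, existing instances still pass through the new check unchanged, while functions, lambdas and `functools.partial` objects are now accepted too.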
35 changes: 35 additions & 0 deletions tests/test_summarizers/test_random.py
@@ -2,6 +2,8 @@

 from __future__ import absolute_import, division, print_function, unicode_literals

+from functools import partial
+
 from sumy._compat import to_unicode
 from sumy.summarizers.random import RandomSummarizer
 from ..utils import build_document, build_document_from_string
@@ -60,3 +62,36 @@ def test_more_sentences_than_requested():

     sentences = summarizer(document, 4)
     assert len(sentences) == 4
+
+
+def test_less_than_10_words_should_be_returned():
+    """https://github.com/miso-belica/sumy/issues/159"""
+    document = build_document_from_string("""
+        # Heading one
+        First sentence.
+        Second sentence.
+        Third sentence.
+
+        # Heading two
+        I like sentences but this one is really long.
+        They are so wordy
+        And have many many letters
+        And are green in my editor
+        But someone doesn't like them :(
+    """)
+    summarizer = RandomSummarizer()
+
+    def count(max_words, sentence_infos):
+        results = []
+        words_count = 0
+        for info in sentence_infos:
+            words_count += len(info.sentence.words)
+            if words_count > max_words:
+                return results
+            else:
+                results.append(info)
+
+        return results
+
+    sentences = summarizer(document, partial(count, 10))
+    assert 0 < sum(len(s.words) for s in sentences) <= 10
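The word-budget function in the test above is bound to its limit with `functools.partial`. The same idea can be packaged as a small callable class, which may be easier to reuse and name. This is only a sketch: `MaxWords`, `Sentence` and `SentenceInfo` are hypothetical stand-ins defined here for illustration, not sumy classes, and they model only the `sentence.words` attribute that the test relies on.

```python
from collections import namedtuple

# Hypothetical stand-ins for sumy's sentence objects, for illustration only.
Sentence = namedtuple("Sentence", ("words",))
SentenceInfo = namedtuple("SentenceInfo", ("sentence", "order", "rating"))


class MaxWords(object):
    """Keep best-rated sentences until the word budget would be exceeded."""

    def __init__(self, max_words):
        self._max_words = max_words

    def __call__(self, sentence_infos):
        results = []
        words_count = 0
        for info in sentence_infos:
            words_count += len(info.sentence.words)
            if words_count > self._max_words:
                break  # stop at the first sentence that does not fit
            results.append(info)
        return results


# Input is assumed to arrive sorted by rating, best first.
infos = [
    SentenceInfo(Sentence(("one", "two", "three", "four")), 0, 0.9),
    SentenceInfo(Sentence(("five", "six", "seven", "eight", "nine")), 1, 0.8),
    SentenceInfo(Sentence(("ten", "eleven")), 2, 0.7),
]
selected = MaxWords(10)(infos)
total_words = sum(len(info.sentence.words) for info in selected)
```

One consequence of stopping at the first sentence that does not fit: if the best-rated sentence alone exceeds the budget, the result is empty. A variant that skips oversized sentences instead of stopping would behave differently, so the choice depends on what the caller expects.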