Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix package included resources reading to standard pkgutil mechanism #53

Merged
merged 2 commits into from
Mar 13, 2016
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions sumy/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

import sys
import requests
import pkgutil

from functools import wraps
from contextlib import closing
Expand Down Expand Up @@ -49,15 +50,20 @@ def expand_resource_path(path):


def get_stop_words(language):
path = expand_resource_path("stopwords/%s.txt" % language)
if not exists(path):
try:
stopwords_data = pkgutil.get_data("sumy", "data/stopwords/%s.txt" % language)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use function expand_resource_path("stopwords/%s.txt" % language) here so I can change/move folder with resources in single place in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you haven't correctly unsderstood my problem.
In our team we use buildout to develop, build and distribute packages.
Buildout uses egg format of python packages.
In case of egg format expand_resource_path returns paths link '/usr/local/lib/python3.5/site-packages/sumy-0.4.1a0-py3.5.egg/sumy/data/stopwords/english.txt' and it's not a legal file (because of .egg is zip-archive).

For this cases pkgutil ( https://docs.python.org/3.5/library/pkgutil.html#pkgutil.get_data )is single known for me standard way to get access to distributed package data.
Of course I can explicitly add possibility to read in-archive paths for reading data into your code (it you will insist on it) - but I think that usage of standard pkgutil is more pretty way to solve problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or may you have suggested write code the next way?
stopwords_data = pkgutil.get_data("sumy", expand_resource_path("stopwords/%s.txt" % language))

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just wanted to have resources at the single place. But I see function expand_resource_path is used only at one place in this project, so I think it's OK. Sorry

except IOError as e:
raise LookupError("Stop-words are not available for language %s." % language)
return read_stop_words(path)
return parse_stop_words(stopwords_data)


def read_stop_words(filename):
with open(filename, "rb") as open_file:
return frozenset(to_unicode(w.rstrip()) for w in open_file.readlines())
return parse_stop_words(open_file.read())


def parse_stop_words(data):
return frozenset(w.rstrip() for w in to_unicode(data).split("\n") if w)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use frozenset(w.rstrip() for w in to_unicode(data).splitlines()) here to be more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!



class ItemsCount(object):
Expand Down