-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: Add pdf_reader_page fixture #1735
Conversation
a505ed6
to
fad15bb
Compare
@pubpub-zz What to you think about this type of change? I've only applied this to one of up to 52 places where it might be used. I didn't do the rest as I want to know your opinion before I continue. We pretty often need just a random page for our tests. This could make it more obvious that there is nothing special about that page. If anybody reads the tests to understand how a certain feature is used, fixtures might make their lives harder. For us, it might make it more clear what we actually want to test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your proposal with just one comment (inline)
Just one point : do you have any idea to improve readability of all the test files using urls?
tests/conftest.py
Outdated
|
||
|
||
@pytest.fixture(scope="session") | ||
def pdf_reader_page(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to understand about what we are talking about it might be better to call it pdf_crazyones_page0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was wondering about this as well. I thought that this fixture should only be used in places where the concrete page / PDF could be swapped without affecting the test. Hence I would like to keep the name.
Another idea was to parametrize the fixture to allow specifying a specific (local) PDF / page. But that might actually harm readability.
Instead of @pytest.mark.enable_socket()
@pytest.mark.slow()
def test_compute_space_width():
url = "https://corpora.tika.apache.org/base/docs/govdocs1/923/923406.pdf"
name = "tika-923406.pdf"
reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name)))
for page in reader.pages:
page.extract_text() we could do @pytest.mark.enable_socket()
@pytest.mark.slow()
@pytest.mark.parametrize(
"external_url_reader",
[
[
"https://corpora.tika.apache.org/base/docs/govdocs1/923/923406.pdf",
"tika-923406.pdf",
]
],
indirect=True,
)
def test_compute_space_width(external_url_reader):
reader = external_url_reader
for page in reader.pages:
page.extract_text() I don't think that is better, though 😅 |
I was hoping that we could get more PDFs into https://github.com/py-pdf/sample-files so that we don't need the load anything from URLs anymore. |
I like to use the pdf which has produced the error, it is so much easier to ensure the test coverage is good |
In the context of adding more PDFs for testing to the sample-files / directly to the git repo: We might be able to use test files from other open source projects with a compatible license, e.g. https://github.com/jsvine/pdfplumber/tree/stable/tests/pdfs |
Added via #1738 |
This change should make our tests shorter and easier to read.