TST: Add pdf_reader_page fixture #1735

MartinThoma · 2023-03-21T17:36:02Z

This change should make our tests shorter and easier to read.

MartinThoma · 2023-03-21T17:41:15Z

@pubpub-zz What do you think about this type of change? I've only applied it to one of up to 52 places where it might be used. I held off on the rest because I want to know your opinion before I continue.

We pretty often need just an arbitrary page for our tests. A fixture could make it more obvious that there is nothing special about that page.

If anybody reads the tests to understand how a certain feature is used, fixtures might make their lives harder. For us, they might make it clearer what we actually want to test.

pubpub-zz

I like your proposal with just one comment (inline)

Just one point: do you have any idea to improve readability of all the test files using urls?

pubpub-zz · 2023-03-21T17:49:52Z

tests/conftest.py

+
+
+@pytest.fixture(scope="session")
+def pdf_reader_page():


To make it clear what we are talking about, it might be better to call it pdf_crazyones_page0.

Yes, I was wondering about this as well. I thought that this fixture should only be used in places where the concrete page / PDF could be swapped without affecting the test. Hence I would like to keep the name.

Another idea was to parametrize the fixture to allow specifying a specific (local) PDF / page. But that might actually harm readability.
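For illustration, here is a minimal sketch of how such a fixture could keep an arbitrary default page while still letting a single test request a specific local PDF and page through indirect parametrization. The names `RESOURCE_ROOT`, `DEFAULT_SPEC`, and `resolve_page_spec` are hypothetical, and the default file `crazyones.pdf` is only an assumption based on the name suggested above; the actual fixture in the PR may look different.

```python
from pathlib import Path

import pytest

# Assumptions for illustration only: resource directory and default file/page.
RESOURCE_ROOT = Path("resources")
DEFAULT_SPEC = ("crazyones.pdf", 0)


def resolve_page_spec(param=None):
    """Return (filename, page_index), falling back to the default page."""
    if param is None:
        return DEFAULT_SPEC
    filename, page_index = param
    return filename, int(page_index)


@pytest.fixture(scope="session")
def pdf_reader_page(request):
    """An arbitrary page, unless a test overrides it via indirect parametrization."""
    # Imported lazily so the helper above can be used without pypdf installed.
    from pypdf import PdfReader

    # request.param only exists when the test parametrizes this fixture.
    filename, page_index = resolve_page_spec(getattr(request, "param", None))
    reader = PdfReader(RESOURCE_ROOT / filename)
    return reader.pages[page_index]
```

A test that does not care about the page just takes `pdf_reader_page` as an argument. A test that does care would add `@pytest.mark.parametrize("pdf_reader_page", [("some.pdf", 2)], indirect=True)`, which is the extra noise that might harm readability.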

MartinThoma · 2023-03-21T18:52:48Z

do you have any idea to improve readability of all the test files using urls?

Instead of

@pytest.mark.enable_socket()
@pytest.mark.slow()
def test_compute_space_width():
    url = "https://corpora.tika.apache.org/base/docs/govdocs1/923/923406.pdf"
    name = "tika-923406.pdf"

    reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name)))
    for page in reader.pages:
        page.extract_text()

we could do

@pytest.mark.enable_socket()
@pytest.mark.slow()
@pytest.mark.parametrize(
    "external_url_reader",
    [
        [
            "https://corpora.tika.apache.org/base/docs/govdocs1/923/923406.pdf",
            "tika-923406.pdf",
        ]
    ],
    indirect=True,
)
def test_compute_space_width(external_url_reader):
    reader = external_url_reader
    for page in reader.pages:
        page.extract_text()

I don't think that is better, though 😅
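The parametrized variant relies on a fixture that receives the `[url, name]` pair through `request.param`. A rough sketch of what that fixture might look like follows. `open_from_param` is a hypothetical helper, and importing `get_pdf_from_url` from the `tests` package is an assumption about where that helper lives.

```python
from io import BytesIO

import pytest


def open_from_param(param, fetch, reader_factory):
    """Unpack a [url, name] pair, download the bytes, and wrap them in a reader."""
    url, name = param
    return reader_factory(BytesIO(fetch(url, name=name)))


@pytest.fixture()
def external_url_reader(request):
    """Build a reader from the [url, name] pair passed with indirect=True."""
    # Lazy imports; the location of get_pdf_from_url is an assumption.
    from pypdf import PdfReader
    from tests import get_pdf_from_url

    return open_from_param(request.param, get_pdf_from_url, PdfReader)
```

With `indirect=True`, pytest hands each parameter entry to the fixture instead of to the test function, so the test body only sees the finished reader.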

MartinThoma · 2023-03-21T18:54:01Z

do you have any idea to improve readability of all the test files using urls?

I was hoping that we could get more PDFs into https://github.com/py-pdf/sample-files so that we don't need to load anything from URLs anymore.

pubpub-zz · 2023-03-21T19:45:30Z

do you have any idea to improve readability of all the test files using urls?

I was hoping that we could get more PDFs into https://github.com/py-pdf/sample-files so that we don't need to load anything from URLs anymore.

I like to use the PDF that produced the error; it makes it much easier to ensure the test coverage is good.

MartinThoma · 2023-03-21T20:57:56Z

In the context of adding more PDFs for testing to the sample-files / directly to the git repo: We might be able to use test files from other open source projects with a compatible license, e.g. https://github.com/jsvine/pdfplumber/tree/stable/tests/pdfs

MartinThoma · 2023-03-26T09:18:58Z

Added via #1738

MartinThoma requested a review from pubpub-zz March 21, 2023 17:36

MartinThoma marked this pull request as draft March 21, 2023 17:36

TST: Add pdf_reader_page fixture

fad15bb

MartinThoma force-pushed the pdf-reader-page-fixture branch from a505ed6 to fad15bb Compare March 21, 2023 17:38

pubpub-zz approved these changes Mar 21, 2023

Merge branch 'main' into pdf-reader-page-fixture

d73bf95

MartinThoma closed this Mar 26, 2023
