Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Add pdf_reader_page fixture #1735

Closed
wants to merge 2 commits into from
Closed

Conversation

MartinThoma
Copy link
Member

@MartinThoma MartinThoma commented Mar 21, 2023

This change should make our tests shorter and easier to read.

@MartinThoma MartinThoma marked this pull request as draft March 21, 2023 17:36
@MartinThoma
Copy link
Member Author

@pubpub-zz What to you think about this type of change? I've only applied this to one of up to 52 places where it might be used. I didn't do the rest as I want to know your opinion before I continue.

We pretty often need just a random page for our tests. This could make it more obvious that there is nothing special about that page.

If anybody reads the tests to understand how a certain feature is used, fixtures might make their lives harder. For us, it might make it more clear what we actually want to test.

Copy link
Collaborator

@pubpub-zz pubpub-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your proposal with just one comment (inline)

Just one point : do you have any idea to improve readability of all the test files using urls?



@pytest.fixture(scope="session")
def pdf_reader_page():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to understand about what we are talking about it might be better to call it pdf_crazyones_page0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was wondering about this as well. I thought that this fixture should only be used in places where the concrete page / PDF could be swapped without affecting the test. Hence I would like to keep the name.

Another idea was to parametrize the fixture to allow specifying a specific (local) PDF / page. But that might actually harm readability.

@MartinThoma
Copy link
Member Author

do you have any idea to improve readability of all the test files using urls?

Instead of

@pytest.mark.enable_socket()
@pytest.mark.slow()
def test_compute_space_width():
    url = "https://corpora.tika.apache.org/base/docs/govdocs1/923/923406.pdf"
    name = "tika-923406.pdf"

    reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name)))
    for page in reader.pages:
        page.extract_text()

we could do

@pytest.mark.enable_socket()
@pytest.mark.slow()
@pytest.mark.parametrize(
    "external_url_reader",
    [
        [
            "https://corpora.tika.apache.org/base/docs/govdocs1/923/923406.pdf",
            "tika-923406.pdf",
        ]
    ],
    indirect=True,
)
def test_compute_space_width(external_url_reader):
    reader = external_url_reader
    for page in reader.pages:
        page.extract_text()

I don't think that is better, though 😅

@MartinThoma
Copy link
Member Author

do you have any idea to improve readability of all the test files using urls?

I was hoping that we could get more PDFs into https://github.com/py-pdf/sample-files so that we don't need the load anything from URLs anymore.

@pubpub-zz
Copy link
Collaborator

do you have any idea to improve readability of all the test files using urls?

I was hoping that we could get more PDFs into https://github.com/py-pdf/sample-files so that we don't need the load anything from URLs anymore.

I like to use the pdf which has produced the error, it is so much easier to ensure the test coverage is good

@MartinThoma
Copy link
Member Author

MartinThoma commented Mar 21, 2023

In the context of adding more PDFs for testing to the sample-files / directly to the git repo: We might be able to use test files from other open source projects with a compatible license, e.g. https://github.com/jsvine/pdfplumber/tree/stable/tests/pdfs

@MartinThoma
Copy link
Member Author

Added via #1738

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants