Tests and dependencies refresh #1

Closed
crusaderky wants to merge 3 commits into v8.3.x from ci

@crusaderky crusaderky commented Dec 10, 2025

  • In an effort to reduce the burden of maintenance, relax the upper constraint of most dependencies
  • For the same reason, remove the upper constraint on Python version
  • Clean up some obsolete Python 3.6~3.9 artefacts
  • Revisit the unit tests CI workflow:
    • run on more OSes, to match the wheels
    • run on Python 3.13 (note: 3.14 at the moment is broken)
    • allow running on forks
    • allow running on PRs to staging branches
    • allow triggering manually
    • cancel previous runs when multiple pushes happen in short sequence
  • Unit tests had to be tweaked to cope with the new OSes and Python versions.

@crusaderky crusaderky force-pushed the ci branch 7 times, most recently from 8abce14 to ece0206 Compare December 11, 2025 17:35
@crusaderky crusaderky changed the title WIP CI refresh CI refresh Dec 11, 2025
@crusaderky crusaderky marked this pull request as ready for review December 11, 2025 17:35
@crusaderky crusaderky changed the title CI refresh CI and dependencies refresh Dec 11, 2025
@crusaderky crusaderky force-pushed the ci branch 2 times, most recently from e835f35 to 59221cc Compare December 11, 2025 17:48
@crusaderky crusaderky changed the title CI and dependencies refresh Tests and dependencies refresh Dec 11, 2025
matrix.python_version != '3.6' &&
matrix.python_version != '3.7'
- name: Build sdist and wheel
run: python -m build
@crusaderky (Owner, Author), Dec 11, 2025:
This redesign of the build workflow could look unnecessary at first sight.
However, it will become justified in a follow-up PR, where I build against nightly wheels of major dependencies (numpy and blis). To do so, I had to build with --no-isolation, which means that compiling from the sdist later no longer works if you want to test that runtime dependencies are pulled in on install.

This change makes sense to me: python -m build is the standard and python setup.py ... is deprecated anyway.

- name: Run tests without extras
run: |
python -m pytest --pyargs thinc -Werror --cov=thinc --cov-report=term
run: python -m pytest --pyargs thinc --cov=thinc --cov-report=term
@crusaderky (Owner, Author):
-Werror moved to pyproject.toml
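
For context, `-Werror` is shorthand for `-W error`: it promotes every warning to an exception, and pytest accepts the same filter through its `filterwarnings` configuration option. The snippet below is a standalone sketch of that behaviour using only the standard `warnings` module; `legacy_api` is a hypothetical function invented for the illustration and is not part of thinc.

```python
import warnings


def legacy_api() -> int:
    """Hypothetical function that emits a deprecation warning when called."""
    warnings.warn("legacy_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Inside this block every warning is raised as an exception,
# which is the effect that an "error" warning filter has on a test run.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        legacy_api()
        raised = False
    except DeprecationWarning:
        raised = True

print(raised)
```

Keeping the filter in the project configuration rather than on the command line means local runs and CI runs apply the same rule.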

@crusaderky (Owner, Author):
Unused file

Agreed, vestigial now that the repo uses cibuildwheel

@crusaderky (Owner, Author):
Removed a bunch of redundant dependencies. Now only test dependencies are left.

I'd leave the dependencies needed for building too, in case someone wants to do a --no-build-isolation build. Also IMO a file named requirements.txt should include all needed dependencies.

Also similar to below, even if it does reduce maintenance burden, the requirements files here are relatively strict for a reason and we shouldn't change that without explicit buy-in from upstream.

requirements.txt Outdated
typing_extensions>=3.7.4.1,<4.5.0; python_version < "3.8"
contextvars>=2.4,<3; python_version < "3.7"
# Explosion-provided test dependencies
ml-datasets>=0.2.0,<1
@crusaderky (Owner, Author):
Removed pin python_version < "3.11", as there was no strong reason for it.
Removing the pin makes the ml-datasets tests run on many more CI OSes, which exposed a lot of failures (as marked in the tests).

requirements.txt Outdated
# Test to_disk/from_disk against pathlib.Path subclasses
pathy>=0.3.5
# Linters
flake8==5.0.4
@crusaderky (Owner, Author):
Moved pin from tests.yml

requirements.txt Outdated
pathy>=0.3.5
# Linters
flake8==5.0.4
mypy>=1.5.0,<1.6.0; platform_machine != "aarch64"
@crusaderky (Owner, Author):
This is a bizarre pin which I did not investigate. This mypy version is extremely old and would need to be upgraded anyway, which is out of scope for this exercise.

setup.cfg Outdated
cymem>=2.0.2,<3
preshed>=3.0.2,<4
wasabi>=0.8.1,<2
srsly>=2.4.0,<4
@crusaderky (Owner, Author):
This enables explosion/srsly#120

setup.cfg Outdated
wasabi>=0.8.1,<2
srsly>=2.4.0,<4
catalogue>=2.0.4,<3
confection>=0.0.1,<2
@crusaderky (Owner, Author):
This enables confection git tip

@ngoldbaum left a comment:
I think this PR is changing more than it needs to - please try to keep these PRs as minimal as possible.

In particular, I don't think we should change the upper version pins without buy-in from upstream maintainers and a thorough understanding of why they're in there right now. I realize there are technical downsides to not relaxing the upper version pins. We only have a limited amount of attention from the explosion maintainers and I don't want them to get distracted by changes that they might perceive as unnecessary.

numpy>=1.19.0,<3.0.0
numpy>=1.21.0,<3.0.0
pydantic>=2.0.0,<3.0.0
packaging>=20.0

I'd prefer not relaxing the upper version pins unless you need to and only then to relax them to the needed feature release.

The maintainers added these pins for a reason, IMO we need to leave them as-is unless we get explicit buy-in to remove them.
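
To illustrate what is at stake, the sketch below compares an upper pin on the next feature release with an upper pin on the next major release, using srsly bounds of the kind that appear in this PR. `parse` and `allowed` are hypothetical helpers, the candidate versions are made up, and the tuple comparison is a deliberate simplification of PEP 440 (no pre-releases, post-releases or epochs).

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '3.1.0' into (3, 1, 0). Simplified: plain numeric releases only."""
    return tuple(int(part) for part in version.split("."))


def allowed(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# Hypothetical releases, used only to probe the two bounds.
candidates = ["2.5.1", "3.0.0", "3.1.0", "3.9.0"]

# Upper pin on the next feature release: >=2.4.0,<3.1.0
feature_pin = [v for v in candidates if allowed(v, "2.4.0", "3.1.0")]

# Upper pin on the next major release: >=2.4.0,<4
major_pin = [v for v in candidates if allowed(v, "2.4.0", "4")]

print(feature_pin)
print(major_pin)
```

The feature-release pin admits only the releases up to 3.0.x, while the major pin also admits every later 3.x release, including ones that did not exist when the pin was written.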

[options]
zip_safe = false
include_package_data = true
python_requires = >=3.10,<3.15

Please leave this as well.

pyproject.toml Outdated
"murmurhash>=1.0.2,<2",
"cymem>=2.0.2,<3",
"preshed>=3.0.2,<4",
"blis>=1.3.0,<2",

Can you keep the upper pins to be on a feature version? See the explanation below.

@pytest.mark.parametrize(
("depth", "width", "vector_width", "nb_epoch"), [(2, 32, 16, 5)]
)
@pytest.mark.xfail(reason="Flaky live download from the internet", strict=False)

won't strict=True cause a failure if the flaky download happens to work?

@crusaderky (Owner, Author):
correct, that's why I wrote strict=False.

Why do we want the test to fail if the download succeeds?

It won't fail, it does say strict=False.

@crusaderky (Owner, Author):
The test will XPASS if the download succeeds.

This is a textbook use case for

@pytest.mark.xfail(
    raises=URLError, reason="Flaky live download from the internet", strict=False
)

which would neatly single out the flakiness of the download phase and leave a hard fail if anything else raises.
However, ml-datasets obfuscates the URLError.
I've written explosion/ml-datasets#8 to address it (but it will need a follow-up clean-up here after the next ml-datasets release).
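
The outcomes discussed in this thread can be summarised with a small model. This is a simplified sketch of pytest's reporting rules for an xfail-marked test, not pytest's actual implementation; `xfail_outcome` is a hypothetical helper, and its `raises` argument mirrors the keyword that `pytest.mark.xfail` uses to restrict the expected exception type.

```python
from typing import Optional, Type
from urllib.error import URLError


def xfail_outcome(
    exc: Optional[BaseException],
    *,
    raises: Optional[Type[BaseException]] = None,
    strict: bool = False,
) -> str:
    """Model the reported outcome of an xfail-marked test.

    exc is the exception the test body raised, or None if it passed.
    """
    if exc is None:
        # Unexpected pass: only a strict xfail turns it into a failure.
        return "failed" if strict else "xpassed"
    if raises is not None and not isinstance(exc, raises):
        # An exception other than the expected one is a regular failure.
        return "failed"
    return "xfailed"


download_ok = xfail_outcome(None, raises=URLError)
download_flaky = xfail_outcome(URLError("timed out"), raises=URLError)
real_bug = xfail_outcome(ValueError("bad shape"), raises=URLError)
strict_pass = xfail_outcome(None, raises=URLError, strict=True)

print(download_ok, download_flaky, real_bug, strict_pass)
```

Under this model a successful download is reported as XPASS, a download error as XFAIL, and any unrelated exception remains a hard failure, which is the separation the comment above is after.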

@pytest.mark.xfail(
platform.system() == "Darwin",
reason="SSL: CERTIFICATE_VERIFY_FAILED",
strict=False, # Works on macos-15-intel Python 3.10, for some reason

Is saying "works" here a typo? If I'm reading this correctly the xfail says this always fails on macs.

@crusaderky (Owner, Author):
MacOS Intel Python 3.10 - always works
MacOS Intel Python 3.11 - always fails
MacOS Intel Python 3.12 - always fails
MacOS Intel Python 3.13 - always fails
MacOS ARM Python 3.10 - always fails
MacOS ARM Python 3.11 - always fails
MacOS ARM Python 3.12 - always fails
MacOS ARM Python 3.13 - always fails

I did not invest time to investigate why.
Pinpointing the one and only combination of CPU and Python that for some reason makes it work felt like overengineering. Also, I have no evidence about what happens on local boxes (and I suspect behaviour may change depending on your OS version).

So I just relaxed the check and moved on. The test must pass on Linux and Windows, which is plenty for me. Everything else is an ml-datasets issue that shouldn't be dealt with here.

So I'd delete the strict=True then.

So I'd delete the strict=True then.

This looks fine as is, the code says strict=False (also higher up). I think from the comments you're misreading it?

Oh, I'm sorry! Thank you. I'd just delete the strict=False. That's the default and I don't think that's being overridden in the pytest configuration.

@crusaderky (Owner, Author), Dec 12, 2025:
I think it's healthier to declare explicitly that this is the intended behaviour, in case someone later makes strict=True the default in the pytest config?

Then xfail_strict should probably be set in the CI config to enforce that.
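
The trade-off can be made concrete with a sketch of how the effective strictness is resolved: an explicit `strict=` on the marker wins, and the `xfail_strict` configuration value only fills in when the marker says nothing. `effective_strict` is a hypothetical helper written for this illustration; it models the documented precedence rather than reproducing pytest's code, and `None` stands for "argument omitted".

```python
from typing import Optional


def effective_strict(marker_strict: Optional[bool], xfail_strict: bool) -> bool:
    """Resolve strictness for one xfail marker.

    marker_strict is the value passed as strict= on the marker,
    or None if the marker did not pass it at all.
    xfail_strict is the project-wide default from the pytest configuration.
    """
    if marker_strict is not None:
        return marker_strict
    return xfail_strict


# Marker says strict=False, project default is strict: the marker wins.
explicit_false = effective_strict(False, xfail_strict=True)

# Marker omits strict, project default is strict: the default applies.
implicit = effective_strict(None, xfail_strict=True)

# Marker omits strict, project default is not strict.
default = effective_strict(None, xfail_strict=False)

print(explicit_false, implicit, default)
```

With `xfail_strict` enabled, a marker that omits `strict` would turn an unexpected pass into a failure, whereas the explicit `strict=False` keeps reporting XPASS for the flaky download tests.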

- "*.md"
pull_request:
types: [opened, synchronize, reopened, edited]
branches: ["*"]

I'd say [main, v8.3.x] here as well.

@crusaderky (Owner, Author):
I routinely do pull requests against a staging branch of some sort on my local forks to verify that everything works AND produce a nice PR diff.

I understand that, but you can always change that locally in your fork. When I work with CI I prefer to leave behind settings that reduce unnecessary electricity usage.

@crusaderky (Owner, Author):
The only case where this comes into play is when someone deliberately opens a PR against a staging branch, at which point why wouldn't they want to run CI?

Let's agree to disagree, it's not a big deal. I like to err on the side of reducing unnecessary CI use, that's all.

@rgommers left a comment:
The majority of changes look fine. I agree with @ngoldbaum's comment about avoiding this:

  • In an effort to reduce the burden of maintenance, relax the upper constraint of most dependencies
  • For the same reason, remove the upper constraint on Python version

Those kinds of changes are invariably controversial, and not necessary.

preshed>=3.0.2,<3.1.0
blis>=1.3.0,<1.4.0
srsly>=2.4.0,<3.0.0
srsly>=2.4.0,<3.1.0

catalogue>=2.0.4,<2.1.0
confection>=0.0.1,<1.0.0
ml_datasets>=0.2.0,<0.3.0; python_version < "3.11"
confection>=0.0.1,<1.1.0
@crusaderky (Owner, Author):
Enables confection git tip

confection>=0.0.1,<1.0.0
ml_datasets>=0.2.0,<0.3.0; python_version < "3.11"
confection>=0.0.1,<1.1.0
ml_datasets>=0.2.0,<0.3.0
@crusaderky (Owner, Author):
There was no strong reason to pin it only to Python 3.10.
The unpin makes ml-datasets tests run on more platforms, too, which highlighted a bunch of test fragility (see changes to tests/).

types-contextvars>=0.1.2; python_version < "3.7"
types-dataclasses>=0.1.3; python_version < "3.7"
importlib_resources; python_version < "3.7"
flake8==5.0.4
@crusaderky (Owner, Author):
pin moved from tests.yml

types-dataclasses>=0.1.3; python_version < "3.7"
importlib_resources; python_version < "3.7"
flake8==5.0.4
mypy>=1.5.0,<1.6.0; platform_machine != "aarch64"
@crusaderky (Owner, Author):
I did not spend time to investigate this bizarre arch pin. This is a very old mypy version. The whole linters stack could use a proper refresh, which is out of scope for this PR.

Comment on lines +25 to +26
pydot
graphviz
@crusaderky (Owner, Author):
Moved from tests.yml

@crusaderky (Owner, Author):
I revisited the pins as requested; this is ready for a second round of review.

@ngoldbaum left a comment:
Thanks! All the remaining discussion is over quibbles. This looks good. Maybe add xfail_strict to the CI config if it's straightforward but otherwise this is fine.

@crusaderky (Owner, Author):
Set xfail_strict=True
