Don't add fixture finalizer if the value is cached #11833

jakkdl · 2024-01-17T15:30:42Z

Fixes #1489 - see #1489 (comment) for explanation of current behavior.

It's possible this might break test setups that (inadvertently) rely on the current teardown order, but I think it's an improvement on the status quo.

The code for setup & teardown of fixtures was very hard to parse, so I added a couple of comments and fixed what I think are outdated and now incorrect docstrings. But it feels like a more thorough cleanup could be warranted. I have not looked into git blame to see if/when/how stuff got changed.

Remaining questions:

Should a finalizer be scheduled if the cached result is an exception? It previously was; I lean towards no. The change does not affect any current tests, but I could write a test that asserts the behavior.
I never managed to figure out why fixtures adds its finalizer to all fixtures it depends on, nor whether this could cause setup/teardown ordering failures (I tried writing tests that would trigger it, but didn't manage to). There are definitely reasons for it though; lots of stuff breaks if I remove that code.
- I tested moving the loop that adds the finalizer to fixtures to after getting the cached value, which could plausibly remove some extraneous finalizers. That affected the result of test_issue_519 (#519): the teardown of ('test_one[arg1v2-arg2v1]', 'fix2', 'arg2v1') got moved one step earlier. A similar thing happened in test_parametrization_setup_teardown_ordering (step1-2 happening before setup 2), but I haven't dug into why, or whether and how it's bad.
- Saving the functools.partial call is mostly to make it more visible that it's done in several places in the function, but I'm fine with skipping it.
Should I just make execute return None?

…e fixture value is cached.

jakkdl · 2024-01-17T15:50:15Z

seems like the failing tests are unrelated to this PR, they're failing in several other PRs.

bluetech · 2024-01-18T10:36:07Z

seems like the failing tests are unrelated to this PR, they're failing in several other PRs.

Yes, it's unrelated. If you rebase it should be good.

nicoddemus · 2024-01-18T14:45:44Z

I never managed to figure out why fixtures adds its finalizer to all fixtures it depends on

IIRC, this is meant to ensure that we teardown the current fixture before its upstream fixtures, because the upstream fixture will call its finalizers and then teardown itself.

It is a trick because fixture teardown is not structured/graph based, at least that's how I recall. For the note I also find this extremely confusing, and we have discussed in the past to replace this implicit mechanism in favor of a proper dependency graph.

jakkdl · 2024-01-18T15:39:13Z

I never managed to figure out why fixtures adds its finalizer to all fixtures it depends on

IIRC, this is meant to ensure that we teardown the current fixture before its upstream fixtures, because the upstream fixture will call its finalizers and then teardown itself.

Right. I thought that would be handled by the fact that they get torn down in reverse order of setup?

It is a trick because fixture teardown is not structured/graph based, at least that's how I recall. For the note I also find this extremely confusing, and we have discussed in the past to replace this implicit mechanism in favor of a proper dependency graph.

It is at least partly graph-based, see docstring of SetupState:

pytest/src/_pytest/runner.py

Line 422 in eefc9d4

class SetupState:

But yeah the teardowns within any of session/mod/item are just flat stacks.

RonnyPfannschmidt · 2024-01-18T15:48:10Z

This is in part necessary to handle dynamic fixture request

RonnyPfannschmidt · 2024-01-18T15:50:23Z

A big part of today's complexity is the funcarg mechanism, which predates fixtures and scopes

jakkdl · 2024-02-09T12:45:19Z

Bump :)
any blockers or other problems with the PR?

bluetech · 2024-02-09T13:38:32Z

@jakkdl I will try to review, but right now still mostly dealing with pytest 8 fallout.

Fair warning though, making changes to core fixture code is usually very tricky.

RonnyPfannschmidt

The overall implementation looks like an improvement

I'm slightly worried as we currently express the logic in code instead of a data structure

However at first glance the better control looks reasonable

RonnyPfannschmidt · 2024-02-09T16:11:54Z

src/_pytest/fixtures.py

@@ -641,11 +642,8 @@ def _compute_fixture_value(self, fixturedef: "FixtureDef[object]") -> None:

        # Check if a higher-level scoped fixture accesses a lower level one.
        subrequest._check_scope(argname, self._scope, scope)
-        try:
-            # Call the fixture function.


This looks like it originally ensured teardown even when setup failed.

I suspect that we are missing an edge-case test there.

Does the fixture need finalizing in case we never run ihook.pytest_fixture_setup? I thought it was tightly coupled to the user-supplied code.

But I could change it to

try:
    fixturedef.execute(...)
except:
    self._schedule_finalizers(...)
    raise

or specifically schedule a finalizer where I made a comment on line 1076

The specific schedule is slightly better,

On further consideration, changing that seems... bad to me? It means we'll be running the finalizer multiple times for a fixture with a cached exception, even if its setup was only run once. That is how it worked in the past though

Apologies, what I meant is that a single schedule is better than the current mechanism

Ideally this logic would move to a data structure instead of the code where it is

Right - no yeah I agree that the current implementation is quite messy and would benefit from an overhaul, but that seems out of scope for this PR.
I'm not sure I follow your original comment in that case. The try/except that previously was on this line is now moved one step deeper into execute(), so there should be no change with respect to scheduling the finalizer if setup fails.

The only real change should ™️ be that the finalizer isn't scheduled if the value is cached (regardless of whether it's an exception or not). If the setup code for the fixture fails, it's caught by the new try/finally.
If some other unrelated code in execute raises an exception the finalizer will not be scheduled, but in that case the setup has not run either.

So is there a problem I should address, or was your comment just a musing on how this should be generally overhauled? Or do you consider that a requirement before modifying any logic?
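To make the cached-exception case above concrete, here is a minimal toy model, not pytest's actual code. `ToyFixture`, its `scheduled` counter and `crashing_setup` are hypothetical names invented for this sketch; it only assumes, as described above, that the finalizer is scheduled from a try/finally around the setup call and is skipped entirely when a cached result (value or exception) exists.

```python
class ToyFixture:
    """Hypothetical stand-in for a fixture definition (not pytest's FixtureDef)."""

    def __init__(self, setup_func):
        self.setup_func = setup_func
        self.cached = None   # None, ("value", v) or ("error", exc)
        self.scheduled = 0   # how many times a finalizer was scheduled

    def execute(self):
        # Cached result (value or exception): reuse it, schedule nothing.
        if self.cached is not None:
            kind, payload = self.cached
            if kind == "error":
                raise payload
            return payload
        try:
            value = self.setup_func()
        except Exception as exc:
            self.cached = ("error", exc)
            raise
        else:
            self.cached = ("value", value)
            return value
        finally:
            # Runs whether setup succeeded or failed, but only when setup ran.
            self.scheduled += 1


calls = []


def crashing_setup():
    calls.append("setup")
    raise RuntimeError("boom")


fix = ToyFixture(crashing_setup)
errors = []
for _ in range(2):  # two requests for the same fixture
    try:
        fix.execute()
    except RuntimeError as exc:
        errors.append(exc)

print(calls, fix.scheduled, errors[0] is errors[1])
```

In this model the setup runs once, one finalizer is scheduled, and the second request re-raises the cached exception without scheduling anything further.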

RonnyPfannschmidt

I believe we need to figure tests that include cached fixtures with failures

jakkdl · 2024-02-13T12:47:21Z

I believe we need to figure tests that include cached fixtures with failures

Added a test. If I were to add the change discussed, for scheduling teardown when setup failed, the code for the last function would be

def test_crash_expected_setup_and_teardown() -> None:
    assert executed_crash == ["fix_crash setup", "fix_crash teardown", "fix_crash_teardown"]

I should also change these tests so they're robust to reordering before merging.

bluetech

Not a real review yet, just some quick comments

src/_pytest/fixtures.py

testing/python/test_scope_fixture_caching.py

…tests to a pytester test in testing/python/fixtures.py

testing/python/test_scope_fixture_teardown_order.py

…der to testing/python/fixtures.py

bluetech

Thanks a lot for the PR @jakkdl, if we can get it merged it would be a very nice simplification.

So here is my understanding:

1. In issue #1489 ("fixture finalizer dependency incorrect when using autouse or getfuncargvalue") we're seeing bad fixture teardown ordering.
2. You've analyzed the situation and concluded that we're adding finalizers too eagerly; in particular, what causes the issue is that we add a finalizer even when the fixturedef value is already cached.
3. Your PR makes it so that we don't add a finalizer when the fixture is cached.
4. In the process you've also removed code that was added as a fix for #1895 ("Incorrect finalize/cleanup order of fixtures when using request.getfixturevalue()"), namely the SubRequest._schedule_finalizers code, because it is seemingly no longer needed (indeed, the regression test test_getfixturevalue_teardown passes).

So for me as a reviewer there are two questions:

1. Is the addfinalizer in the cached case needed for correctness?
2. Is the removal of the #1895 code OK?

Is the removal of the #1895 code OK?

First I'm looking at the 2nd question since it's easier to examine. Here is the regression test for #1895:

import pytest

@pytest.fixture(scope='session')
def resource():
    r = ['value']
    yield r
    r.pop()

@pytest.fixture(scope='session')
def inner(request):
    resource = request.getfixturevalue('resource')
    assert resource == ['value']
    yield
    assert resource == ['value']

def test_inner(inner):
    pass

def test_func(resource):
    pass

I added some prints in pytest to see what's going on. Here is the output before this PR:

x.py::test_inner PASSED
TEARDOWN <Function test_inner>

x.py::test_func PASSED
TEARDOWN <Function test_func>

TEARDOWN <Module x.py>

TEARDOWN <Dir pytest>

TEARDOWN <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=2>

FIXTURE FINISH <FixtureDef argname='resource' scope='session' baseid='x.py'> <SubRequest 'resource' for <Function test_func>>

FIXTURE FINISH <FixtureDef argname='inner' scope='session' baseid='x.py'> <SubRequest 'inner' for <Function test_inner>>

FIXTURE FINISH <FixtureDef argname='inner' scope='session' baseid='x.py'> <SubRequest 'inner' for <Function test_inner>>

FIXTURE FINISH <FixtureDef argname='resource' scope='session' baseid='x.py'> <SubRequest 'resource' for <Function test_inner>>

After the PR, the output is (diff):

@@ -19,10 +19,6 @@
 
 TEARDOWN <Session  exitstatus=<ExitCode.OK: 0> testsfailed=0 testscollected=2>
 
-FIXTURE FINISH <FixtureDef argname='resource' scope='session' baseid='x.py'> <SubRequest 'resource' for <Function test_func>>
-
-FIXTURE FINISH <FixtureDef argname='inner' scope='session' baseid='x.py'> <SubRequest 'inner' for <Function test_inner>>
-
 FIXTURE FINISH <FixtureDef argname='inner' scope='session' baseid='x.py'> <SubRequest 'inner' for <Function test_inner>>
 
 FIXTURE FINISH <FixtureDef argname='resource' scope='session' baseid='x.py'> <SubRequest 'resource' for <Function test_inner>>

The optimistic interpretation of this is that before the #1895 fix, we had been adding a useless finalizer in the cached case, and the #1895 fix worked by adding another useless finalizer to beat the previous useless finalizer. This PR fixes it by not adding the useless finalizer in the first place.

The less optimistic interpretation is that the "useless" finalizer is not useless and is needed for correctness in some odd case, and then this PR causes some regression.

So really the second question reduces down to the first question.

Is the addfinalizer in the cached case needed for correctness?

Unfortunately I'm out of time for today to investigate this question, so this ends with a cliffhanger...

bluetech · 2024-03-04T07:42:42Z

testing/python/fixtures.py

+    """
+    pytester.makepyfile(
+        """
+        from typing import Generator


When inside pytester, probably no need for type annotations, they're not enforced anyway..

Imagine if they were though 🤩
Idk, they could maybe help understanding the code, or if somebody copies the code out from the pytester-string to "real" code. I don't see much/any harm in leaving them anyhow

testing/python/fixtures.py

bluetech · 2024-03-04T07:47:43Z

testing/python/fixtures.py

+    assert result.ret == 0
+
+
+def test_scope_fixture_caching_1(pytester: Pytester) -> None:


Perhaps we can submit test_scope_fixture_caching_1 and test_scope_fixture_caching_2 in a separate PR, as they're nice to have anyway and will reduce the size of this PR.

Seems reasonable, see #12121

bluetech · 2024-03-14T18:42:36Z

src/_pytest/fixtures.py

-        This will force the FixtureDef object to throw away any previous
-        results and compute a new fixture value, which will be stored into
-        the FixtureDef object itself.
+        If the FixtureDef has cached the result it will do nothing, otherwise it will


The previous comment is wrong, but saying that it does nothing if the FixtureDef has cached the result is not right either. It registers finalizers, and recomputes if the cache key no longer matches.

oh, registering finalizers regardless of if the value is cached seems dumb... if it's cached then we've already registered a finalizer when we computed the value. And this can also cause bad teardown ordering:

import pytest


@pytest.fixture(scope="module")
def fixture_1(request): ...


@pytest.fixture(scope="module")
def fixture_2(fixture_1):
    print("setup 2")
    yield
    print("teardown 2")


@pytest.fixture(scope="module")
def fixture_3(fixture_1):
    print("setup 3")
    yield
    print("teardown 3")


def test_1(fixture_2): ...


def test_2(fixture_3): ...


# this will reschedule fixture_2's finalizer in the parent fixture, causing it to be
# torn down before fixture 3
def test_3(fixture_2): ...


# trigger finalization of fixture_1, otherwise the cleanup would sequence 3&2 before 1 as normal
@pytest.mark.parametrize("fixture_1", [None], indirect=["fixture_1"])
def test_4(fixture_1): ...

this prints

setup 2
setup 3
teardown 2
teardown 3

but if we remove test_3 we get 2-3-3-2.
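The reordering described above can be reproduced in a small toy model, without pytest. This is only a sketch under assumptions: `ToyFixture`, `run` and the `register_when_cached` flag are hypothetical names, and the model keeps just two ideas from the discussion (a fixture registers its own `finish` on the fixtures it depends on, and finalizers run last-registered-first).

```python
class ToyFixture:
    """Hypothetical stand-in for a fixture definition (not pytest's FixtureDef)."""

    def __init__(self, name, log, deps=()):
        self.name = name
        self.log = log
        self.deps = list(deps)
        self.finalizers = []
        self.cached = False

    def execute(self, register_when_cached):
        if self.cached and not register_when_cached:
            return  # behaviour argued for here: cached means nothing to do
        for dep in self.deps:
            dep.execute(register_when_cached)
            dep.finalizers.append(self.finish)  # tear us down before dep
        if self.cached:
            return  # old behaviour: finalizer re-registered, value reused
        self.cached = True
        self.log.append(f"setup {self.name}")

    def finish(self):
        while self.finalizers:
            self.finalizers.pop()()  # last registered runs first
        if self.cached:
            self.cached = False
            self.log.append(f"teardown {self.name}")


def run(register_when_cached):
    log = []
    f1 = ToyFixture("1", log)
    f2 = ToyFixture("2", log, deps=[f1])
    f3 = ToyFixture("3", log, deps=[f1])
    for requested in (f2, f3, f2):  # test_1, test_2, test_3
        requested.execute(register_when_cached)
    f1.finish()  # stands in for test_4 invalidating fixture_1
    return log


print(run(register_when_cached=True))
print(run(register_when_cached=False))
```

With re-registration, the second request for fixture 2 pushes its `finish` on top of fixture 3's, so 2 is torn down before 3; without it, teardown is the reverse of setup.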

But this is also a different issue+PR

@jakkdl would you mind opening a fresh issue for this case and the one you describe in #11833 (comment)?

Done: #12134 and #12135

bluetech · 2024-03-14T19:23:26Z

testing/python/fixtures.py

+    """
+    Make sure setup and finalization is only run once when using fixture
+    multiple times. This might be a duplicate of another test."""


Suggested change

"""

Make sure setup and finalization is only run once when using fixture

multiple times. This might be a duplicate of another test."""

"""Make sure setup and finalization is only run once when using a fixture

multiple times."""

src/_pytest/fixtures.py

bluetech · 2024-03-14T20:05:21Z

I never managed to figure out why fixtures adds its finalizer to all fixtures it depends on,

If a fixture F1 depends on fixture F2, then F1 must be torn down before F2.

F1 guarantees this by registering its own finish as a finalizer in all fixtures it depends on. So when F2.finish() runs, it runs F1.finish() first as a finalizer.

That's a pretty brute force way to do it, but it is how it is...
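As a rough illustration of that mechanism (a toy model only, with hypothetical names `ToyFixture`, `setup` and `finish`; pytest's real FixtureDef is more involved), F1 appends its own `finish` to F2's finalizer list, so finishing F2 tears down F1 first:

```python
class ToyFixture:
    """Hypothetical stand-in for a fixture definition (not pytest's FixtureDef)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.finalizers = []
        self.active = False

    def setup(self, *deps):
        # Register our own finish() on every fixture we depend on.
        for dep in deps:
            dep.finalizers.append(self.finish)
        self.active = True
        self.log.append(f"setup {self.name}")

    def finish(self):
        # Dependents registered on us are finished before we tear down.
        while self.finalizers:
            self.finalizers.pop()()
        if self.active:
            self.active = False
            self.log.append(f"teardown {self.name}")


log = []
f2 = ToyFixture("F2", log)
f1 = ToyFixture("F1", log)
f2.setup()
f1.setup(f2)   # F1 depends on F2
f2.finish()    # finishing F2 runs F1.finish() first
f1.finish()    # already torn down, so this is a no-op
print(log)
```

Calling `finish` a second time is harmless in this model because the finalizer list is empty and the fixture is no longer active.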

jakkdl · 2024-03-15T14:43:35Z

I never managed to figure out why fixtures adds its finalizer to all fixtures it depends on,

If a fixture F1 depends on fixture F2, then F1 must be torn down before F2.

F1 guarantees this by registering its own finish as a finalizer in all fixtures it depends on. So when F2.finish() runs, it runs F1.finish() first as a finalizer.

That's a pretty brute force way to do it, but it is how it is...

Oh wait, this can cause funky ordering as well:

import pytest


@pytest.fixture(scope="module", params=["a", "b"])
def fixture_1(request):
    print("setup 1 ", request.param)
    yield
    print("teardown 1", request.param)


@pytest.fixture(scope="module")
def fixture_2():
    print("setup 2")
    yield
    print("teardown 2")


@pytest.fixture(scope="module")
def fixture_3(fixture_1):
    print("setup 3")
    yield
    print("\nteardown 3")


def test_1(fixture_1, fixture_2, fixture_3): ...

setup 1  a
setup 2
setup 3
.
teardown 3
teardown 1 a
setup 1  b
setup 3
.
teardown 3
teardown 1 b
teardown 2 <-- 2 is torn down out of order

That's definitely a different issue though, and not sure how to tackle that - if at all realistic.


Fixes pytest-dev#1489

#12393)

## Description of PR

Summary:

1. Add `ip/test_mgmt_ipv6_only.py` into PR pipeline testing.
2. Rearrange fixture order for two test cases: `ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only` and `ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only`.
3. Work around the pytest fixture teardown bug affecting `setup_ntp` when running the `ip/test_mgmt_ipv6_only.py` tests.

### Type of change

- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)

## Approach

#### What is the motivation for this PR?

##### 1. Include `ip/test_mgmt_ipv6_only.py` into PR pipeline testing for IPv6 hardening.

##### 2. Fix errors when running individual test cases.

```
$ ./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only
......
ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only[vlab-01] FAILED [100%]
......
ip/test_mgmt_ipv6_only.py:138:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
output = {'failed': True, 'changed': True, 'stdout': '', 'stderr': "Warning: Permanently added 'fec0::ffff:afa:1' (RSA) to the ...fec0::ffff:afa:1' (RSA) to the list of known hosts.", 'Permission denied, please try again.'], '_ansible_no_log': None}
exp_val1 = 'test', exp_val2 = 'remote_user'

    def check_output(output, exp_val1, exp_val2):
>       pytest_assert(not output['failed'], output['stderr'])
E       Failed: Warning: Permanently added 'fec0::ffff:afa:1' (RSA) to the list of known hosts.
E       Permission denied, please try again.

exp_val1 = 'test'
exp_val2 = 'remote_user'
output = {'failed': True, 'changed': True, 'stdout': '', 'stderr': "Warning: Permanently added 'fec0::ffff:afa:1' (RSA) to the ...fec0::ffff:afa:1' (RSA) to the list of known hosts.", 'Permission denied, please try again.'], '_ansible_no_log': None}

tacacs/utils.py:25: Failed
```

The root cause is that, in the current test case definition, the fixture setup sequence is:

1. `tacacs_v6` --> `sudo config tacacs add fec0::ffff:afa:2`
2. `convert_and_restore_config_db_to_ipv6_only` --> `config reload -y` after removing ipv4 mgmt address

The `sudo config tacacs add fec0::ffff:afa:2` config is lost after the `config reload -y` in step 2, which causes the tacacs authentication failure. If `convert_and_restore_config_db_to_ipv6_only` is called before `check_tacacs_v6`, there will be no issue.

```
Current definition:
def test_ro_user_ipv6_only(localhost, duthosts, enum_rand_one_per_hwsku_hostname,
                           tacacs_creds, check_tacacs_v6, convert_and_restore_config_db_to_ipv6_only): # noqa F811

Correct definition:
def test_ro_user_ipv6_only(localhost, duthosts, enum_rand_one_per_hwsku_hostname,
                           tacacs_creds, convert_and_restore_config_db_to_ipv6_only, check_tacacs_v6): # noqa F811
```

##### 3. Fix fixture teardown error when running whole ip/test_mgmt_ipv6_only.py.

```
When running the full test cases, we are seeing the following fixture sequence and error.

$./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c ip/test_mgmt_ipv6_only.py -f vtestbed.yaml -i ../ansible/veos_vtb -u -e "--setup-show"

SETUP M convert_and_restore_config_db_to_ipv6_only (fixtures used: duthosts)
SETUP M setup_ntp (fixtures used: duthosts, ptf_use_ipv6, ptfhost, rand_one_dut_hostname)
......
TEARDOWN M convert_and_restore_config_db_to_ipv6_only ---> This is wrong. setup_ntp should be teardown first.
TEARDOWN M setup_ntp
......
> raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
E tests.common.errors.RunAnsibleModuleFail: run module command failed, Ansible Results =>
E {"changed": true, "cmd": ["config", "ntp", "del", "fec0::ffff:afa:2"], "delta": "0:00:00.277230", "end": "2024-05-02 11:32:22.404196", "failed": true, "msg": "non-zero return code", "rc": 2, "start": "2024-05-02 11:32:22.126966", "stderr": "Usage: config ntp del [OPTIONS] <ntp_ip_address>\nTry \"config ntp del -h\" for help.\n\nError: NTP server fec0::ffff:afa:2 is not configured.", "stderr_lines": ["Usage: config ntp del [OPTIONS] <ntp_ip_address>", "Try \"config ntp del -h\" for help.", "", "Error: NTP server fec0::ffff:afa:2 is not configured."], "stdout": "", "stdout_lines": []}
......
```

The teardown should be the reverse of fixture setup. The expected setup/teardown order is:

```
SETUP M convert_and_restore_config_db_to_ipv6_only (fixtures used: duthosts)
SETUP M setup_ntp (fixtures used: duthosts, ptf_use_ipv6, ptfhost, rand_one_dut_hostname)
......
TEARDOWN M setup_ntp
TEARDOWN M convert_and_restore_config_db_to_ipv6_only
```

This error is linked to a known pytest issue, pytest-dev/pytest#12135, which has been fixed in pytest 8.2.0 via pytest-dev/pytest#11833. Currently, SONiC is utilizing pytest version 7.4.0, which does not include the fix for this issue. To address this, a workaround will be necessary until sonic-mgmt is upgraded to pytest version 8.2.0.

#### How did you do it?

1. Added it to the PR test case list.
2. Changed the fixture request sequence: put `convert_and_restore_config_db_to_ipv6_only` to the left of `check_tacacs_v6`, so the `convert_and_restore_config_db_to_ipv6_only` fixture will run before `tacacs_v6`.
3. As upgrading the pytest version is not a trivial change, I duplicated the `setup_ntp` fixture at `function` scope. As ntp is only one case in `test_mgmt_ipv6_only.py`, a `function` scope fixture is more suitable than a `module` scope fixture.

#### How did you verify/test it?

1. Pipeline check included test_mgmt_ipv6_only.py.
2. Ran individual tests against test_rw_user_ipv6_only, test_ro_user_ipv6_only, test_ntp_ipv6_only. All passed:

```
$./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only
....
ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only[vlab-01] PASSED [100%]

$ ./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only
......
ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only[vlab-01] PASSED [100%]

$./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_ntp_ipv6_only
......
ip/test_mgmt_ipv6_only.py::test_ntp_ipv6_only[True-vlab-01] PASSED [100%]
```

3. Full test passed:

```
$./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py
......
ip/test_mgmt_ipv6_only.py::test_bgp_facts_ipv6_only[vlab-01-None] PASSED [ 10%]
ip/test_mgmt_ipv6_only.py::test_show_features_ipv6_only[vlab-01] PASSED [ 20%]
ip/test_mgmt_ipv6_only.py::test_image_download_ipv6_only[vlab-01] SKIPPED (Cannot get image url) [ 30%]
ip/test_mgmt_ipv6_only.py::test_syslog_ipv6_only[vlab-01-fd82:b34f:cc99::100-None] PASSED [ 40%]
ip/test_mgmt_ipv6_only.py::test_syslog_ipv6_only[vlab-01-fd82:b34f:cc99::100-fd82:b34f:cc99::200] PASSED [ 50%]
ip/test_mgmt_ipv6_only.py::test_ntp_ipv6_only[True-vlab-01] PASSED [ 60%]
ip/test_mgmt_ipv6_only.py::test_snmp_ipv6_only[vlab-01] PASSED [ 70%]
ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only[vlab-01] PASSED [ 80%]
ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only[vlab-01] PASSED [ 90%]
ip/test_mgmt_ipv6_only.py::test_telemetry_output_ipv6_only[vlab-01-True] PASSED [100%]

==================================================================================== warnings summary ====================================================================================
../../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236
  /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
    "class": algorithms.Blowfish,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
----------------------------------------------------------------- generated xml file: /data/sonic-mgmt/tests/logs/tr.xml -----------------------------------------------------------------
================================================================================ short test summary info =================================================================================
SKIPPED [1] common/helpers/assertions.py:16: Cannot get image url
================================================================== 9 passed, 1 skipped, 1 warning in 745.28s (0:12:25) ===================================================================
```

sonic-net#12393) Summary: 1. Add `ip/test_mgmt_ipv6_only.py` into PR pipeline testing. 2. Rearrange fixture order for two test cases: `ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only` and `ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only`. 3. Workaround pytest fixture teardown bug affecting `setup_ntp` when run the `ip/test_mgmt_ipv6_only.py` tests. - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [x] Test case(new/improvement) ``` $ ./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only ...... ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only[vlab-01] FAILED [100%] ...... ip/test_mgmt_ipv6_only.py:138: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ output = {'failed': True, 'changed': True, 'stdout': '', 'stderr': "Warning: Permanently added 'fec0::ffff:afa:1' (RSA) to the ...fec0::ffff:afa:1' (RSA) to the list of known hosts.", 'Permission denied, please try again.'], '_ansible_no_log': None} exp_val1 = 'test', exp_val2 = 'remote_user' def check_output(output, exp_val1, exp_val2): > pytest_assert(not output['failed'], output['stderr']) E Failed: Warning: Permanently added 'fec0::ffff:afa:1' (RSA) to the list of known hosts. E Permission denied, please try again. exp_val1 = 'test' exp_val2 = 'remote_user' output = {'failed': True, 'changed': True, 'stdout': '', 'stderr': "Warning: Permanently added 'fec0::ffff:afa:1' (RSA) to the ...fec0::ffff:afa:1' (RSA) to the list of known hosts.", 'Permission denied, please try again.'], '_ansible_no_log': None} tacacs/utils.py:25: Failed ``` The root case is: in current test case definition, the fixture setup sequence is: 1. `tacacs_v6` --> `sudo config tacacs add fec0::ffff:afa:2` 2. 
`convert_and_restore_config_db_to_ipv6_only` --> `config reload -y` after removing ipv4 mgmt address The `sudo config tacacs add fec0::ffff:afa:2` config is lost after the `config reload -y` in step 2. Therefore, causing tacacs authentication failure. If `convert_and_restore_config_db_to_ipv6_only` is called before `check_tacacs_v6`, there will be no issue. ``` Current definition: def test_ro_user_ipv6_only(localhost, duthosts, enum_rand_one_per_hwsku_hostname, tacacs_creds, check_tacacs_v6, convert_and_restore_config_db_to_ipv6_only): # noqa F811 Correct definition: def test_ro_user_ipv6_only(localhost, duthosts, enum_rand_one_per_hwsku_hostname, tacacs_creds, convert_and_restore_config_db_to_ipv6_only, check_tacacs_v6): # noqa F811 ``` ``` When running the full test cases, we are seeing the following fixture sequence and error. $./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c ip/test_mgmt_ipv6_only.py -f vtestbed.yaml -i ../ansible/veos_vtb -u -e "--setup-show" SETUP M convert_and_restore_config_db_to_ipv6_only (fixtures used: duthosts) SETUP M setup_ntp (fixtures used: duthosts, ptf_use_ipv6, ptfhost, rand_one_dut_hostname) ...... TEARDOWN M convert_and_restore_config_db_to_ipv6_only ---> This is wrong. setup_ntp should be teardown first. TEARDOWN M setup_ntp ...... 
> raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res) E tests.common.errors.RunAnsibleModuleFail: run module command failed, Ansible Results => E {"changed": true, "cmd": ["config", "ntp", "del", "fec0::ffff:afa:2"], "delta": "0:00:00.277230", "end": "2024-05-02 11:32:22.404196", "failed": true, "msg": "non-zero return code", "rc": 2, "start": "2024-05-02 11:32:22.126966", "stderr": "Usage: config ntp del [OPTIONS] <ntp_ip_address>\nTry \"config ntp del -h\" for help.\n\nError: NTP server fec0::ffff:afa:2 is not configured.", "stderr_lines": ["Usage: config ntp del [OPTIONS] <ntp_ip_address>", "Try \"config ntp del -h\" for help.", "", "Error: NTP server fec0::ffff:afa:2 is not configured."], "stdout": "", "stdout_lines": []} ...... ``` The teardown should be the reverse of fixture setup. The expected setup/teardown order is: ``` SETUP M convert_and_restore_config_db_to_ipv6_only (fixtures used: duthosts) SETUP M setup_ntp (fixtures used: duthosts, ptf_use_ipv6, ptfhost, rand_one_dut_hostname) ...... TEARDOWN M setup_ntp TEARDOWN M convert_and_restore_config_db_to_ipv6_only ``` This error is linked to a known issue pytest-dev/pytest#12135 in pytest, and it has been fixed pytest 8.2.0 via pytest-dev/pytest#11833. Currently, SONiC is utilizing pytest version 7.4.0, which does not include the fix for this issue. To address this, a workaround will be necessary until sonic-mgmt is upgraded to pytest version 8.2.0. 1. Add it into the PR test case list. 2. changed the fixture request sequence, put `convert_and_restore_config_db_to_ipv6_only` to the left of `check_tacacs_v6.` so `convert_and_restore_config_db_to_ipv6_only` fixture will run before `tacacs_v6`. 4. As upgrading pytest version is not trial change, I duplicated the `setup_ntp` fixture at `function` scope. As ntp is only one case in `test_mgmt_ipv6_only.py`, it makes it more suitable to use a `function` scope fixture instead of `module` scope fixture. 1. 
pipeline check included test_mgmt_ipv6_only.py 2. Run individual test against test_rw_user_ipv6_only, test_ro_user_ipv6_only, test_ntp_ipv6_only. All passed: ``` $./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only .... ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only[vlab-01] PASSED [100%] $ ./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only ...... ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only[vlab-01] PASSED [100%] $./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py::test_ntp_ipv6_only ...... ip/test_mgmt_ipv6_only.py::test_ntp_ipv6_only[True-vlab-01] PASSED [100%] ``` 3. Full test passed: ``` $./run_tests.sh -n vms-kvm-t0 -d vlab-01 -f vtestbed.yaml -i ../ansible/veos_vtb -u -c ip/test_mgmt_ipv6_only.py ...... ip/test_mgmt_ipv6_only.py::test_bgp_facts_ipv6_only[vlab-01-None] PASSED [ 10%] ip/test_mgmt_ipv6_only.py::test_show_features_ipv6_only[vlab-01] PASSED [ 20%] ip/test_mgmt_ipv6_only.py::test_image_download_ipv6_only[vlab-01] SKIPPED (Cannot get image url) [ 30%] ip/test_mgmt_ipv6_only.py::test_syslog_ipv6_only[vlab-01-fd82:b34f:cc99::100-None] PASSED [ 40%] ip/test_mgmt_ipv6_only.py::test_syslog_ipv6_only[vlab-01-fd82:b34f:cc99::100-fd82:b34f:cc99::200] PASSED [ 50%] ip/test_mgmt_ipv6_only.py::test_ntp_ipv6_only[True-vlab-01] PASSED [ 60%] ip/test_mgmt_ipv6_only.py::test_snmp_ipv6_only[vlab-01] PASSED [ 70%] ip/test_mgmt_ipv6_only.py::test_ro_user_ipv6_only[vlab-01] PASSED [ 80%] ip/test_mgmt_ipv6_only.py::test_rw_user_ipv6_only[vlab-01] PASSED [ 90%] ip/test_mgmt_ipv6_only.py::test_telemetry_output_ipv6_only[vlab-01-True] PASSED [100%] ==================================================================================== warnings summary ==================================================================================== 
../../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236 /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ----------------------------------------------------------------- generated xml file: /data/sonic-mgmt/tests/logs/tr.xml ----------------------------------------------------------------- ================================================================================ short test summary info ================================================================================= SKIPPED [1] common/helpers/assertions.py:16: Cannot get image url ================================================================== 9 passed, 1 skipped, 1 warning in 745.28s (0:12:25) =================================================================== ```


asottile-sentry · 2024-09-17T19:18:57Z

haven't figured out what about it yet -- but something about this PR makes sentry's testsuite very flaky

annoyingly it's not very reproducible beyond running a whole suite a bunch of times so my "bisect" has taken a few days (I sorta manually bisected against git log --oneline --first-parent 8.1.0..8.2.0 -- src)

the failure mode looks like https://github.com/getsentry/sentry/actions/runs/10905480202/job/30267377794?pr=76935 -- the vast majority of the exceptions are in a per-test teardown

don't suppose anyone has more specific ideas about how I should narrow down what's causing this? I tried looking at the --setup-plan and the only differences in that seems to be in session-scoped fixtures teardown (but that's long long after the errors start happening)

my current hypothesis (with absolutely no evidence because I haven't found a consistent reproduction) is the first test failure (which happens during teardown) somehow interrupts the rest of the teardowns meaning the database connections never get cleaned up (and then subsequent tests which check that error). the test itself is flaky but mostly paved over with retries -- but that shouldn't bring down the rest of the suite with it!

first teardown failure, kinda long

_ ERROR at teardown of GroupDetailsTest.test_plugin_external_issue_annotation __
src/sentry/db/postgres/decorators.py:91: in inner
    return func(self, sql, *args, **kwargs)
src/sentry/db/postgres/base.py:85: in execute
    return self.cursor.execute(sql)
E   psycopg2.errors.ForeignKeyViolation: insert or update on table "sentry_projectcounter" violates foreign key constraint "sentry_projectcounter_project_id_90383de8_fk_sentry_project_id"
E   DETAIL:  Key (project_id)=(4554717002661889) is not present in table "sentry_project".

During handling of the above exception, another exception occurred:
.venv/lib/python3.12/site-packages/django/db/backends/utils.py:103: in _execute
    return self.cursor.execute(sql)
src/sentry/db/postgres/decorators.py:77: in inner
    raise_the_exception(self.db, e)
src/sentry/db/postgres/decorators.py:75: in inner
    return func(self, *args, **kwargs)
src/sentry/db/postgres/decorators.py:18: in inner
    return func(self, *args, **kwargs)
src/sentry/db/postgres/decorators.py:93: in inner
    raise type(e)(f"{e!r}\nSQL: {sql}").with_traceback(e.__traceback__)
src/sentry/db/postgres/decorators.py:91: in inner
    return func(self, sql, *args, **kwargs)
src/sentry/db/postgres/base.py:85: in execute
    return self.cursor.execute(sql)
E   psycopg2.errors.ForeignKeyViolation: ForeignKeyViolation('insert or update on table "sentry_projectcounter" violates foreign key constraint "sentry_projectcounter_project_id_90383de8_fk_sentry_project_id"\nDETAIL:  Key (project_id)=(4554717002661889) is not present in table "sentry_project".\n')
E   SQL: SET CONSTRAINTS ALL IMMEDIATE

The above exception was the direct cause of the following exception:
.venv/lib/python3.12/site-packages/django/test/testcases.py:372: in _setup_and_call
    self._post_teardown()
src/sentry/testutils/cases.py:415: in _post_teardown
    super()._post_teardown()
.venv/lib/python3.12/site-packages/django/test/testcases.py:1202: in _post_teardown
    self._fixture_teardown()
.venv/lib/python3.12/site-packages/django/test/testcases.py:1455: in _fixture_teardown
    connections[db_name].check_constraints()
.venv/lib/python3.12/site-packages/django/db/backends/postgresql/base.py:482: in check_constraints
    cursor.execute("SET CONSTRAINTS ALL IMMEDIATE")
.venv/lib/python3.12/site-packages/django/db/backends/utils.py:122: in execute
    return super().execute(sql, params)
.venv/lib/python3.12/site-packages/sentry_sdk/utils.py:1718: in runner
    return original_function(*args, **kwargs)
.venv/lib/python3.12/site-packages/django/db/backends/utils.py:79: in execute
    return self._execute_with_wrappers(
.venv/lib/python3.12/site-packages/django/db/backends/utils.py:92: in _execute_with_wrappers
    return executor(sql, params, many, context)
src/sentry/testutils/hybrid_cloud.py:130: in __call__
    return execute(*params)
.venv/lib/python3.12/site-packages/django/db/backends/utils.py:100: in _execute
    with self.db.wrap_database_errors:
.venv/lib/python3.12/site-packages/django/db/utils.py:91: in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
.venv/lib/python3.12/site-packages/django/db/backends/utils.py:103: in _execute
    return self.cursor.execute(sql)
src/sentry/db/postgres/decorators.py:77: in inner
    raise_the_exception(self.db, e)
src/sentry/db/postgres/decorators.py:75: in inner
    return func(self, *args, **kwargs)
src/sentry/db/postgres/decorators.py:18: in inner
    return func(self, *args, **kwargs)
src/sentry/db/postgres/decorators.py:93: in inner
    raise type(e)(f"{e!r}\nSQL: {sql}").with_traceback(e.__traceback__)
src/sentry/db/postgres/decorators.py:91: in inner
    return func(self, sql, *args, **kwargs)
src/sentry/db/postgres/base.py:85: in execute
    return self.cursor.execute(sql)
E   django.db.utils.IntegrityError: ForeignKeyViolation('insert or update on table "sentry_projectcounter" violates foreign key constraint "sentry_projectcounter_project_id_90383de8_fk_sentry_project_id"\nDETAIL:  Key (project_id)=(4554717002661889) is not present in table "sentry_project".\n')
E   SQL: SET CONSTRAINTS ALL IMMEDIATE

bluetech · 2024-09-17T21:30:38Z

@asottile-sentry What do you use for retries?

jakkdl · 2024-09-18T08:57:34Z

@asottile-sentry as I said at the start of the PR:

> It's possible this might break test setups that (inadvertently) relies on the teardown order being as it is, but I think it's an improvement on the status quo.

To debug I would suggest logging extensively the order that your fixtures are set up and torn down (before and after this PR) and also watch out for #12134. It's possible that you need to add inter-fixture dependencies to ensure that they're handled correctly.
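As a rough illustration of the suggestion above, the sketch below records fixture setup and teardown events and makes one fixture depend on another explicitly. The names `EVENTS`, `logged`, `base_config` and `ntp_config` are hypothetical and not taken from any project in this thread, and the sketch assumes pytest is installed.

```python
import contextlib
from typing import Iterator, List

import pytest

# Hypothetical in-memory log; a real suite would more likely use `logging`.
EVENTS: List[str] = []


@contextlib.contextmanager
def logged(name: str) -> Iterator[None]:
    """Record when the wrapped fixture body is set up and torn down."""
    EVENTS.append(f"SETUP {name}")
    try:
        yield
    finally:
        EVENTS.append(f"TEARDOWN {name}")


@pytest.fixture(scope="module")
def base_config() -> Iterator[str]:
    with logged("base_config"):
        yield "base"


@pytest.fixture(scope="module")
def ntp_config(base_config: str) -> Iterator[str]:
    # Requesting `base_config` as a parameter is the explicit dependency:
    # pytest sets it up before this fixture and tears it down after it.
    with logged("ntp_config"):
        yield f"ntp on {base_config}"


def test_uses_both(ntp_config: str) -> None:
    assert ntp_config == "ntp on base"
```

Printing or asserting on `EVENTS` at the end of a run, before and after the upgrade, gives two sequences that can be compared directly.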

asottile-sentry · 2024-09-18T13:28:36Z

> @asottile-sentry What do you use for retries?

pytest-rerunfailures -- though I wish we didn't because paving over retries just doesn't seem like a scalable solution if we ever want to reduce flakiness 😆

> @asottile-sentry as I said at the start of the PR:
>
> > It's possible this might break test setups that (inadvertently) relies on the teardown order being as it is, but I think it's an improvement on the status quo.
>
> To debug I would suggest logging extensively the order that your fixtures are set up and torn down (before and after this PR) and also watch out for #12134. It's possible that you need to add inter-fixture dependencies to ensure that they're handled correctly.

are you saying that --setup-plan doesn't tell me how the fixtures will actually order? (there were minor changes in that order as I noted above -- but far far after the failures themselves)
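A comparison of two setup plans like the one described here could be done mechanically along the following lines. This is only a sketch: it assumes the `--setup-plan` output of each pytest version has already been captured as text, and that fixture lines begin with `SETUP` or `TEARDOWN`. The function names `fixture_events` and `diff_plans` are made up for illustration.

```python
import difflib
from typing import List


def fixture_events(plan_output: str) -> List[str]:
    """Keep only the SETUP/TEARDOWN lines, with whitespace normalized."""
    events = []
    for line in plan_output.splitlines():
        parts = line.split()
        if parts and parts[0] in ("SETUP", "TEARDOWN"):
            events.append(" ".join(parts))
    return events


def diff_plans(old_output: str, new_output: str) -> List[str]:
    """Return a unified diff of the fixture events in two captured plans."""
    return list(
        difflib.unified_diff(
            fixture_events(old_output),
            fixture_events(new_output),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

An empty result means both runs report the same fixture event sequence; any other result points at the first place where the order diverges.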

jakkdl · 2024-09-18T13:55:23Z

Oh, knowing about --setup-plan would've been better than liberal use of print to debug this and related issues in the first place 🙃

This PR really shouldn't change anything other than teardown ordering in some cases, I have no clue why it would impact anything else. But the fixture system is quite messy so ???

bluetech · 2024-09-18T18:14:36Z

> pytest-rerunfailures

I suspected this -- rerunfailures is great but it adds another layer of complexity to such issues, especially involving teardown failures, because it does substantial hacking around the pytest runner internals. If you're able to try, do the cascading failures still happen without rerunfailures? Knowing this will tell us where to look.

> though I wish we didn't because paving over retries just doesn't seem like a scalable solution if we ever want to reduce flakiness 😆

Indeed...

Also, teardown errors should be avoided whenever possible, semantically they're a mess to deal with.

asottile-sentry · 2024-09-24T16:45:29Z

> > pytest-rerunfailures
>
> I suspected this -- rerunfailures is great but it adds another layer of complexity to such issues, especially involving teardown failures, because it does substantial hacking around the pytest runner internals. If you're able to try, do the cascading failures still happen without rerunfailures? Knowing this will tell us where to look.
>
> > though I wish we didn't because paving over retries just doesn't seem like a scalable solution if we ever want to reduce flakiness 😆
>
> Indeed...
>
> Also, teardown errors should be avoided whenever possible, semantically they're a mess to deal with.

turning off rerunfailures does seem to fix this -- we're also on an old version of that so I'm going to try upgrading it too

asottile-sentry · 2024-09-25T13:29:39Z

> > > pytest-rerunfailures
> >
> > I suspected this -- rerunfailures is great but it adds another layer of complexity to such issues, especially involving teardown failures, because it does substantial hacking around the pytest runner internals. If you're able to try, do the cascading failures still happen without rerunfailures? Knowing this will tell us where to look.
> >
> > > though I wish we didn't because paving over retries just doesn't seem like a scalable solution if we ever want to reduce flakiness 😆
> >
> > Indeed...
> > Also, teardown errors should be avoided whenever possible, semantically they're a mess to deal with.
>
> turning off rerunfailures does seem to fix this -- we're also on an old version of that so I'm going to try upgrading it too

upgrading it did not fix the problem -- so I believe there's something amiss with rerunfailures and pytest>=8.2 -- potentially with (partial?) missing teardowns

asottile-sentry · 2024-09-26T13:58:52Z

I think I'm getting closer -- I now have a local reproduction and it seems to need:

- a flaky test
- pytest-rerunfailures (with reruns)
- ~~pytest-sentry (with sentry configured)~~ jk I think I've eliminated that as a cause

hopefully I'll be able to find the problem from that!

asottile-sentry · 2024-09-26T21:38:28Z

finally narrowed it down to a minimal example -- and reconfirmed that this is the patch which regresses it via bisect

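```
from unittest import TestCase

g = True  # flips to False after the first, deliberately failing, run


class TestCaseTest(TestCase):
    @classmethod
    def tearDownClass(cls):
        # this print is what goes missing after the patch
        print('class teardown!')

    def test(self):
        global g
        print('test!')
        if g:
            # fail once so that the rerun plugin retries the test
            print('flaky fail!')
            g = False
            raise AssertionError('#')
```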

this is what it should produce:

```
$ ./venv/bin/pytest --reruns=2 -s t.py
============================ test session starts ============================
platform darwin -- Python 3.12.2, pytest-8.1.2, pluggy-1.5.0
rootdir: /private/tmp/y
plugins: rerunfailures-14.0
collected 1 item

t.py test!
flaky fail!
Rtest!
class teardown!
.

======================== 1 passed, 1 rerun in 0.01s =========================
```

and this is what it produces after this patch (note that class teardown! is missing):

```
$ ./venvnew/bin/pytest --reruns=2 -s t.py
============================ test session starts ============================
platform darwin -- Python 3.12.2, pytest-8.2.0, pluggy-1.5.0
rootdir: /private/tmp/y
plugins: rerunfailures-14.0
collected 1 item

t.py test!
flaky fail!
Rtest!
.

======================== 1 passed, 1 rerun in 0.01s =========================
```

jakkdl · 2024-09-27T09:57:49Z

before this patch redundant copies of teardowns were added (which is what caused erroneous ordering), so maybe rerunfailures ends up removing some teardowns and was relying on there being duplicate copies in some way?

edit: trying it out locally, replacing tearDownClass with a pytest fixture I can't repro. So might be something with how they register that
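The effect of those redundant copies on ordering can be shown with a toy model. The sketch below is not pytest's implementation: `Scope`, `Fixture` and `run` are invented names, and the model only assumes a LIFO stack of finalizers plus an idempotent teardown.

```python
from typing import Callable, List


class Scope:
    """Toy stand-in for one scope's LIFO finalizer stack."""

    def __init__(self) -> None:
        self.finalizers: List[Callable[[], None]] = []
        self.log: List[str] = []


class Fixture:
    def __init__(self, name: str, scope: Scope, reregister_on_cache_hit: bool) -> None:
        self.name = name
        self.scope = scope
        self.reregister_on_cache_hit = reregister_on_cache_hit
        self.active = False

    def request(self) -> None:
        if self.active:
            # Cache hit: the old behaviour pushed the finalizer again.
            if self.reregister_on_cache_hit:
                self.scope.finalizers.append(self.finish)
            return
        self.scope.log.append(f"SETUP {self.name}")
        self.active = True
        self.scope.finalizers.append(self.finish)

    def finish(self) -> None:
        if not self.active:
            return  # already torn down, later copies do nothing
        self.active = False
        self.scope.log.append(f"TEARDOWN {self.name}")


def run(reregister_on_cache_hit: bool) -> List[str]:
    scope = Scope()
    a = Fixture("a", scope, reregister_on_cache_hit)
    b = Fixture("b", scope, reregister_on_cache_hit)
    a.request()  # first test requests a, then b
    b.request()
    a.request()  # second test requests a again: cache hit
    while scope.finalizers:
        scope.finalizers.pop()()
    return scope.log
```

In this model `run(True)` tears down `a` before `b` even though `a` was set up first, while `run(False)` tears them down in reverse setup order.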

jakkdl · 2024-09-27T10:19:36Z

https://github.com/pytest-dev/pytest-rerunfailures?tab=readme-ov-file#compatibility

pytest-rerunfailures does not officially support class-scoped fixtures, which unittest tearDownClass might count as? Although some recent issues suggest that might have been resolved.

I think you may be hitting another instance of pytest-dev/pytest-rerunfailures#267 though, this PR was released as part of 8.2

pytest-rerunfailures does not run class teardowns in pytest 8.2+ see pytest-dev/pytest#11833

jakkdl and others added 3 commits January 17, 2024 15:44

Move scheduling of fixture finalization so it isn't rescheduled if the fixture value is cached.

baed905

add changelog

3356fa7

Merge branch 'main' into teardown_fixture_order

6e5a8fe

Merge branch 'main' into teardown_fixture_order

fa9607e

RonnyPfannschmidt reviewed Feb 9, 2024

jakkdl added 4 commits February 13, 2024 13:35

Merge remote-tracking branch 'origin/main' into teardown_fixture_order

5eddb50

Add test for finalizer in a fixture with a cached exception

f76a77b

add typing to teardown_order test

537a831

fix typing in test to work on runtime in py38

d168ea4

Merge branch 'main' into teardown_fixture_order

fbe15ca

bluetech reviewed Mar 2, 2024

jakkdl added 2 commits March 3, 2024 11:22

Merge remote-tracking branch 'origin/main' into teardown_fixture_order

b3928bf

remove some comments, remove now unused _schedule_finalizers(), move tests to a pytester test in testing/python/fixtures.py

3fc5c55

bluetech reviewed Mar 3, 2024

testing/python/test_scope_fixture_teardown_order.py (outdated, resolved)

improve comment getfixturevalue, move test_scoped_fixture_teardown_order to testing/python/fixtures.py

007d24a

jakkdl requested a review from bluetech March 3, 2024 12:23

bluetech reviewed Mar 14, 2024

jakkdl mentioned this pull request Mar 15, 2024

Add tests to ensure setup&finalization for scoped fixtures only run once. #12121

Merged

[pre-commit.ci] auto fixes from pre-commit.com hooks

13d1049

for more information, see https://pre-commit.ci

bluetech merged commit 70c1158 into pytest-dev:main Mar 16, 2024
24 checks passed

bluetech mentioned this pull request Mar 17, 2024

Refactor fixture finalization #4871

Open

This was referenced Mar 18, 2024

Teardown order mismatch with parametrized parent fixtures and scopes #12134

Open

Finalizer re-registered in parent fixture even when the value of the fixture is cached #12135

Closed

jakkdl deleted the teardown_fixture_order branch March 18, 2024 12:39

flying-sheep pushed a commit to flying-sheep/pytest that referenced this pull request Apr 9, 2024

Don't add fixture finalizer if the value is cached (pytest-dev#11833)

61d06a9

Fixes pytest-dev#1489

sdszhang mentioned this pull request May 2, 2024

Adding ipv6_mgmt_only test case into PR testing and fix fixture errors sonic-net/sonic-mgmt#12393

Merged

8 tasks

sdszhang mentioned this pull request May 13, 2024

fix fixture order issue in ipv6_mgmt tacacs and telemetry cases sonic-net/sonic-mgmt#12825

Merged

8 tasks

jakkdl mentioned this pull request Sep 27, 2024

pytest >= 8.2 is not yet supported (not only tests fail) pytest-dev/pytest-rerunfailures#267

Open

asottile-sentry added a commit to getsentry/sentry that referenced this pull request Sep 27, 2024

ref: replace pytest-rerunfailures with flaky

565ab96

pytest-rerunfailures does not run class teardowns in pytest 8.2+ see pytest-dev/pytest#11833

asottile-sentry mentioned this pull request Sep 27, 2024

ref: replace pytest-rerunfailures with flaky getsentry/sentry#78271

Closed

jakkdl mentioned this pull request Oct 31, 2024

fix compatibility with pytest 8.2 by restoring deleted finalizers pytest-dev/pytest-rerunfailures#278

Merged

		assert result.ret == 0


		def test_scope_fixture_caching_1(pytester: Pytester) -> None:

Is the removal of the #1895 code OK?

Is the addfinalizer in the cached case needed for correctness?
