ZTS: Add timeout to cp_stress #16298

tonyhutter · 2024-06-24T18:42:29Z

Motivation and Context

The cp_stress ZTS test can time out after 10 minutes on underpowered test systems.

Description

cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the QEMU-based testers are so underpowered that the test takes longer than the 10-minute maximum that ZTS enforces. Instead, enforce a timeout inside cp_stress itself so the entire test doesn't get killed.
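
As a rough illustration of the idea (not the actual patch), a stress loop can check a time budget between iterations and stop early instead of being killed by the harness. The function name `run_with_budget` and its arguments are hypothetical and do not come from cp_stress.ksh:

```shell
# Hypothetical sketch: repeat a command up to MAX_RUNS times, but stop
# cleanly once BUDGET seconds have elapsed.
# Usage: run_with_budget BUDGET_SECONDS MAX_RUNS CMD [ARGS...]
run_with_budget() {
    budget=$1
    max_runs=$2
    shift 2
    start=$(date +%s)
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        now=$(date +%s)
        # The budget is only checked between iterations, so a single
        # long-running iteration is not interrupted.
        if [ $((now - start)) -ge "$budget" ]; then
            echo "budget of ${budget}s reached after $runs runs"
            return 0
        fi
        # A failing iteration fails the whole loop.
        "$@" || return 1
        runs=$((runs + 1))
    done
    echo "completed all $runs runs"
}
```

The budget would need to sit comfortably below the harness limit, since an iteration that starts just before the deadline still runs to completion.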

How Has This Been Tested?

Ran on FreeBSD 13 and Ubuntu 24.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

tests/zfs-tests/tests/functional/cp_files/cp_stress.ksh

robn

Looks right for what it is, but I wonder: what kind of "underpowered" are we talking here? Because seekflood is more likely to hit things if there are more CPUs. So if there's only one or two here, is it even worth running this?

tonyhutter · 2024-06-25T00:41:04Z

Looks right for what it is, but I wonder: what kind of "underpowered" are we talking here? Because seekflood is more likely to hit things if there are more CPUs. So if there's only one or two here, is it even worth running this?

A little background: I'm currently helping test #15838. The idea there is to convert our buildbot builders to github runners. Or more specifically, run ZTS within VMs on the github runners (using the runner as a VM host). We need to use VMs, since github runners don't natively support all the OSs we test on (like Fedora, Almalinux, FreeBSD, etc). In order to speed up the testing and not run into the 4-hour testing timeout, we're chunking up the tests into thirds and running them on three VMs in parallel. That unfortunately means each VM is going to be over-provisioned and run slower. That's fine for most ZTS tests since they aren't very CPU bound, but there are some outliers like cp_stress that do run too long and get killed.

So yes, we ideally would want faster runners with more CPUs since they're going to expose race conditions and locking bugs like this quicker. But we also have to make do with what's available.
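
The chunking described above could look roughly like the following. This is only a sketch: the helper `pick_chunk` is hypothetical, and it deals tests out round-robin, which may not match how #15838 actually splits the test list.

```shell
# Hypothetical sketch: read test names on stdin (one per line) and print
# the ones that belong to chunk INDEX (1-based) out of TOTAL chunks.
# Usage: pick_chunk INDEX TOTAL
pick_chunk() {
    awk -v i="$1" -v n="$2" '(NR - 1) % n == i - 1'
}
```

Each of the three VMs would then run, for example, `pick_chunk 2 3 < all_tests.txt`, where `all_tests.txt` is an assumed file listing every test.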

robn · 2024-06-25T00:51:17Z

Oh I'm sorry, my question wasn't clear (also it was more of a drive-by mumble than a really big-brained thought).

What I meant was: if we know this test is running in an environment where it's unlikely to actually bump into the thing it's checking for (because it isn't parallel enough), should we disable the test in that environment? Like, should it SKIP if it has fewer than N cores or something?

I suppose even "unlikely" is better than "never" though!
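
A core-count check of the kind suggested here might be sketched as follows. The helper names `cpu_count` and `should_skip` are hypothetical, and the sketch assumes `getconf _NPROCESSORS_ONLN` is available on the test platforms; in a real ZTS test the skip itself would go through the suite's own logging helpers.

```shell
# Hypothetical sketch: decide whether to skip a test on small machines.
cpu_count() {
    # Assumption: getconf reports the number of online CPUs here.
    # Fall back to 1 if it fails or prints something non-numeric.
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Usage: should_skip MIN_CPUS [CPUS]
# Succeeds (exit 0) when the machine has fewer than MIN_CPUS CPUs.
# The optional second argument overrides the detected count.
should_skip() {
    min=$1
    have=${2:-$(cpu_count)}
    [ "$have" -lt "$min" ]
}
```

A test could then start with something like `should_skip 4 && exit 0`, with the threshold of 4 being an arbitrary placeholder.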

tonyhutter · 2024-06-25T00:58:18Z

I think it's still a useful test to run no matter what the hardware is. This popped up today: #16297

robn · 2024-06-25T01:01:35Z

Oof, hard to argue with results. Also yikes, I'll check it out! 😅

cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the QEMU-based testers are so underpowered that the test is taking longer than the 10min maximum that ZTS enforces. Instead, enforce an inter-test-cp_stress timeout so the entire test doesn't get killed.

Signed-off-by: Tony Hutter <[email protected]>

mcmilk · 2024-07-20T07:11:56Z

I created another pull request for this problem here: #16369

This one just lowers RUNS to the limit used for FreeBSD testing, and then the test works fine on Linux as well.
The cp_stress test then needs around 2 minutes on both FreeBSD and Linux.

mcmilk · 2024-07-22T21:12:48Z

@tonyhutter - can be closed now 👍

tonyhutter · 2024-07-25T23:46:37Z

Closing in favor of #16369

tonyhutter force-pushed the cp_stress_timeout branch from 5aa2e07 to 49f9076 Compare June 24, 2024 18:58

robn reviewed Jun 24, 2024

tests/zfs-tests/tests/functional/cp_files/cp_stress.ksh

robn approved these changes Jun 24, 2024

tonyhutter force-pushed the cp_stress_timeout branch from 49f9076 to a036bf8 Compare June 25, 2024 00:44

mcmilk approved these changes Jun 25, 2024

tonyhutter added the Status: Code Review Needed Ready for review and testing label Jul 12, 2024

tonyhutter force-pushed the cp_stress_timeout branch from a036bf8 to e73ecbd Compare July 19, 2024 00:01

tonyhutter closed this Jul 25, 2024