-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZTS: Add timeout to cp_stress #16298
Conversation
5aa2e07
to
49f9076
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks right for what it is, but I wonder: what kind of "underpowered" are we talking here? Because seekflood
is more likely to hit things if there are more CPUs. So if there's only one or two here, is it even worth running this?
A little background: I'm currently helping test #15838. The idea there is to convert our buildbot builders to github runners. Or more specifically, run ZTS within VMs on the github runners (using the runner as a VM host). We need to use VMs, since github runners don't natively support all the OSs we test on (like Fedora, Almalinux, FreeBSD, etc). In order to speed up the testing and not run into the 4-hour testing timeout, we're chunking up the the tests into thirds and running it on three VMs running in parallel. That unfortunately means each each VM is going over-provisioned and run slower. That's fine for most ZTS tests since they aren't very CPU bound, but there are some outliers like So yes, we ideally would want faster runners with more CPUs since they're going to expose race conditions and locking bugs like this quicker. But we also have to make due with what's available. |
49f9076
to
a036bf8
Compare
Oh I'm sorry, my question wasn't clear (also it was more of a drive-by mumble than a really big-brained thought). I more meant, if we know this test is running in an environment where it's unlikely to actually bump into the thing it's checking for (because it isn't parallel enough), should we disable the test in that environment. Like, should it SKIP if it has fewer than N cores or something? I suppose even "unlikely" is better than "never" though! |
I think it's still a useful test to run no matter what the hardware is. This popped up today: #16297 |
Oof, hard to argue with results. Also yikes, I'll check it out! 😅 |
cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the QEMU-based testers are so underpowered that the test is taking longer than the 10min maximum that ZTS enforces. Instead, enforce an inter-test-cp_stress timeout so the entire test doesn't get killed. Signed-off-by: Tony Hutter <[email protected]>
a036bf8
to
e73ecbd
Compare
I created another pull request for this problem here: #16369 This one just takes down the RUNS to the limit of the FreeBSD testings, and then the test work fine on Linux also. |
@tonyhutter - can be closed now 👍 |
Closing in favor of #16369 |
Motivation and Context
cp_stress
ZTS test can timeout after 10min on underpowered test systemsDescription
cp_stress is getting killed on the new QEMU-based github runners we're developing. The problem is that the QEMU-based testers are so under-powered that the test is taking longer than the 10min maximum that ZTS enforces. Instead, enforce an inter-test-cp_stress timeout so the entire test doesn't get killed.
How Has This Been Tested?
Ran on FreeBSD 13 and Ubuntu 24.
Types of changes
Checklist:
Signed-off-by
.