Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark: fix race condition on fs benchs #50035

Merged
merged 5 commits into from
Oct 15, 2023

Conversation

H4ad
Copy link
Member

@H4ad H4ad commented Oct 4, 2023

Some of these benchmarks fails even on main versions of Node.js.

This issue is caused because afterRead is being called and is locking the temp file which makes the .on('exit') fails to clean the temp file.

If we wait all the concurrent timers to finish, the issue is solved.

First found this issue on: #49962

@nodejs-github-bot nodejs-github-bot added benchmark Issues and PRs related to the benchmark subsystem. fs Issues and PRs related to the fs subsystem / file system. labels Oct 4, 2023
@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@nodejs-github-bot
Copy link
Collaborator

@anonrig
Copy link
Member

anonrig commented Oct 4, 2023

@nodejs/fs @nodejs/performance @nodejs/platform-windows can you review?

@anonrig anonrig added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Oct 4, 2023
Copy link
Member

@benjamingr benjamingr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on an approach that uses setTimeout here, if we end up using setTimeout we should use platform dependent timeout probably.

In general code that does cleanup and may crash shouldn't clean up after itself but when starting which the code here already does - so I wonder if it isn't better to just remove the cleanup and rely on the test.

The benchmarks should likely use async/await instead of the callbackify-liike weirdness they do at the moment.

@H4ad
Copy link
Member Author

H4ad commented Oct 5, 2023

What do you mean by platform dependent timeout?

async/away has some overhead and the idea of this benchmark is to test the performance of the callback version, so there is no reason to change this behavior.

Another approach for setTimeout could be setImediate, but I didn't test if could handle the race condition.

These changes are meant to fix the error that happens in these lines:

function onexit(useSpawn) {
// Change directory to avoid possible EBUSY
if (isMainThread)
process.chdir(testRoot);
try {
rmSync(tmpPath, useSpawn);
} catch (e) {
console.error('Can\'t clean tmpdir:', tmpPath);
const files = fs.readdirSync(tmpPath);
console.error('Files blocking:', files);
if (files.some((f) => f.startsWith('.nfs'))) {
// Warn about NFS "silly rename"
console.error('Note: ".nfs*" might be files that were open and ' +
'unlinked but not closed.');
console.error('See http://nfs.sourceforge.net/#faq_d2 for details.');
}
console.error();
throw e;
}
}

Due to race condition, the file cannot be deleted because the readline still locking the file.

Maybe we could remove the tmpdir dependency and create and delete manually the temp folders, but I think we will have the same issue and the solution will be the same (setTimeout).

@benjamingr
Copy link
Member

What do you mean by platform dependent timeout?

https://github.com/nodejs/node/tree/main/test/common#platformtimeoutms

async/away has some overhead and the idea of this benchmark is to test the performance of the callback version, so there is no reason to change this behavior.

I guarantee you it has less or as much overhead as the "create a promise then go through extra closure and function" version we currently use.

Another approach for setTimeout could be setImediate, but I didn't test if could handle the race condition.

We should never rely on timers for dealing with race conditions typically. The problem here is design, we should delete the file when the test starts and not when it ends..

@Uzlopak
Copy link
Contributor

Uzlopak commented Oct 5, 2023

@RafaelGSS

Cant we have a after helper, like mocha/chai/tap has, in the benchmark framework and run it after the benchmark finished?

@benjamingr
Copy link
Member

What if the user stops the benchmark in the middle (or similarly in CI?)?

What's the issue with only doing the cleanup before the test and not after?

@H4ad
Copy link
Member Author

H4ad commented Oct 6, 2023

@benjamingr Doing it this way, it will clean up only during the start of the benchmark, and then the file will be there without being cleaned.

It may or may not be a problem, but if we went this way we would need to remove the reference to tempdir.refresh because they automatically install the process.on('exit') cleanup hook.

About stopping the CI in the middle, the file is cleaned because of process.on('exit') but this can lead to the same error, in this context, the error doesn't matter since we are canceling the execution.

But I will try to use platform-specific timeout and see how it behaves with async/await.

@benjamingr
Copy link
Member

Doing it this way, it will clean up only during the start of the benchmark, and then the file will be there without being cleaned.

It may or may not be a problem

It's not (at least shouldn't be) it's a tmp dir, it's transient storage by design. It's "fine" to leave trash there.

But I will try to use platform-specific timeout and see how it behaves with async/await.

Sure but I would strongly prefer it that we don't use a timers based solution since that inherently depends on how slow the machine running the benchmark/test is which means harder-to-track-and-fix flakes.

@H4ad
Copy link
Member Author

H4ad commented Oct 10, 2023

@benjamingr Just to summarize the changes, now I wait for all the concurrent timers to finish until really stop the benchmark, in this approach I don't relly on platform dependant timeouts.

@anonrig anonrig added request-ci Add this label to start a Jenkins CI on a PR. and removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Oct 10, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 10, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina
Copy link
Member

@benjamingr I see you asked for some changes here, are you still blocking this to land?

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@anonrig

This comment has been minimized.

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 14, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 14, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig added the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 15, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 15, 2023
@nodejs-github-bot nodejs-github-bot merged commit 33c87ec into nodejs:main Oct 15, 2023
55 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 33c87ec

kumarrishav pushed a commit to kumarrishav/node that referenced this pull request Oct 16, 2023
PR-URL: nodejs#50035
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
alexfernandez pushed a commit to alexfernandez/node that referenced this pull request Nov 1, 2023
PR-URL: nodejs#50035
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
targos pushed a commit that referenced this pull request Nov 11, 2023
PR-URL: #50035
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark Issues and PRs related to the benchmark subsystem. fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants