-
Notifications
You must be signed in to change notification settings - Fork 384
Fix Linux Mutex Utilization #2229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It seems like it dropped the execution lock (maybe it threw an error) when executing The command apt-get -o DPkg::Lock::Timeout=180 upgrade -y dotnet-sdk-8.0 was forwarded to the master process to run. It waited about 2 seconds then freed the server before it was done. It freed before it threw but free happens before throw, so it might have failed for some reason, then failed again since stdout.txt didn't exist because the process wasn't done running. then freed. When it freed, it was still going but then it allowed the sudo master process killer to run which caused this to fail later.... Still not sure why that'd happen, it might be if the locking was not implemented correctly.
This will help prevent an existing process from colliding with another.
this is basically like random int collision at this point, could probably remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
vscode-dotnet-runtime-library/src/Utils/CommandExecutor.ts:123
- Locking now uses the script path instead of the communication directory, while subsequent operations continue to use the directory. Verify if this change is intentional to avoid potential race conditions or locking inconsistencies.
return executeWithLock(this.context.eventStream, false, RUN_UNDER_SUDO_LOCK(this.sudoProcessScript), SUDO_LOCK_PING_DURATION_MS, waitForLockTimeMs,
vscode-dotnet-runtime-library/src/Utils/CommandExecutor.ts:149
- [nitpick] Reducing the timeout from 1000ms to 500ms for checking process liveness may cause premature failures. Confirm that this new timeout value meets the operational requirements.
if (await this.sudoProcIsLive(false, fullCommandString, 500))
Co-authored-by: Copilot <[email protected]>
vscode-dotnet-runtime-library/src/Acquisition/DotnetPathFinder.ts
Outdated
Show resolved
Hide resolved
vscode-dotnet-runtime-library/src/Acquisition/DotnetPathFinder.ts
Outdated
Show resolved
Hide resolved
| catch (error: any) | ||
| { | ||
| // eslint-disable-next-line @typescript-eslint/no-unsafe-member-access | ||
| error.message = error.message + `\nFailed to create ${this.sudoProcessCommunicationDir}. Please check your permissions or install dotnet manually.`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may happen in the rare event of a collision where the dir exists, so its ok to keep going here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have fallback dirs? Like if dir exists, check dir1, then dir2, etc.? I don't feel great about catching any exception and not even verifying that it failed for the reason you mentioned. You could also have sudoProcessCommunication dir is unexpectedly null; that would be a bug, and we should care, I think
Forgind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think any of these are too serious. I'm a little concerned about the undefined thing, but that might just be my lack of js knowledge
| catch (error: any) | ||
| { | ||
| // eslint-disable-next-line @typescript-eslint/no-unsafe-member-access | ||
| error.message = error.message + `\nFailed to create ${this.sudoProcessCommunicationDir}. Please check your permissions or install dotnet manually.`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have fallback dirs? Like if dir exists, check dir1, then dir2, etc.? I don't feel great about catching any exception and not even verifying that it failed for the reason you mentioned. You could also have sudoProcessCommunication dir is unexpectedly null; that would be a bug, and we should care, I think
| { | ||
| if (await this.sudoProcIsLive(false, fullCommandString, 1000)) // If the sudo process was spawned by another instance of code, we do not want to have 2 at once but also do not waste a lot of time checking | ||
| // As it should not be in the middle of an operation which may cause it to take a while. | ||
| if (await this.sudoProcIsLive(false, fullCommandString, 500)) // If the sudo process was spawned by another instance of code, we do not want to have 2 at once but also do not waste a lot of time checking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure users would really notice the difference between 500ms and 1000ms. I don't know how long this is expected to take (IPC, right? I think speed then depends on OS), but I could imagine it taking longer than 500ms on a heavily loaded system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. This check only exists for when there is an existing sudo process working in same the individualized folder.
That should only happen when we think we have not spawned a sudo process ourselves that is currently alive, or the sudo process spawned unsuccessfully while throwing an error. This check can probably be removed in its entirety.
| stderr: (fs.readFileSync(stderrFile, 'utf8')).trim(), | ||
| status: (fs.readFileSync(statusFile, 'utf8')).trim() | ||
| stdout: (await (this.fileUtil as FileUtilities).read(stdoutFile)).trim(), | ||
| stderr: (await (this.fileUtil as FileUtilities).read(stderrFile)).trim(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This defaults to utf8? I'm also wondering if you can have these three going in parallel then just 'join' them afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! The function does default to utf8. I like the parallelization idea, though I want to avoid adding that complexity here
That's a good point. If we cant make the first dir, I doubt we can make the 2nd. Perhaps we could use the existing directory as a fallback, though that would introduce the race condition again if it happens multiple times. I could condition it to not fail on EEXIST. Though, I just don't know what other types of errors the kernel could throw at us :/ |
… kernels EACCESS as well. This may be something that will not resolve, though if some admin thing is blocking and holding the file, perhaps this is a good resolution. I never encountered this when trying different permissions on my demos but it seems that some people are.
|
@Forgind, I think I resolved the 2 issues you put at the top by throwing the error in certain cases if we cant use mkdir or using a backupdir, and then by removing the extra 500 ms timeout check. Thanks for looking and let me know what you think! |
Forgind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy; thanks!
might as well do it in this PR
Now that the mutex is implemented correctly, I can check that the code which uses the mutex is all correct.
An issue I discovered, is that on Linux, we use a sudo process folder to exchange files with a master sudo process.
This folder is shared across all instances of vscode. If you kill vscode, the sudo process does not go away.
When a new instance checks if it should spawn a master sudo process, it tries to communicate with the old one. But the old one may be busy, and the old logic only waited 1 second. The sudo process may be busy for multiple minutes. This can result in race conditions where the old sudo process and new one are doing some of the same tasks. Now, I've given each vscode instance its own folder to run sudo commands under.
The script that runs these commands is still in the same protected location. It will only run authorized and approved commands by us, so spawning a new directory is OK.
Resolves #2224