-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Use a GUID when creating the temp path #4645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
52f321d
280e9ad
fc636e4
c29a00b
0f06dfb
ca78b2c
34c7bb2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -256,7 +256,7 @@ protected void GetPath(out string pathEnt, out string pathTemp, string dir, stri | |||||||||||
| string root = Path.GetFullPath(DirTemp ?? @"x:\dummy"); | ||||||||||||
| string entityPath = Path.Combine(root, dir ?? "", name); | ||||||||||||
| entityPath = Path.GetFullPath(entityPath); | ||||||||||||
| string tempPath = Path.Combine(root, PathMap.Count.ToString()); | ||||||||||||
| string tempPath = Path.Combine(root, Guid.NewGuid().ToString()); | ||||||||||||
| tempPath = Path.GetFullPath(tempPath); | ||||||||||||
|
|
||||||||||||
| string parent = Path.GetDirectoryName(entityPath); | ||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current function here, machinelearning/src/Microsoft.ML.Core/Data/Repository.cs Lines 335 to 339 in c29a00b
There another minor race condition in how the new path is being checked/added to the dictionary. It's a nit, and minor enough to ignore if you want. If we want to handle in this PR, we could fix by either adding a lock around the IF/ELSE, or by moving to a ConcurrentDictionary and using its (I tagged this on a line of code to allow for a threaded discussion)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Went the |
||||||||||||
|
|
@@ -340,7 +340,9 @@ public Entry CreateEntry(string dir, string name) | |||||||||||
|
|
||||||||||||
| Stream stream; | ||||||||||||
| if (pathTemp != null) | ||||||||||||
| { | ||||||||||||
|
jwood803 marked this conversation as resolved.
Outdated
|
||||||||||||
| stream = new FileStream(pathTemp, FileMode.CreateNew); | ||||||||||||
| } | ||||||||||||
| else | ||||||||||||
| stream = new MemoryStream(); | ||||||||||||
|
|
||||||||||||
|
|
||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please help me understand the problem you are trying to fix?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, there's a race condition in creating temporary files, causing a naming collision.
The paths are created as
C:\Users\jormont\AppData\Local\Temp\2\TLC_25553D77\0, where:TLC_is fixed25553D77is a seeded random in hex (bad)0is the count [0..N] of folders existing in the parent folder (also bad).The prefix
TLC_(1) is fine, though should be updated to the current name, perhaps toML_NET_. OrMicrosoft_ML_, though I'd worry about creating too long of paths under Windows ~260 chars, especially as a we move towards a GUID. Note users can have long usernames, plus the names of the files placed in this path.First issue -- The seeded random number (2) is an issue since running parallelly (or rerunning) with the same seed creates the same "random" folder name. This part is a deterministic collision.
Second issue -- The race condition then occurs when counting how many folders (3) are in the parent folder. When multiple processes/threads check the contents of the folder, they all receive the same number. The purposed fix replaces this count [0..N] with a GUID.
The need is to have unique folders for creating temporary files, otherwise they clobber each other. This is causing a crash and has the possibility of silent failures, especially on macOS/Linux where you can delete and overwrite files with currently open handles.
Currently the way I run into this, is running N parallel copies of the AutoML․NET CLI, which I do for validating changes to the ML․NET AutoML API on ~30 datasets. Some of the processes crash with the error in the linked issue, unless they are started staggered to avoid the race condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushing deeper into the code, the random number (2) is not seeded, but uses the system seed which is set from the current time. I expect the collision occurs due the remark in
System.Randomdocs, "On most Windows systems, Random objects created within 15 milliseconds of one another are likely to have identical seed values." (src). Hence another race condition in (2).Hopefully the .NET GUIDs don't have the same 15ms race condition as the RNG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current code, it looks like we are using our own random number generation, which seems odd.
It is okay to use GUIDs. But they seem a bit of overkill.
Path.GetRandomFileName seems to be intended for this kind of scenario.
Would that work better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harishsk: You're right,
Path.GetRandomFileName()seems more purpose built than a GUID.Code behind this function -- https://github.com/dotnet/runtime/blob/f57d49bb711fda8ac0bd231229d91919f4fca8e8/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs#L249-L265
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harishsk I had no idea that method existed. Thanks for letting me know! Updated to use it.