Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ReadStream/WriteStream data race #62

Closed
wants to merge 2 commits into from

Conversation

floren
Copy link
Contributor

@floren floren commented May 3, 2021

Handle possible race where WriteStream can mess up the data being read from ReadStream.

Copy link
Owner

@peterbourgon peterbourgon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add a test which fails on current master and passes on your branch :)

// By calling os.Remove, we unlink the existing file, but since the process
// still has a handle on the file, the existing readers can continue to
// read from it. Meanwhile we're free to update what's actually on-disk.
os.Remove(d.completeFilename(pathKey))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if Remove returns an error? Is it safe to ignore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time, if Remove returns an error, it's because the file doesn't exist. If some other error comes up, I think the best thing to do is just try and go ahead with the OpenFile, see if that works or not.

// read from it. Meanwhile we're free to update what's actually on-disk.
os.Remove(d.completeFilename(pathKey))

mode := os.O_WRONLY | os.O_CREATE | os.O_TRUNC // overwrite if exists somehow
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a minimum, this comment should be updated; might also want to change the mode itself, given we know (assume) the file isn't there? Is that a good idea?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, if Remove somehow fails and the file still exists, we definitely want O_TRUNC, even though the most likely case is that OpenFile will also then fail (I've never heard of a situation where you can open a file to write but cannot remove it)

Comment on lines +226 to 229
os.Remove(d.completeFilename(pathKey))

mode := os.O_WRONLY | os.O_CREATE | os.O_TRUNC // overwrite if exists somehow
f, err := os.OpenFile(d.completeFilename(pathKey), mode, d.FilePerm)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... what do you think about creating the new file as d.completeFilename(pathKey) plus some random extension, and then relying on the existing behavior in writeStreamWithLock to atomically move it to the proper filename? That would accomplish the same thing, I think, and avoid this intermediate phase...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brings up an excellent point: the TempDir flag already fixes this problem by doing basically that.

The problem is that if you instantiate a diskv without setting the TempDir option, you cannot safely use ReadStream, which seems bad.

What's your opinion on making createKeyFileWithLock always write to d.completeFilename(pathKey)+extension? We would leave TempDir in the Options struct, but note that it is deprecated since all writes are now atomic.

If you have many slow readers using ReadStream, coupled with frequent writes, you will end up consuming additional disk space because the OS is keeping the old files around until the readers complete... but the alternative is handing back invalid data to the readers, which seems much worse to me!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your opinion on making createKeyFileWithLock always write to d.completeFilename(pathKey)+extension?

Yeah, I think I like this better. (Assuming the +extension bit is the random extension I mentioned.)

If you have many slow readers using ReadStream, coupled with frequent writes, you will end up consuming additional disk space because the OS is keeping the old files around until the readers complete...

I think this is basically unavoidable. If it became a problem, you could maybe interrupt readers.

You OK with this change?

@floren
Copy link
Contributor Author

floren commented May 4, 2021

I created #63 to track this more easily.

@floren
Copy link
Contributor Author

floren commented May 4, 2021

Test has been pushed. I've verified that it works with my new code and fails on master.

@floren
Copy link
Contributor Author

floren commented May 6, 2021

See #64

@floren floren closed this May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants