-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce StreamDone in Stream Writer #1061
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ A review job has been created and sent to the PullRequest network.
@ashish-goswami you can click here to see the review status or cancel the code review job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments.
Reviewable status: 0 of 4 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, and @manishrjain)
stream_writer.go, line 33 at r1 (raw file):
var ( // ErrStreamClosed is returned when a sent is performed on closed stream.
... when data is sent on a closed stream.
stream_writer.go, line 145 at r1 (raw file):
defer sw.writeLock.Unlock() // Check if any of the stream we got records for are closed.
// Check if we've received data on a closed stream.
stream_writer.go, line 171 at r1 (raw file):
// close any streams if required. for streamID := range closedStreams { writer, ok := sw.writers[streamID]
If we have a streamID
in closedStreams
that implies that we've recieved some data on that stream. Which also means there should be a writer for it. I would suggest we either log an error or return an error here if we don't find the streamID
in sw.writers
map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall changes here look pretty good and thanks for showing the tests that you did to validate. I left some feedback and questions inline.
Reviewed with ❤️ by PullRequest
stream_writer.go
Outdated
@@ -85,8 +90,23 @@ func (sw *StreamWriter) Write(kvs *pb.KVList) error { | |||
if len(kvs.GetKv()) == 0 { | |||
return nil | |||
} | |||
|
|||
closedStreams := make(map[uint32]bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nitpick: Since you are just using these as basically a set
you could use make(map[uint32]struct{})
since struct{}
is zero-byte.
stream_writer.go
Outdated
return err | ||
} | ||
|
||
sw.closedStreams[streamID] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check, should these be marked as closed directly after the SignalAndWait()
or after the wait.Done()
?
stream_writer.go
Outdated
// Check if any of the stream we got records for are closed. | ||
for streamID := range streamWithRecords { | ||
if _, ok := sw.closedStreams[streamID]; ok { | ||
return ErrStreamClosed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there may be a race where two threads are calling Write
at the same time and one takes the lock first that is closing the stream and the other one still has a streamWithRecords
version of that streamID
since the above code is run concurrently and unlocked. I just wanted to check if this is okay to return an error in this case? My guess is that its probably fine, but it may be a slight change in behavior from the current behavior in some cases.
stream_writer.go
Outdated
const headStreamID uint32 = math.MaxUint32 | ||
|
||
var ( | ||
// ErrStreamClosed is returned when a sent is performed on closed stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this possibly say "when a write is performed" instead of "when a sent is performed"?
stream_writer.go
Outdated
@@ -121,6 +141,14 @@ func (sw *StreamWriter) Write(kvs *pb.KVList) error { | |||
|
|||
sw.writeLock.Lock() | |||
defer sw.writeLock.Unlock() | |||
|
|||
// Check if any of the stream we got records for are closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream -> streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, but it might be worth checking the script from the commit message in as well for future testing and verification
Reviewed with ❤️ by PullRequest
|
||
// Now we can close any streams if required. We will make writer for | ||
// the closed streams as nil. | ||
for streamId := range closedStreams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
range var streamId
should be streamID
(from golint
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @pullrequest[bot])
stream_writer.go, line 33 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
... when data is sent on a closed stream.
I have removed the error.
stream_writer.go, line 94 at r1 (raw file):
Previously, pullrequest[bot] wrote…
minor nitpick: Since you are just using these as basically a
set
you could usemake(map[uint32]struct{})
sincestruct{}
is zero-byte.
Done.
stream_writer.go, line 145 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
// Check if we've received data on a closed stream.
changed the comment.
stream_writer.go, line 145 at r1 (raw file):
Previously, pullrequest[bot] wrote…
stream -> streams
changed the comment.
stream_writer.go, line 148 at r1 (raw file):
Previously, pullrequest[bot] wrote…
It seems like there may be a race where two threads are calling
Write
at the same time and one takes the lock first that is closing the stream and the other one still has astreamWithRecords
version of thatstreamID
since the above code is run concurrently and unlocked. I just wanted to check if this is okay to return an error in this case? My guess is that its probably fine, but it may be a slight change in behavior from the current behavior in some cases.
Right now, we are panicking if sent is done on closed stream. Normally it is expected that single stream is handled by a same thread.
stream_writer.go, line 171 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
If we have a
streamID
inclosedStreams
that implies that we've recieved some data on that stream. Which also means there should be a writer for it. I would suggest we either log an error or return an error here if we don't find thestreamID
insw.writers
map.
Not necessarily. We can just get close(StreamDone) without any data also. If it is with data, we will definitely have writer for stream(if not code panics). If no data has been received(just StreamDone) and we don't have any writer for it, we are just logging it.
stream_writer.go, line 181 at r1 (raw file):
Previously, pullrequest[bot] wrote…
Just to double check, should these be marked as closed directly after the
SignalAndWait()
or after thewait.Done()
?
SignalAndWait
already waits for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the above program part of the tests. And a comment.
Reviewed 2 of 4 files at r1, 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, and @pullrequest[bot])
stream_writer.go, line 140 at r3 (raw file):
// We are writing all requests to vlog even if some request belongs to already closed stream. // It is safe to do because we are panicing while writing to sorted writer, which will be nil
panicking
stream_writer.go, line 282 at r3 (raw file):
builder: table.NewTableBuilder(bopts), reqCh: make(chan *request, 3), closer: closer,
closer: y.NewCloser(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @pullrequest[bot])
stream_writer.go, line 140 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
panicking
This needs to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @jarifibrahim, @manishrjain, and @pullrequest[bot])
stream_writer.go, line 140 at r3 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
This needs to be fixed.
Done.
stream_writer.go, line 282 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
closer: y.NewCloser(1)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 7 unresolved discussions (waiting on @ashish-goswami, @balajijinnah, @manishrjain, and @pullrequest[bot])
This PR introduces, a way to tell StreamWriter to close a stream. Previously Streams were always open until Flush is called on StreamWriter. This resulted in memory utilisation, because of underlying TableBuilder to a sortedWriter. Also closing all sorted writer in single call resulted in more memory allocation(during Flush()). This can be useful in some case such as bulk loader in Dgraph, where only one stream is active at a time. (cherry picked from commit 385da91)
This PR introduces, a way to tell
StreamWriter
to close a stream. Previously Streams were always open untilFlush
is called onStreamWriter
. This resulted in memory utilisation, because of underlyingTableBuilder
to asortedWriter
.I used below program to test the change.
Memory usage spikes(RES reach >25GB) in case of
master
branch. While this PR limits memory usage to very low.I left above program running for a day(on this PR), it completed around ~800K streams without OOM.
This change is