-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hubble/container: Properly deal with nil values in RingReader #11323
Conversation
Commit 357e0a4b493be4c991d31b68f0a3b0f315366277 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
357e0a4
to
b06ee8f
Compare
Commit 357e0a4b493be4c991d31b68f0a3b0f315366277 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
test-me-please |
b06ee8f
to
8835e85
Compare
test-me-please |
Are we expecting that the |
I believe they will suffer from the concurrent writer issue. I'm currently thinking that the way to implement something like |
I should also note: The Me and @rolinh didn't consider this a bug, therefore I didn't "fix" it. It can however easily be avoided if we never read past the initial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now i see all the flows from ring buffer:
bash-5.0# hubble observe --last 10000 -o json | wc -l
4095
bash-5.0# hubble observe --since 1h -o json | wc -l
4095
sometimes i get more than len(ringbuffer) - 1
flows, but i guess this can happen if new flows are coming in as we read.
regarding @gandro 's comment, i think it's totally fine to stop at the initial LastParallelWrite
index. you can always follow if you want to get more.
This feels non intuitive at first but it's definitely not a problem as long as the reader is able to eventually catch up with the writers.
The problem is not really to "not get enough" but that the caller will in effect miss the most recent flows. I'd be in favor of keeping the current behavior unless it's a source of bugs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having invested time into this issue. I know the ring buffer is tricky to work with :)
I think this is definitely going to the right direction. That said, I have one concern with a suggested change. Could you have a look at it please?
8835e85
to
bbbc6ba
Compare
test-me-please |
if !ok { | ||
t.Errorf("Should be able to read position %x", lastWrite) | ||
} | ||
if entry != nil { | ||
t.Errorf("Read Event should be %+v, got %+v instead", nil, entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewer note: This test relied on read
returning nil
values. With this PR, we never allow nil
values to be returned from read
(same as readFrom
, which had this restriction already for a while)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're almost there but I left a few comments. Note that the commit message is really hard to comprehend, eg:
If the current index is unreadable, then we have reached either end of the ring buffer.
What's the "end" of a ring structure?
Next
may return ErrConcurrentRead
I think that the error name has changed.
pkg/hubble/container/ring.go
Outdated
// ErrConcurrentWrite indicates that a value was overwritten | ||
ErrConcurrentWrite = errors.New("concurrent write detected") | ||
// ErrEndOfBuffer is an alias for io.EOF | ||
ErrEndOfBuffer = io.EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this alias is strictly necessary as io.EOF
is widely accepted as meaning "end of whatever".
Besides, I'm still trying to understand what the "end" of a ring structure can be :)
pkg/hubble/container/ring.go
Outdated
@@ -24,6 +26,13 @@ import ( | |||
"github.com/cilium/cilium/pkg/lock" | |||
) | |||
|
|||
var ( | |||
// ErrConcurrentWrite indicates that a value was overwritten | |||
ErrConcurrentWrite = errors.New("concurrent write detected") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your commit message mentions ErrConcurrentRead
. I think that both names are misleading anyway as reading concurrently should not be a problem and a "concurrent write" error while reading is rather something unexpected. What about a more generic name like ErrInvalidRead
for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the ConcurrentRead
is a typo in the commit messages. I'll think about the naming some more. I don't want it to be too generic, since the error is also presented to the user, and invalid read
sounds like something is broken internally, when in fact they just tried to read stale data.
if e == nil { | ||
e, err := reader.Previous() | ||
if err != nil { | ||
idx++ // we went backward 1 too far |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we went backward 1 too far
Unless the error is io.EOF
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda, I was not sure how to handle this properly. If Previous
returns io.EOF
, then something went horribly wrong. But yeah, we can check the error for ErrConcurrentWrite
and propagate everything else.
bbbc6ba
to
bee1bf7
Compare
test-me-please |
When the ring-buffer is not full, we must treat nil values the same as reads outside of the readable indices. This commit ensures that `read` treats nil values the same way `readFrom` does. This has an impact on the RingReader, as we should not try to either increase or decrease the index if the current value is unreadable. If the current index is unreadable, then we have either read past the last written value, or we have tried to access an index which has already been overwritten. This commit therefore adds an error return type to `RingReader.{Next,Prev}` in order to determine which of the two cases occured. Most notably, `Next` may now return ErrInvalidData. This means this means that the value it tried to access just got overwritten by the writer from behind. In such cases we want to return an error, as otherwise `RingReader.Next` would silently stop reading and subsequentially terminate the request as if it had reached most recent value in the ring buffer (which it has not). The error is also added to `Previous`, however in that case we only ever expect ErrInvalidData, as it traverses the ring buffer in the opposite direction of the writer. The check for concurrent writes is currently missing from `readFrom`, which can suffer from the same problem. This will be added in a follow-up PR. Fixes: cilium/hubble#131 Signed-off-by: Sebastian Wicki <[email protected]>
bee1bf7
to
4229ac8
Compare
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for the great work! LGTM.
When the ring-buffer is not full, we must treat nil values the same as
reads outside of the readable indices. This commit ensures that
read
treats nil values the same way
readFrom
does.This has an impact on the RingReader, as we should not try to either
increase or decrease the index if the current value is unreadable.
If the current index is unreadable, then we have either read past the
last written value, or we have tried to access an index which has
already been overwritten.
This commit therefore adds an error return type to
RingReader.{Next,Prev}
in order to determine which of the two casesoccured.
Most notably,
Next
may now return ErrInvalidData. This means thismeans that the value it tried to access just got overwritten by the
writer from behind. In such cases we want to return an error, as
otherwise
RingReader.Next
would silently stop reading andsubsequentially terminate the request as if it had reached most
recent value in the ring buffer (which it has not).
The error is also added to
Previous
, however in that casewe only ever expect ErrInvalidData, as it traverses the ring buffer
in the opposite direction of the writer.
The check for concurrent writes is currently missing from
readFrom
,which can suffer from the same problem. This will be added in a
follow-up PR.
Fixes: cilium/hubble#131