Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: HTTPTransport.Flush panic and data race #140

Merged
merged 2 commits into from
Jan 22, 2020

Conversation

rhcarvalho
Copy link
Contributor

Rewrite HTTPTransport internals to remove a data race and occasional
panics.

Prior to this, HTTPTransport used a sync.WaitGroup to track how many
in-flight requests existed, and Flush waited until the observed number
of in-flight requests reached zero.

Unsynchronized access to the WaitGroup lead to panics (reuse before
wg.Wait returns) and data races (undefined order of wg.Add and wg.Wait
calls). See #103 (comment), #103 (comment) and #103 (comment).

The new implementation changes the Flush behavior to wait until the
current in-flight requests are processed, inline with other SDKs and the
Unified API.

Fixes #103.

@rhcarvalho rhcarvalho added this to the v0.5.0 milestone Jan 21, 2020
@rhcarvalho
Copy link
Contributor Author

There are no automated tests covering the HTTPTransport.Flush behavior. In comments of #103 I posted what test code I used to guide this PR, however that's not a very good test to have in the repo.

Would love some ideas in that regard.

Thinking of testing some properties of Flush, like:

  • After sending a few events and calling Flush those events should show up in a test server. Note that if too many events are sent, some will be dropped because of overflow.
  • Calling Flush multiple times in a row should work and "be quick".
  • ...

untitaker
untitaker previously approved these changes Jan 21, 2020
@rhcarvalho
Copy link
Contributor Author

I've included a few tests to exercise HTTPTransport usage flows.

bruno-garcia
bruno-garcia previously approved these changes Jan 22, 2020
transport.go Show resolved Hide resolved
transport.go Show resolved Hide resolved
transport.go Outdated Show resolved Hide resolved
The ConcurrentSendAndFlush test reveals a data race in the HTTPTransport
implementation.
Rewrite HTTPTransport internals to remove a data race and occasional
panics.

Prior to this, HTTPTransport used a `sync.WaitGroup` to track how many
in-flight requests existed, and `Flush` waited until the observed number
of in-flight requests reached zero.

Unsynchronized access to the WaitGroup lead to panics (reuse before
wg.Wait returns) and data races (undefined order of wg.Add and wg.Wait
calls).

The new implementation changes the `Flush` behavior to wait until the
current in-flight requests are processed, inline with other SDKs and the
Unified API.
@rhcarvalho
Copy link
Contributor Author

Added more code comments describing how the mechanism works and what invariants are considered.

Renamed and reorganized test code based on code review in person with @bruno-garcia. Thanks @bruno-garcia!!!

@rhcarvalho rhcarvalho merged commit 0a07f35 into getsentry:master Jan 22, 2020
@rhcarvalho rhcarvalho deleted the flush-batch branch January 22, 2020 19:18
@rhcarvalho rhcarvalho removed this from the v0.5.0 milestone Jan 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittently seeing "sync: WaitGroup is reused before previous Wait has returned" in transport.go
3 participants