[otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring#205
Conversation
|
@kristinapathak Proposing to add you as a reviewer in this repository. This change is a substantial rewrite but not a substantial change of behavior, so it's worth a careful review and should help you learn the receiver code very well. Thank you! 😁 🚀 |
kristinapathak
left a comment
There was a problem hiding this comment.
Left a few questions. The arrow.go file is long - might be worth splitting up in a separate PR. 🙂
lquerel
left a comment
There was a problem hiding this comment.
Generally, it seems much clearer to me, but I have a small doubt about the test (see my comment regarding the last unit test). It's probably because I haven't understood everything. With a few additional explanations, I would certainly be able to approve this commit.
|
Reviewers, please take another look. |
|
For the test flake above, #206 |
[](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [github.com/open-telemetry/otel-arrow](https://github.com/open-telemetry/otel-arrow) | `v0.23.0` -> `v0.24.0` | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | [](https://docs.renovatebot.com/merge-confidence/) | --- > [!WARNING] > Some dependencies could not be looked up. Check the Dependency Dashboard for more information. --- ### Release Notes <details> <summary>open-telemetry/otel-arrow (github.com/open-telemetry/otel-arrow)</summary> ### [`v0.24.0`](https://github.com/open-telemetry/otel-arrow/releases/tag/v0.24.0) [Compare Source](https://github.com/open-telemetry/otel-arrow/compare/v0.23.0...v0.24.0) Jitter is applied to once per process, not once per stream. [https://github.com/open-telemetry/otel-arrow/pull/199](https://github.com/open-telemetry/otel-arrow/pull/199) Network statistics tracing instrumentation simplified. [https://github.com/open-telemetry/otel-arrow/pull/201](https://github.com/open-telemetry/otel-arrow/pull/201) Protocol includes use of more gRPC codes. [https://github.com/open-telemetry/otel-arrow/pull/202](https://github.com/open-telemetry/otel-arrow/pull/202) Receiver concurrency bugfix. [https://github.com/open-telemetry/otel-arrow/pull/205](https://github.com/open-telemetry/otel-arrow/pull/205) Concurrent batch processor size==0 bugfix. [https://github.com/open-telemetry/otel-arrow/pull/208](https://github.com/open-telemetry/otel-arrow/pull/208) New integration testing. [https://github.com/open-telemetry/otel-arrow/pull/210](https://github.com/open-telemetry/otel-arrow/pull/210) Use gRPC Status codes in the Arrow exporter. [https://github.com/open-telemetry/otel-arrow/pull/211](https://github.com/open-telemetry/otel-arrow/pull/211) Fix stream-shutdown race in Arrow receiver. [https://github.com/open-telemetry/otel-arrow/pull/212](https://github.com/open-telemetry/otel-arrow/pull/212) Avoid work for already-canceled requests. [https://github.com/open-telemetry/otel-arrow/pull/213](https://github.com/open-telemetry/otel-arrow/pull/213) Call IPCReader.Err() after reader loop. [https://github.com/open-telemetry/otel-arrow/pull/215](https://github.com/open-telemetry/otel-arrow/pull/215) Update to Arrow-Go v16.1.0. [https://github.com/open-telemetry/otel-arrow/pull/218](https://github.com/open-telemetry/otel-arrow/pull/218) Update to OpenTelemetry Collector v0.102.x. [https://github.com/open-telemetry/otel-arrow/pull/219](https://github.com/open-telemetry/otel-arrow/pull/219) </details> --- ### Configuration 📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zOTMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjM5My4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==--> --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com> Co-authored-by: Yang Song <songy23@users.noreply.github.com>
Restructure receiver code to improve readability. There are a number of metrics that are incremented when a batch starts being processed and are decremented when the batch is finished, but the control flow that maintained the balance of these updates was convoluted.
The root-cause of #204 is that Arrow batches meant for a consumer to be processed in order were processed out-of-order. There was a large function body which served two purposes: consume Arrow data of the appropriate kind, enter data for the pipeline to consume next. This had to be split into two parts and should have been done as part of #181. (I, as reviewer, missed this and find, in hindsight, that the code is not easy to follow.)
This improves the code structure by moving all stateful aspects of starting/finishing a request into a new
inFlightDataobject which has a deferrable method to finish the request. Here, we keep:inFlightWGdone countAuthorization now happens before acquiring from the semaphore.
A number of
fmt.Errorf()calls are replaced withstatus.Errorf(...)and a specific error code. The tests are updated to be more specific. Several Arrow tests were accidentally canceling the test before an expected error condition was actually tested, they have been audited and improved.One new concurrent-receiver test was added.
Fixes #204.