Skip to content

[otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring#205

Merged
jmacd merged 20 commits into
open-telemetry:mainfrom
jmacd:jmacd/deconcur
May 31, 2024
Merged

[otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring#205
jmacd merged 20 commits into
open-telemetry:mainfrom
jmacd:jmacd/deconcur

Conversation

@jmacd
Copy link
Copy Markdown
Contributor

@jmacd jmacd commented May 28, 2024

Restructure receiver code to improve readability. There are a number of metrics that are incremented when a batch starts being processed and are decremented when the batch is finished, but the control flow that maintained the balance of these updates was convoluted.

The root-cause of #204 is that Arrow batches meant for a consumer to be processed in order were processed out-of-order. There was a large function body which served two purposes: consume Arrow data of the appropriate kind, enter data for the pipeline to consume next. This had to be split into two parts and should have been done as part of #181. (I, as reviewer, missed this and find, in hindsight, that the code is not easy to follow.)

This improves the code structure by moving all stateful aspects of starting/finishing a request into a new inFlightData object which has a deferrable method to finish the request. Here, we keep:

  1. The inFlightWG done count
  2. The active requests metric
  3. The active items metric
  4. The active bytes metric
  5. The bytes-acquired from the semaphore
  6. A per-request span covering Arrow decode
  7. Netstat-related instrumentation

Authorization now happens before acquiring from the semaphore.

A number of fmt.Errorf() calls are replaced with status.Errorf(...) and a specific error code. The tests are updated to be more specific. Several Arrow tests were accidentally canceling the test before an expected error condition was actually tested, they have been audited and improved.

One new concurrent-receiver test was added.

Fixes #204.

@jmacd jmacd changed the title Receiver concurrency fixes [otel-arrow/receiver] Receiver concurrency fixes; readability improvements & restructuring May 29, 2024
@jmacd jmacd requested a review from kristinapathak May 29, 2024 00:20
Comment thread collector/receiver/otelarrowreceiver/internal/arrow/arrow.go
@jmacd
Copy link
Copy Markdown
Contributor Author

jmacd commented May 29, 2024

@kristinapathak Proposing to add you as a reviewer in this repository. This change is a substantial rewrite but not a substantial change of behavior, so it's worth a careful review and should help you learn the receiver code very well. Thank you! 😁 🚀

@jmacd jmacd marked this pull request as ready for review May 29, 2024 00:23
Copy link
Copy Markdown

@kristinapathak kristinapathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions. The arrow.go file is long - might be worth splitting up in a separate PR. 🙂

Comment thread collector/receiver/otelarrowreceiver/internal/arrow/arrow.go Outdated
Comment thread collector/receiver/otelarrowreceiver/internal/arrow/arrow.go Outdated
Copy link
Copy Markdown
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it seems much clearer to me, but I have a small doubt about the test (see my comment regarding the last unit test). It's probably because I haven't understood everything. With a few additional explanations, I would certainly be able to approve this commit.

Comment thread collector/receiver/otelarrowreceiver/internal/arrow/arrow_test.go Outdated
Comment thread collector/receiver/otelarrowreceiver/internal/arrow/arrow.go Outdated
Comment thread collector/receiver/otelarrowreceiver/internal/arrow/arrow.go
Comment thread collector/receiver/otelarrowreceiver/otelarrow_test.go
@jmacd
Copy link
Copy Markdown
Contributor Author

jmacd commented May 30, 2024

Reviewers, please take another look.

@jmacd
Copy link
Copy Markdown
Contributor Author

jmacd commented May 31, 2024

For the test flake above, #206

Copy link
Copy Markdown
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jmacd jmacd merged commit c5c9d7e into open-telemetry:main May 31, 2024
@jmacd jmacd deleted the jmacd/deconcur branch May 31, 2024 20:38
codeboten referenced this pull request in open-telemetry/opentelemetry-collector-contrib Jun 11, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
|
[github.com/open-telemetry/otel-arrow](https://github.com/open-telemetry/otel-arrow)
| `v0.23.0` -> `v0.24.0` |
[![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fopen-telemetry%2fotel-arrow/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fopen-telemetry%2fotel-arrow/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fopen-telemetry%2fotel-arrow/v0.23.0/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fopen-telemetry%2fotel-arrow/v0.23.0/v0.24.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>open-telemetry/otel-arrow
(github.com/open-telemetry/otel-arrow)</summary>

###
[`v0.24.0`](https://github.com/open-telemetry/otel-arrow/releases/tag/v0.24.0)

[Compare
Source](https://github.com/open-telemetry/otel-arrow/compare/v0.23.0...v0.24.0)

Jitter is applied to once per process, not once per stream.
[https://github.com/open-telemetry/otel-arrow/pull/199](https://github.com/open-telemetry/otel-arrow/pull/199)
Network statistics tracing instrumentation simplified.
[https://github.com/open-telemetry/otel-arrow/pull/201](https://github.com/open-telemetry/otel-arrow/pull/201)
Protocol includes use of more gRPC codes.
[https://github.com/open-telemetry/otel-arrow/pull/202](https://github.com/open-telemetry/otel-arrow/pull/202)
Receiver concurrency bugfix.
[https://github.com/open-telemetry/otel-arrow/pull/205](https://github.com/open-telemetry/otel-arrow/pull/205)
Concurrent batch processor size==0 bugfix.
[https://github.com/open-telemetry/otel-arrow/pull/208](https://github.com/open-telemetry/otel-arrow/pull/208)
New integration testing.
[https://github.com/open-telemetry/otel-arrow/pull/210](https://github.com/open-telemetry/otel-arrow/pull/210)
Use gRPC Status codes in the Arrow exporter.
[https://github.com/open-telemetry/otel-arrow/pull/211](https://github.com/open-telemetry/otel-arrow/pull/211)
Fix stream-shutdown race in Arrow receiver.
[https://github.com/open-telemetry/otel-arrow/pull/212](https://github.com/open-telemetry/otel-arrow/pull/212)
Avoid work for already-canceled requests.
[https://github.com/open-telemetry/otel-arrow/pull/213](https://github.com/open-telemetry/otel-arrow/pull/213)
Call IPCReader.Err() after reader loop.
[https://github.com/open-telemetry/otel-arrow/pull/215](https://github.com/open-telemetry/otel-arrow/pull/215)
Update to Arrow-Go v16.1.0.
[https://github.com/open-telemetry/otel-arrow/pull/218](https://github.com/open-telemetry/otel-arrow/pull/218)
Update to OpenTelemetry Collector v0.102.x.
[https://github.com/open-telemetry/otel-arrow/pull/219](https://github.com/open-telemetry/otel-arrow/pull/219)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zOTMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjM5My4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Co-authored-by: Yang Song <songy23@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Receiver crash report: concurrency bug

3 participants