Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix start offset calculation in mmaped memory area #738

Merged
merged 1 commit into from
Mar 28, 2024

Conversation

RonFed
Copy link
Contributor

@RonFed RonFed commented Mar 26, 2024

This PR fixes a relatively rare bug that can happen when a write operation to the mmaped area is moving between pages. We didn't take the page-aligned address into account when updating the start offset.

@RonFed RonFed added bug Something isn't working Skip Changelog eBPF labels Mar 26, 2024
@RonFed RonFed requested a review from a team March 26, 2024 17:19
@MrAlias
Copy link
Contributor

MrAlias commented Mar 26, 2024

Are there any tests we can add to prevent a regression?

@RonFed
Copy link
Contributor Author

RonFed commented Mar 26, 2024

Are there any tests we can add to prevent a regression?

I think the e2e tests for grpc and http are covering the case of regression espicialy this part which checks the client-server propagation:

@test "client, server :: server span has client span as parent" {
server_parent_span_id=$(server_spans_from_scope_named ${SCOPE} | jq ".parentSpanId")
client_span_id=$(client_spans_from_scope_named ${SCOPE} | jq ".spanId")
assert_equal "$client_span_id" "$server_parent_span_id"
}

The bug in this PR can happen approximately once every 50-100 client spans (since page size is 4096 bytes and we write ~40 bytes per request in gRPC for example)

@damemi damemi mentioned this pull request Mar 27, 2024
@damemi
Copy link
Contributor

damemi commented Mar 28, 2024

The bug in this PR can happen approximately once every 50-100 client spans (since page size is 4096 bytes and we write ~40 bytes per request in gRPC for example)

1-2% of the time isn't insignificant, but I don't think any of our tests are set up for that kind of scenario. Maybe as part of #93 we should update the tests to send more than one span (ideally enough to cross the page threshold).

@RonFed just so I understand does this bug cause spans to get dropped by being overwritten in the map when the new address isn't taken into account?

@damemi damemi mentioned this pull request Mar 28, 2024
@RonFed
Copy link
Contributor Author

RonFed commented Mar 28, 2024

The bug in this PR can happen approximately once every 50-100 client spans (since page size is 4096 bytes and we write ~40 bytes per request in gRPC for example)

1-2% of the time isn't insignificant, but I don't think any of our tests are set up for that kind of scenario. Maybe as part of #93 we should update the tests to send more than one span (ideally enough to cross the page threshold).

@RonFed just so I understand does this bug cause spans to get dropped by being overwritten in the map when the new address isn't taken into account?

Right, it can cause spans to get dropped. Additionally in some case it caused invalid characters to override the trace parent header which resulted in a RST_STREAM code in the underlying http2 (in grpc).

@MrAlias
Copy link
Contributor

MrAlias commented Mar 28, 2024

@damemi thank you for updating #93. I'm going to merge this, and have the follow-up work tracked there.

@MrAlias MrAlias merged commit 170bef3 into open-telemetry:main Mar 28, 2024
20 checks passed
@MrAlias MrAlias added this to the v0.11.0-alpha milestone Mar 28, 2024
@damemi
Copy link
Contributor

damemi commented Mar 28, 2024

btw @RonFed I missed it but I think this should have had a changelog entry. I'll add one in #737 for this.

Maybe we should define what needs a changelog entry somewhere?

@RonFed
Copy link
Contributor Author

RonFed commented Mar 28, 2024

btw @RonFed I missed it but I think this should have had a changelog entry. I'll add one in #737 for this.

Maybe we should define what needs a changelog entry somewhere?

Ah thanks 🙂, yeah I don't really know when to add a changelog entry or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working eBPF Skip Changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants