feat(go jsonrpc): fix jsonrpc flaky test#298
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #298 +/- ##
==========================================
- Coverage 76.85% 76.75% -0.11%
==========================================
Files 181 181
Lines 19884 19884
==========================================
- Hits 15282 15261 -21
- Misses 3810 3829 +19
- Partials 792 794 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
grcevski
left a comment
There was a problem hiding this comment.
Looks like fix works! Anything else we need to change here of we should start the review process?
|
Sorry, I think the flaky test still exist, and I have added some comment in #273 . |
5fb3e0b to
84af67d
Compare
|
I think the missing log may come from ring buffer size limit of |
|
I think I find the crucial log |
84af67d to
b773679
Compare
|
@titaneric I've rebased your branch onto main - I believe one of the tests were failing because of the changes introduced by #276 - in particular, this one: https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pull/276/files#diff-9840da89403837f076f6f383e9d645312ed48c02732bda31a1c9abcfcefb8133 hopefully that will do the trick (famous last words) |
|
Ignore the failing OATs tests, it's not related to your changes and we are looking into it. |
b773679 to
ce1cd4f
Compare
|
Thank you for helping rebase the main branch. However, the flasky test still exits. |
|
I think I find the root cause. I take a look at original log in Line 9999-10017. For some reason, the In the first phase, the did detect the I may simply update code from if (header_inv && header_inv->content_type[0]) {
bpf_dbg_printk("Found content type in ongoing request: %s", header_inv->content_type);
if (is_json_content_type((void *)header_inv->content_type,
sizeof(header_inv->content_type))) {
invocation.json_content_type = 1;
}
}into if (header_inv && header_inv->content_type[0]) {
bpf_dbg_printk("Found content type in ongoing request: %s", header_inv->content_type);
if (header_inv->json_content_type ||
is_json_content_type((void *)header_inv->content_type,
sizeof(header_inv->content_type))) {
invocation.json_content_type = 1;
}to fix this flaky test. |
|
I ran the In addition, I am not sure whether the different request to the same http server would share the same go routine address. For a complete and robust fix, we may find the parent go routine of the given go routine, to check whether they are born from the same http server. P.S. I would remove unnessary commits if we all agree this is the root cause. |
|
@titaneric so I have been looking into the failed tests as well - if you look at the logs (look for What I will do is updating the test so to ensure it waits for all traces to be submitted from OBI to the Jaeger DB and then see if that works. Would that make sense to you? And would we still need this PR? |
|
@rafaelroquetto , thank you for helping check the log. I think 5eb6851 is still needed. If you search As this #298 (comment) explains, I think it's nessary to add this patch to resolve the flaky test. |
|
@titaneric so we have been doing some digging and have managed to reproduce the bug once. When it happens, it appears that we do see the incoming request, but we fail to recognise it as JSON-RPC - either because the uprobe doesn't fire at all times, or most likely, because sometimes (rarely) the body can come chunked and we fail to parse it. It's a bit tricky. Whilst looking into it, we realised that we may be able to simplify the code, and most importantly, do without attaching a uprobe to This is what the call stack of handling a JSON request looks like: Which means the go runtime already does the job of detecting whether we have a jsonrpc request - I believe all we need to do is attaching a Check this out: and we could remove the probe from read altogether (and all the associated parsing). What do you think? Is this something you'd like to take on, or would you prefer if we did it? Thanks again! |
|
OMG, it's a huge discovery for me. Could I rewrite the present implementation with your idea? I think it's much much robust, and I don't have to parse json-rpc method by mysel. Really thank you for this help, I would raise another PR to re-implement it. |
|
Raise #335 here, and I would close this PR. |
This PR intends to fix #273, which is a JSON-RPC flaky test introduced in #161.