@srini38 srini38 commented Dec 2, 2023

This patch fixes a crash in fluent-bit that occurs when ctr_decode_opentelemetry_create() accesses the span status of a span that does not carry one.

The patch has been tested with the opentelemetry-cpp 1.12.0 example client (example_otlp_http http://localhost:4318/v1/traces DEBUG=yes bin) against fluent-bit 2.2.0.
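
For readers unfamiliar with this class of bug, here is a minimal sketch of the kind of guard involved. It is not the actual ctraces code. The struct and function names (otlp_status, otlp_span, decode_span_status) are hypothetical stand-ins. It assumes the decoded span holds its status as a pointer that is NULL when the sender omitted the field, and falls back to code 0 (unset), which matches the "code : 0" shown in the post-fix output below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the decoded OTLP types.
 * These are NOT the real ctraces or protobuf-c definitions. */
struct otlp_status {
    int32_t code;          /* 0 = unset, 1 = ok, 2 = error */
    const char *message;   /* may be NULL */
};

struct otlp_span {
    const char *name;
    struct otlp_status *status;   /* assumed NULL when the sender omitted it */
};

/* Value copied out of the span; safe to use even if status was absent. */
struct decoded_status {
    int32_t code;
    const char *message;
};

/* Read the span status without dereferencing a missing submessage.
 * Falls back to code 0 (unset) and no message when status is absent. */
struct decoded_status decode_span_status(const struct otlp_span *span)
{
    struct decoded_status out = { 0, NULL };

    if (span == NULL || span->status == NULL) {
        return out;   /* nothing to read: keep the defaults */
    }

    out.code = span->status->code;
    out.message = span->status->message;
    return out;
}
```

Calling decode_span_status() on a span whose status pointer is NULL returns code 0 instead of dereferencing the missing field, which is the access pattern that crashed in the "Before fix" trace.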


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
[INPUT]
        name opentelemetry
        listen 127.0.0.1
        port 4318
        successful_response_code 200

[OUTPUT]
        name stdout
        match *
  • Debug log output from testing the change

Before fix:

root@b4aa3e5d75b5:/source/opentelemetry-cpp-1.12.0/build/examples/otlp# ./example_otlp_http http://localhost:4318/v1/traces DEBUG=yes bin

[2023/12/02 11:25:16] [engine] caught signal (SIGSEGV)
[2023/12/02 11:25:16] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/opentelemetry.c:52] new TCP connection arrived FD=40
[2023/12/02 11:25:16] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=402 pre_len=0 now_len=402
#0  0x559e36b6196c      in  ctr_decode_opentelemetry_create() at lib/ctraces/src/ctr_decode_opentelemetry.c:574
#1  0x559e36841d39      in  process_payload_traces_proto() at plugins/in_opentelemetry/opentelemetry_prot.c:166
#2  0x559e36841fa8      in  process_payload_traces() at plugins/in_opentelemetry/opentelemetry_prot.c:234
#3  0x559e3684543c      in  opentelemetry_prot_handle() at plugins/in_opentelemetry/opentelemetry_prot.c:1644
#4  0x559e3683c73c      in  opentelemetry_conn_event() at plugins/in_opentelemetry/http_conn.c:99
#5  0x559e3661e79f      in  flb_engine_start() at src/flb_engine.c:1009
#6  0x559e365bccf2      in  flb_lib_worker() at src/flb_lib.c:638
#7  0x7f1a092a2ad9      in  ???() at ???:0
#8  0x7f1a093332e3      in  ???() at ???:0
#9  0xffffffffffffffff  in  ???() at ???:0
Aborted (core dumped)

Post fix:

[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/opentelemetry.c:52] new TCP connection arrived FD=40
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=402 pre_len=0 now_len=402
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=402 pre_len=0 now_len=402
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=165 pre_len=0 now_len=165
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=237 pre_len=165 now_len=402
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=165 pre_len=0 now_len=165
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:89] read()=232 pre_len=165 now_len=397
[2023/12/02 14:50:23] [debug] [input chunk] update output instances with new chunk size diff=556, records=0, input=opentelemetry.0
[2023/12/02 14:50:23] [trace] [input:opentelemetry:opentelemetry.0 at /source/fluent-bit-2.2.0/plugins/in_opentelemetry/http_conn.c:84] fd=40 closed connection
[2023/12/02 14:50:24] [debug] [task] created task=0x7fabf8018480 id=0 OK
[2023/12/02 14:50:24] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
|-------------------- RESOURCE SPAN --------------------|
  resource:
     - attributes:
            - service.name: 'unknown_service'
            - telemetry.sdk.version: '1.12.0'
            - telemetry.sdk.name: 'opentelemetry'
            - telemetry.sdk.language: 'cpp'
     - dropped_attributes_count: 0
  schema_url:
  [scope_span]
    instrumentation scope:
        - name                    : foo_library
        - version                 : 1.12.0
        - dropped_attributes_count: 0
        - attributes:

    schema_url:
    [spans]
         [span 'f1']
             - trace_id                : e86248c61e028f03fde5e462bcae3fb1
             - span_id                 : 5593e2db2dd9c035
             - parent_span_id          : 9e1b28eb1506ce3c
             - kind                    : 1 (internal)
             - start_time              : 1701517823379482754
             - end_time                : 1701517823379486350
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes: none
             - events: none
             - [links]
[2023/12/02 14:50:24] [debug] [out flush] cb_destroy coro_id=0
[2023/12/02 14:50:24] [debug] [task] destroy task=0x7fabf8018480 (task_id=0)
  • Attached Valgrind output that shows no leaks or memory corruption was found
root@b4aa3e5d75b5:/source/fluent-bit-2.2.0/build/bin# valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-leak-kinds=all ./fluent-bit -v -c ./flb.conf
==44134== Memcheck, a memory error detector
==44134== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==44134== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==44134== Command: ./fluent-bit -v -c ./flb.conf
==44134==
Fluent Bit v2.2.0
* Copyright (C) 2015-2023 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/12/02 15:21:46] [ info] Configuration:
[2023/12/02 15:21:46] [ info]  flush time     | 1.000000 seconds
[2023/12/02 15:21:46] [ info]  grace          | 5 seconds
[2023/12/02 15:21:46] [ info]  daemon         | 0
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  inputs:
[2023/12/02 15:21:46] [ info]      opentelemetry
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  filters:
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  outputs:
[2023/12/02 15:21:46] [ info]      stdout.0
[2023/12/02 15:21:46] [ info] ___________
[2023/12/02 15:21:46] [ info]  collectors:
[2023/12/02 15:21:46] [ info] [fluent bit] version=2.2.0, commit=, pid=44134
[2023/12/02 15:21:46] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/12/02 15:21:46] [ info] [storage] ver=1.5.1, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/12/02 15:21:46] [ info] [output:stdout:stdout.0] worker #0 started
[2023/12/02 15:21:46] [ info] [cmetrics] version=0.6.4
[2023/12/02 15:21:46] [ info] [ctraces ] version=0.3.1
[2023/12/02 15:21:46] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2023/12/02 15:21:46] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2023/12/02 15:21:46] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2023/12/02 15:21:46] [debug] [downstream] listening on 127.0.0.1:4318
[2023/12/02 15:21:46] [ info] [input:opentelemetry:opentelemetry.0] listening on 127.0.0.1:4318
[2023/12/02 15:21:46] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2023/12/02 15:21:46] [ info] [sp] stream processor started
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=567, records=0, input=opentelemetry.0
[2023/12/02 15:21:52] [debug] [input chunk] update output instances with new chunk size diff=556, records=0, input=opentelemetry.0
[2023/12/02 15:21:53] [debug] [task] created task=0x526eb20 id=0 OK
|-------------------- RESOURCE SPAN --------------------|

[2023/12/02 15:21:53] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
  resource:
     - attributes:
            - service.name: 'unknown_service'
            - telemetry.sdk.version: '1.12.0'
            - telemetry.sdk.name: 'opentelemetry'
            - telemetry.sdk.language: 'cpp'
     - dropped_attributes_count: 0
  schema_url:
  [scope_span]
    instrumentation scope:
        - name                    : foo_library
        - version                 : 1.12.0
        - dropped_attributes_count: 0
        - attributes:

    schema_url:
    [spans]
         [span 'f1']
             - trace_id                : 4c9fd66d21d8655bac6dc161b3fb8365
             - span_id                 : da35b5ab06c17c82
             - parent_span_id          : 3261a69334388f15
             - kind                    : 1 (internal)
             - start_time              : 1701519712566966527
             - end_time                : 1701519712566969984
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes: none
             - events: none
             - [links]
[2023/12/02 15:21:53] [debug] [out flush] cb_destroy coro_id=0
[2023/12/02 15:21:53] [debug] [task] destroy task=0x526eb20 (task_id=0)
^C[2023/12/02 15:21:55] [engine] caught signal (SIGINT)
[2023/12/02 15:21:55] [ warn] [engine] service will shutdown in max 5 seconds
[2023/12/02 15:21:56] [ info] [engine] service has stopped (0 pending tasks)
[2023/12/02 15:21:56] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/12/02 15:21:56] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==44134==
==44134== HEAP SUMMARY:
==44134==     in use at exit: 0 bytes in 0 blocks
==44134==   total heap usage: 1,999 allocs, 1,999 frees, 1,354,369 bytes allocated
==44134==
==44134== All heap blocks were freed -- no leaks are possible
==44134==
==44134== For lists of detected and suppressed errors, rerun with: -s
==44134== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

Signed-off-by: Srinivasan J <srinidpdk@gmail.com>

nokute78 commented Dec 3, 2023

@srini38 Could you send this PR to https://github.com/fluent/ctraces ?


srini38 commented Dec 3, 2023

@srini38 Could you send this PR to https://github.com/fluent/ctraces ?

Hi @nokute78,
Please let me know whether I should keep this PR and open a separate one in https://github.com/fluent/ctraces, or close this PR and open one only in https://github.com/fluent/ctraces.

Thanks,
Srini


nokute78 commented Dec 3, 2023

I think it is better to close this PR and open a new PR in the ctraces repo.

Files under the lib directory belong to other projects, such as ctraces.
If this PR is merged, its changes will be overwritten by the files from the ctraces repo.
e.g. f959e2e


srini38 commented Dec 3, 2023

I think it is better to close this PR and open a new PR in the ctraces repo.

Files under the lib directory belong to other projects, such as ctraces. If this PR is merged, its changes will be overwritten by the files from the ctraces repo. e.g. f959e2e

Thanks @nokute78, I have opened a new PR in the ctraces repo as requested: fluent/ctraces#46
I will close this one.

Regards,
Srini

@srini38 srini38 closed this Dec 3, 2023