Opentelemetry stops sending traces with 9.0.0 #8566

Closed
1 task done
moserke opened this issue Jun 27, 2024 · 22 comments
Labels
bug This issue/PR relates to a bug callback callback plugin has_pr plugins plugin (any type)

Comments

@moserke

moserke commented Jun 27, 2024

Summary

When going from 8.6.2 to 9.0.0 the opentelemetry callback stops sending traces to the endpoint. Same exact configuration and traces get forwarded in 8.6.2 but go nowhere in 9.0.0. I suspect it's due to how the exporter is getting picked but can't seem to figure out how to make it work.

otel_exporter = None
if store_spans_in_file:
    otel_exporter = InMemorySpanExporter()
    processor = SimpleSpanProcessor(otel_exporter)
else:
    if otel_exporter_otlp_traces_protocol == 'grpc':
        otel_exporter = GRPCOTLPSpanExporter()
    else:
        otel_exporter = HTTPOTLPSpanExporter()
    processor = BatchSpanProcessor(otel_exporter)
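
To make the branching in the excerpt above explicit, here is a minimal sketch that mirrors it as a pure function. `choose_exporter` is a hypothetical helper, not part of the plugin; it returns the class names as plain strings so it runs without the OpenTelemetry packages installed.

```python
def choose_exporter(store_spans_in_file, otel_exporter_otlp_traces_protocol):
    """Mirror the branching of the quoted excerpt (hypothetical helper).

    Returns (exporter_name, processor_name) as plain strings.
    """
    if store_spans_in_file:
        return ("InMemorySpanExporter", "SimpleSpanProcessor")
    if otel_exporter_otlp_traces_protocol == "grpc":
        return ("GRPCOTLPSpanExporter", "BatchSpanProcessor")
    # Any value other than the exact string "grpc" lands here.
    return ("HTTPOTLPSpanExporter", "BatchSpanProcessor")


for protocol in ("grpc", "http/protobuf", None):
    print(protocol, "->", choose_exporter(None, protocol))
```

Read this way, any protocol value other than the exact string `grpc` (including an unset one) falls through to the HTTP exporter. Whether that is what happens in my setup is only a guess.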

Issue Type

Bug Report

Component Name

opentelemetry callback

Ansible Version

$ ansible --version
ansible [core 2.17.1]
  config file = /ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.12/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)] (/usr/bin/python3)
  jinja version = 3.1.4
  libyaml = True

Community.general Version

$ ansible-galaxy collection list community.general
Collection        Version
----------------- -------
community.general 9.1.0  

Configuration

$ ansible-config dump --only-changed

OS / Environment

No response

Steps to Reproduce

ansible config:
[defaults]
callbacks_enabled = community.general.opentelemetry

Run playbook
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 ansible-playbook playbook.yml

Expected Results

Expect traces to be sent to endpoint

Actual Results

Traces are never forwarded

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibullbot
Collaborator

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.


@ansibullbot
Collaborator

cc @v1v

@ansibullbot ansibullbot added bug This issue/PR relates to a bug callback callback plugin plugins plugin (any type) labels Jun 27, 2024
@v1v
Contributor

v1v commented Jun 28, 2024

#8321 is the PR that introduced the support for the http exporter.

As far as I see, the change uses the same exporter by default.

Can you try to run the plugin with the explicit configuration entries?

ansible.cfg:

    [defaults]
    callbacks_enabled = community.general.opentelemetry
    [callback_opentelemetry]
    otel_exporter_otlp_traces_protocol = grpc
    store_spans_in_file = None

IIUC, you tried locally running OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317/ ansible-playbook playbook.yml against your OTEL collector, right?

@moserke
Author

moserke commented Jul 1, 2024

I tried setting all of the possible config options to their defaults in ansible.cfg and still see the same issue: the plugin simply isn't trying to send the traces. If I set store_spans_in_file=/dev/stdout instead, the spans are printed to the screen, so I know tracing itself works. For some reason they are just not sent to the OTLP endpoint.
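
To rule out a plain networking problem before digging into the plugin, a quick TCP check against the endpoint can help. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, and `localhost:4317` is the endpoint from the reproduction steps. A successful connection only shows that something is listening on the port, not that it speaks OTLP.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # localhost:4317 is the endpoint from the reproduction steps.
    print("collector reachable:", can_connect("localhost", 4317))
```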

@rojon8

rojon8 commented Jul 12, 2024

Seeing the same issue here. Works nicely in 8.6, but silently stops sending traces in >=9.0.0.

@v1v
Contributor

v1v commented Jul 12, 2024

I can see a few changes were added to v9.0:

IIUC, from the description, the issue might be related to supporting HTTP exporters and the existing GRPC support.

@wilfriedroset @russoz, since you worked and helped on #8321, would you mind if I asked you to double-check if things work nicely on your end if you use >=9.0.0? 🙇

@cervajs

cervajs commented Jul 16, 2024

Tested with 9.2.0; the problem persists.

8.6.3 works fine.

ansible [core 2.14.14]
  config file = /home/cervenka/.ansible.cfg
  configured module search path = ['/home/cervenka/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /home/cervenka/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.9.18 (main, Jan 4 2024, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

@russoz
Collaborator

russoz commented Jul 17, 2024

> I can see a few changes were added to v9.0:
>
> IIUC, from the description, the issue might be related to supporting HTTP exporters and the existing GRPC support.
>
> @wilfriedroset @russoz, since you worked and helped on #8321, would you mind if I asked you to double-check if things work nicely on your end if you use >=9.0.0? 🙇

Hi @v1v I pretty much helped review it from a Python/Ansible perspective, I am not familiar enough with OpenTelemetry to make a call on the plugin logic.

@wilfriedroset Would it be possible for you to double check the code change? TIA

@russoz
Collaborator

russoz commented Jul 17, 2024

I have just reviewed the changes in that PR, and to the best of my ability I could not find anything that would be a problem. There are 4 other PRs after #8321 that might have introduced a problem (I have not cross-referenced them with the version tag, so probably not all of them apply).

@felixfontein
Collaborator

I've merged #8741, would be great if someone could verify that it fixes this bug.

@OneCyrus

@felixfontein

With this version we only get the trace, without any spans. With community.general < 9.0.0, all the spans are reported correctly.

@felixfontein
Collaborator

@wilfriedroset @v1v ^

@OneCyrus

Friendly push: does anyone have a pointer to the cause of this?

@wilfriedroset @v1v

@v1v
Contributor

v1v commented Dec 31, 2024

Sorry for the radio silence.

I cannot reproduce the missing traces/spans error with the latest changes in main.

How did I test this out?

I used the latest changes to the OpenTelemetry Ansible plugin and tested against an OTEL Collector configured with the Elastic exporter.

OTEL collector config

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/elastic:
    endpoint: "${env:APM_URL}"
    headers:
      Authorization: "Bearer ${env:APM_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/elastic]
    logs:
      receivers: [otlp]
      exporters: [otlp/elastic]

Then I ran docker compose with the below settings:

docker-compose.yml

---
services:

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    platform: linux/arm64
    volumes:
      - ./config/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "1888:1888"   # pprof extension
      - "13133:13133" # health_check extension
      - "4317:4317"   # OTLP gRPC receiver
      - "55670:55679" # zpages extension
    environment:
      APM_URL: ${APM_URL}
      APM_TOKEN: ${APM_TOKEN}
    networks:
      - otel

volumes:
  otel:
    driver: local

networks:
  otel:

and ran:

$ OTEL_EXPORTER_OTLP_INSECURE=true \
	OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 \
	ansible-playbook playbook.yml

So far so good in both cases: OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 and OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317.
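
For anyone comparing the two endpoint spellings in their own scripts, here is a minimal sketch that reduces both to the same host and port. `split_endpoint` is a hypothetical helper, not something the plugin or the OpenTelemetry SDK provides, and the fallback to port 4317 (the conventional OTLP gRPC port, also used throughout this thread) is an assumption.

```python
from urllib.parse import urlsplit


def split_endpoint(endpoint, default_port=4317):
    """Return (host, port) for endpoints written with or without a scheme."""
    # Without a leading "//", urlsplit would not treat "localhost:4317"
    # as host:port, so add it when no scheme is present.
    if "://" not in endpoint:
        endpoint = "//" + endpoint
    parts = urlsplit(endpoint)
    return parts.hostname, parts.port or default_port


print(split_endpoint("localhost:4317"))
print(split_endpoint("http://localhost:4317"))
```

Both calls print `('localhost', 4317)`, so a connectivity check written against one form applies to the other as well.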


My current environment is:


ansible [core 2.16.6]
  config file = /Users/vmartinez/workspaces/v1v/its-ansible-otel/ansible.cfg
  configured module search path = ['/Users/vmartinez/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/vmartinez/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/ansible
  python version = 3.12.8 (main, Dec  3 2024, 18:42:41) [Clang 16.0.0 (clang-1600.0.26.4)] (/Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/python)
  jinja version = 3.1.4
  libyaml = True
Package                                  Version
---------------------------------------- --------
ansible                                  9.5.1
ansible-core                             2.16.6
certifi                                  2024.2.2
cffi                                     1.16.0
charset-normalizer                       3.3.2
cryptography                             42.0.7
Deprecated                               1.2.14
docker                                   7.0.0
googleapis-common-protos                 1.63.0
grpcio                                   1.63.0
idna                                     3.7
importlib-metadata                       7.0.0
iniconfig                                2.0.0
Jinja2                                   3.1.4
MarkupSafe                               2.1.5
opentelemetry-api                        1.24.0
opentelemetry-exporter-otlp              1.24.0
opentelemetry-exporter-otlp-proto-common 1.24.0
opentelemetry-exporter-otlp-proto-grpc   1.24.0
opentelemetry-exporter-otlp-proto-http   1.24.0
opentelemetry-proto                      1.24.0
opentelemetry-sdk                        1.24.0
opentelemetry-semantic-conventions       0.45b0
packaging                                24.0
pip                                      24.0
pluggy                                   1.5.0
protobuf                                 4.25.3
pycparser                                2.22
pytest                                   8.2.0
PyYAML                                   6.0.1
requests                                 2.31.0
resolvelib                               1.0.1
typing_extensions                        4.11.0
urllib3                                  2.2.1
wrapt                                    1.16.0
zipp                                     3.18.1

"org.opencontainers.image.source": "https://github.com/open-telemetry/opentelemetry-collector-releases",
"org.opencontainers.image.version": "0.100.0"

If I update those dependencies, it works too:


ansible [core 2.18.1]
  config file = /Users/vmartinez/workspaces/v1v/its-ansible-otel/ansible.cfg
  configured module search path = ['/Users/vmartinez/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/vmartinez/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/ansible
  python version = 3.12.8 (main, Dec  3 2024, 18:42:41) [Clang 16.0.0 (clang-1600.0.26.4)] (/Users/vmartinez/workspaces/v1v/its-ansible-otel/.venv/bin/python)
  jinja version = 3.1.5
  libyaml = True
Package                                  Version
---------------------------------------- ----------
ansible                                  11.1.0
ansible-core                             2.18.1
certifi                                  2024.12.14
cffi                                     1.17.1
charset-normalizer                       3.4.1
cryptography                             44.0.0
Deprecated                               1.2.15
googleapis-common-protos                 1.66.0
grpcio                                   1.68.1
idna                                     3.10
importlib_metadata                       8.5.0
iniconfig                                2.0.0
Jinja2                                   3.1.5
MarkupSafe                               3.0.2
opentelemetry-api                        1.29.0
opentelemetry-exporter-otlp              1.29.0
opentelemetry-exporter-otlp-proto-common 1.29.0
opentelemetry-exporter-otlp-proto-grpc   1.29.0
opentelemetry-exporter-otlp-proto-http   1.29.0
opentelemetry-proto                      1.29.0
opentelemetry-sdk                        1.29.0
opentelemetry-semantic-conventions       0.50b0
packaging                                24.2
pip                                      24.3.1
pluggy                                   1.5.0
protobuf                                 5.29.2
pycparser                                2.22
pytest                                   8.3.4
PyYAML                                   6.0.2
requests                                 2.32.3
resolvelib                               1.0.1
typing_extensions                        4.12.2
urllib3                                  2.3.0
wrapt                                    1.17.0
zipp                                     3.21.0

If you'd like to reuse what I've done, v1v/otel-ansible-callback-plugin#2 might help you - you can configure another OTEL vendor.

Please let me know which vendors it is not working with on your side.

@v1v
Contributor

v1v commented Dec 31, 2024

However, if I use the latest container (0.116.1) for the OTEL Collector:

"org.opencontainers.image.created": "2024-12-17T21:09:34Z",
"org.opencontainers.image.licenses": "Apache-2.0",
"org.opencontainers.image.name": "opentelemetry-collector-releases",
"org.opencontainers.image.revision": "62dfc10402322ae4e2cdbdd92a0c0cc797f1b1f4",
"org.opencontainers.image.source": "https://github.com/open-telemetry/opentelemetry-collector-releases",
"org.opencontainers.image.version": "0.116.1"

Then the same setup does not work:

Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 8s.
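
The retry intervals in that log double on each attempt (1s, 2s, 4s, 8s). As a minimal sketch of that pattern only, and not the exporter's actual retry code, the sequence can be reproduced like this; `backoff_delays` and its parameters are hypothetical names.

```python
def backoff_delays(attempts, first=1, factor=2):
    """Yield retry delays in seconds that grow by `factor` each attempt."""
    delay = first
    for _ in range(attempts):
        yield delay
        delay *= factor


print(list(backoff_delays(4)))  # [1, 2, 4, 8]
```

Four failed attempts at this rate add up to 15 seconds of waiting, which a short playbook run may not allow for before it exits.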

Regardless, https://github.com/ansible-collections/community.general/blob/main/plugins/callback/opentelemetry.py works fine if I use OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS without the OTEL Collector itself:

OTEL_EXPORTER_OTLP_INSECURE=true \
	OTEL_EXPORTER_OTLP_ENDPOINT=https://*****.elastic-cloud.com:443 \
	OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer *****" \
	ansible-playbook playbook.yml
[...]

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

@v1v
Contributor

v1v commented Jan 7, 2025

We can close this issue. So far I have not been able to reproduce the issue after the fix done at #8566 (comment)

@russoz
Collaborator

russoz commented Jan 7, 2025

@moserke any objection to that?

@russoz
Collaborator

russoz commented Jan 8, 2025

needs_info

@ansibullbot ansibullbot added the needs_info This issue requires further information. Please answer any outstanding questions label Jan 8, 2025
@moserke
Author

moserke commented Jan 8, 2025

Thanks @russoz. Sounds good to me. My apologies for missing all of these.

@OneCyrus

OneCyrus commented Jan 8, 2025

It started to work again with one of the latest versions. Looks good here too.

@ansibullbot ansibullbot removed the needs_info This issue requires further information. Please answer any outstanding questions label Jan 8, 2025
@felixfontein
Collaborator

@v1v since you're a maintainer for this plugin you can write close_me in a comment to make the bot close the issue. (https://github.com/ansible/ansibullbot/blob/devel/ISSUE_HELP.md#commands) (I won't close it now so you can try it out ;-) )

@v1v
Contributor

v1v commented Jan 8, 2025

close_me
