
Make the DatadogExportSpanProcessor re-initialize itself after fork #932

Conversation

@phillipuniverse (Contributor) commented Feb 28, 2022

Description

The DatadogExportSpanProcessor suffers from the same fork-safety problem the BatchSpanProcessor had. When using the DatadogExportSpanProcessor, my background workers started with os.fork() would deadlock. I observed this in RQ, but I'm sure the same problem would show up in Celery.
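
To illustrate the failure mode this PR addresses, here is a minimal, self-contained sketch (not taken from this PR) of why forking a process that has a background worker thread can deadlock: the child inherits any lock in whatever state it was in at fork time, but not the thread that would release it, so the next acquire in the child can hang forever.

import os
import threading
import time

lock = threading.Lock()

def worker():
    # Simulates an exporter worker thread that periodically holds the lock.
    while True:
        with lock:
            time.sleep(0.01)

threading.Thread(target=worker, daemon=True).start()
time.sleep(0.1)  # let the worker get into its critical section

pid = os.fork()  # *nix only
if pid == 0:
    # Child process: the worker thread does not exist here, but the lock was
    # most likely copied in the locked state, so this acquire usually hangs.
    lock.acquire()
    print("child acquired the lock")  # rarely reached; depends on fork timing
    os._exit(0)
else:
    os.waitpid(pid, 0)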

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

I effectively copied over the test from open-telemetry/opentelemetry-python#2242. The test passes with the fork re-initialization in place and fails without it.

I haven't deployed this in my own project yet because there are a few questions I want to ask about the particulars of this implementation.

Does This PR Require a Core Repo Change?

  • No

Checklist:

See contributing.md for the style guide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@phillipuniverse force-pushed the datadog-exporter-forkaware branch 4 times, most recently from d430242 to 4d76fe1 on February 28, 2022 03:44
Comment on lines +92 to +97
self.condition = threading.Condition(threading.Lock())
self.flush_condition = threading.Condition(threading.Lock())
self.traces_lock = threading.Lock()
self.check_traces_queue.clear()
self.worker_thread = self._create_worker_thread()
self.worker_thread.start()
@phillipuniverse (Author) commented Feb 28, 2022

I really wasn't too sure here what I should and shouldn't re-initialize. I figured I would re-initialize the threading-related state just in case. What do you think? Are these the right things to re-initialize after a fork?
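
For anyone reading along, here is a toy sketch of the general shape of this fix (class and method names are illustrative, not the exporter's real API): register an after-fork hook so the child rebuilds its locks/conditions and restarts the worker thread, which is what the lines above do for the Datadog processor.

import os
import threading

class ForkAwareWorker:
    """Illustrative only - mirrors the re-init pattern, not the real processor."""

    def __init__(self):
        self._start_worker()
        if hasattr(os, "register_at_fork"):
            # Runs in the child right after fork(): the inherited worker thread
            # is gone and inherited locks may be held, so rebuild everything.
            os.register_at_fork(after_in_child=self._at_fork_reinit)

    def _start_worker(self):
        self.condition = threading.Condition(threading.Lock())
        self.worker_thread = threading.Thread(target=self._worker, daemon=True)
        self.worker_thread.start()

    def _at_fork_reinit(self):
        self._start_worker()

    def _worker(self):
        while True:
            with self.condition:
                self.condition.wait(timeout=0.1)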

@unittest.skipUnless(
    hasattr(os, "fork") and sys.version_info >= (3, 7),
    "needs *nix and minor version 7 or later",
)
def test_exports_with_fork(self):
@phillipuniverse (Author) commented:

This test is effectively the same as the one from open-telemetry/opentelemetry-python#2242, with as few changes as possible.

conn.close()

parent_conn, child_conn = multiprocessing.Pipe()
fork_context = multiprocessing.get_context("fork")
@phillipuniverse (Author) commented:
I'm running this code on macOS, where the default start method is spawn. With that default I got a "Can't pickle local object" error when the created Process referenced this inner child function. I didn't really want to move child out of the test, and this small detail shouldn't matter much, since with the fork context forced the test behaves the same on all systems.
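
For reference, a stripped-down, self-contained version of that choice (the child function below is a stand-in for the test's inner helper):

import multiprocessing

def child(conn):
    # Stand-in for the test's inner helper that exports spans from the child.
    conn.send("exported")
    conn.close()

if __name__ == "__main__":
    # Explicitly request the "fork" start method so the test behaves the same
    # everywhere; macOS defaults to "spawn", which requires a picklable target
    # and fails with "Can't pickle local object" for a function defined inside
    # the test method.
    fork_context = multiprocessing.get_context("fork")
    parent_conn, child_conn = multiprocessing.Pipe()
    process = fork_context.Process(target=child, args=(child_conn,))
    process.start()
    print(parent_conn.recv())
    process.join()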

@@ -360,7 +360,7 @@ commands_pre =

aiopg: pip install {toxinidir}/instrumentation/opentelemetry-instrumentation-dbapi pip install {toxinidir}/instrumentation/opentelemetry-instrumentation-aiopg[test]

datadog: pip install flaky {toxinidir}/exporter/opentelemetry-exporter-datadog[test]
@phillipuniverse (Author) commented:
It made a bit more sense to me to keep the flaky dependency in the module's extras_require instead of here in tox.ini.

@phillipuniverse marked this pull request as ready for review on February 28, 2022 03:55
@phillipuniverse requested a review from a team on February 28, 2022 03:55
@owais (Contributor) commented Mar 9, 2022

@phillipuniverse I'm pretty sure this package has been deprecated in favor of DataDog's official library (https://github.com/DataDog/dd-trace-py/), so I'm not sure we should invest considerable effort in maintaining/reviewing it.

@open-telemetry/opentelemetry-python-contrib-approvers Perhaps we should add a deprecation notice to this package and redirect users to the official DataDog solution.

@majorgreys could you please confirm whether this package is deprecated and, if so, whether DataDog/dd-trace-py is the right repo to redirect users to?

@phillipuniverse (Author) commented Mar 10, 2022

@owais interesting context, but dd-trace-py does not have any OpenTelemetry exporters that would take an OTel trace and convert it to a valid Datadog trace. It does have code that converts an OpenTracing trace to Datadog, though - maybe this exporter belongs there too 🤷.

If this package should truly be deprecated, then the open issues against it probably don't apply anymore and should likely be closed.

Since I started using OpenTelemetry and Datadog a few months ago, I have seen that the official docs recommend using the OpenTelemetry Collector, via its Datadog exporter, to transform traces into the Datadog format. It would be great to clear up the discrepancy: when I came into the ecosystem I thought that exporting directly from the service to the Datadog Agent was the "blessed" path. Deprecating this opentelemetry-python-contrib exporter and instead guiding users to the Datadog-specific exporter at the OpenTelemetry Collector level would make things a lot clearer.

Thanks for the consideration!

@owais (Contributor) commented Mar 10, 2022

@phillipuniverse it seems DataDog recommends sending native OTLP to the Datadog Agent instead of using exporters such as this one.

https://docs.datadoghq.com/tracing/setup_overview/open_standards/#otlp-ingest-in-datadog-agent

#900
open-telemetry/opentelemetry-python#2517

@phillipuniverse (Author) commented:
@owais thanks for your note about the DatadogExportSpanProcessor being effectively deprecated. I have since converted my services to use the BatchSpanProcessor and that has fixed my issues.

I am still using the Datadog Agent to receive traces, but I now run it with DD_OTLP_GRPC_PORT set, which lets the Agent accept OpenTelemetry traces over OTLP:

docker run --rm \
  -e DD_OTLP_GRPC_PORT=4317 \
  -e DD_APM_CONFIG_APM_NON_LOCAL_TRAFFIC=1 \
  -e DD_API_KEY=<<some-api-key>> \
  -p 8125:8125/udp -p 8126:8126 -p 4317:4317 \
  datadog/agent:latest
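
For completeness, a minimal sketch (assuming the opentelemetry-exporter-otlp-proto-grpc package is installed; the endpoint wiring is illustrative) of pointing the Python SDK at that OTLP port instead of using this Datadog-specific exporter:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans over OTLP/gRPC to the Datadog Agent port opened above.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)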

On @owais's suggestion that "perhaps we should add a deprecation notice to this package and re-direct to the official DataDog solution":

This would be really nice and would definitely help avoid confusion in the future.

Thanks again!
