Failed to send traces to Datadog Agent - Time out #1413
Maybe it's because of this hardcoded timeout limit: https://github.com/DataDog/dd-trace-py/blob/v0.37.0/ddtrace/api.py#L127
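To make the symptom concrete: the linked line hardcodes a 2-second HTTP timeout, so a slow or overloaded agent surfaces as a socket timeout rather than an HTTP error. Below is an illustrative sketch of that failure mode, not ddtrace's actual code; the `post_payload` name is made up, though `/v0.4/traces` is the real agent endpoint.

```python
import http.client
import socket


def post_payload(host, port, payload, timeout=2):
    """Illustrative only: send a payload with a hardcoded 2 s timeout,
    as the linked ddtrace code does. If the agent takes longer than
    `timeout` seconds to respond, socket.timeout is raised -- this is
    the "Time out" error reported in this issue."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(
            "PUT",
            "/v0.4/traces",
            body=payload,
            headers={"Content-Type": "application/msgpack"},
        )
        return conn.getresponse().status
    finally:
        conn.close()
```

Anything that slows the agent's response past the hardcoded limit (overload, networking issues in the container) produces this exception on every flush attempt.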
Hi @weisurya, sorry for the delay here. Thanks for providing your setup. The tracer attempts to send traces every 1 second, which is probably why you'd be seeing this message on that interval. Are any traces coming through at all? I suspect it's not related to the timeout limit. If the requests are taking longer than 2 seconds to send then there's probably a networking issue. Is there something unique about the way you deploy your Python app vs the Go or Node apps? Could you provide a little more insight about how you're deploying your app?
@Kyle-Verhoog hey, apologies for my late reply, and thank you for the follow-up. The way I implement DD APM in Python is similar to the Go & Node.js projects, where I use
Besides that, I just use the default configuration from each library. Yes, I can see some traces on the dashboard, but I also see errors in my log regarding this timeout.
All of them use the same deployment: Docker as the baseline, deployed on AWS EB. Specifically for Python, I use
Any updates on this? We are also seeing the same error messages in our logs with version |
@Kyle-Verhoog We are seeing these errors intermittently.
Hi all, we're aware that this occurs but haven't found a root cause yet, given how randomly it seems to happen. Our speculation so far is that the agent becomes overloaded and is unable to handle the request. We're looking to address it on our end by introducing retry logic for sending.
Correct, currently there is no retry logic.
No, nothing is crashing; the exception is occurring, being caught, and finally logged in the worker thread that
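The catch-and-log behaviour described above can be sketched as follows. This is an illustration of the pattern, not ddtrace's actual worker code; `writer_loop` and `send` are hypothetical names.

```python
import logging
import queue

log = logging.getLogger("writer")


def writer_loop(trace_queue, send, stop):
    """Illustrative background writer thread body: any failure raised by
    `send` is caught and logged, so the application itself never crashes;
    at worst the payload is dropped."""
    while not stop.is_set():
        try:
            payload = trace_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            send(payload)
        except Exception:
            # A log line like this is where the "Failed to send traces"
            # message in this issue would come from.
            log.error("failed to send traces, dropping payload", exc_info=True)
```

This is why the error is intermittent and harmless to the application: the exception never propagates out of the worker thread, it only shows up in the logs.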
Any follow-up on this?
We are seeing these errors too, while using version 0.45.0.
This change introduces a Fibonacci retry policy (with jitter) to the agent writer to mitigate networking issues (e.g. timeouts, broken pipes, ...), similar to what the profiler does already. Resolves DataDog#1413.
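For readers unfamiliar with the policy named in the PR, here is a minimal sketch of Fibonacci backoff with full jitter. The `send_with_retry` helper and its signature are hypothetical, chosen for illustration; the actual PR implementation may differ.

```python
import random
import time


def fibonacci_delays(max_attempts):
    """Yield Fibonacci backoff delays in seconds: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(max_attempts):
        yield a
        a, b = b, a + b


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Retry `send()` on network failure with Fibonacci backoff plus jitter.

    `send` is a hypothetical zero-argument callable that raises OSError on
    networking problems (timeouts, broken pipes, ...). `sleep` is injectable
    so the policy can be tested without real waiting.
    """
    last_exc = None
    for delay in fibonacci_delays(max_attempts):
        try:
            return send()
        except OSError as exc:
            last_exc = exc
            # Full jitter: wait a random fraction of the backoff delay so
            # many clients don't retry in lock-step against one agent.
            sleep(random.uniform(0, delay))
    raise last_exc
```

The jitter matters here because the suspected root cause is agent overload: without it, every tracer that timed out together would also retry together.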
Which version of dd-trace-py are you using?
0.37.0
Which version of the libraries are you using?
You can copy/paste the output of
pip freeze
here.
How can we reproduce your problem?
Here is how I initiate the project with ddtrace:
ddtrace-run gunicorn index:app
and in the system env, I customized these variables
What is the result that you get?
The interval between one event and the next is quite short, on the order of seconds.
What is the result that you expected?
It should publish events normally, like my other services that use the Go and Node libraries. All of them use the same configuration, so I expect the same behavior.
This issue has been occurring since two months ago, when I was still using version
0.28.0