HTTP event listener: retry for status codes not in [200, 300) and exceptions, retry delay calculation fix, better logging #10566
Conversation
Commit message: "Add more logging the event listener" -> "Add more logging in the HTTP event listener"
Feels too verbose. I think it should be debug.
I see you changed that in a later commit, but it should already be debug in "Add more logging to the HTTP event listener".
My bad, I just renamed the first commit and added this to the second one. I see you already approved; I can go back and make this change if you want.
Please do. We want a clean commit history when possible. I will merge once CI passes.
Force-pushed from a8619f9 to bc62232
Thanks for the review! I implemented the requested changes. I also found a bug in the next-delay calculation and fixed that as well (and changed the PR title to include that).
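The thread doesn't show the delay bug itself, but the usual shape of a next-delay calculation for this kind of retry loop is capped exponential backoff. A minimal sketch under assumed names (this is illustrative, not the actual listener code):

```java
class RetryDelay
{
    // Hypothetical capped exponential backoff: baseDelayMillis on the first retry,
    // doubling on each subsequent attempt, never exceeding maxDelayMillis.
    // 'attempt' is 1-based; the shift is clamped to avoid long overflow.
    static long nextDelayMillis(int attempt, long baseDelayMillis, long maxDelayMillis)
    {
        long delay = baseDelayMillis * (1L << Math.min(attempt - 1, 30));
        return Math.min(delay, maxDelayMillis);
    }
}
```

A common bug in such calculations is an off-by-one on the attempt counter or an uncapped shift that overflows; clamping both, as above, keeps the delays monotone and bounded.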
Force-pushed from bc62232 to 6d3a951
```diff
  verify(result != null);

- if (result.getStatusCode() >= 500 && attempt < config.getRetryCount()) {
+ if (!(result.getStatusCode() >= 200 && result.getStatusCode() < 300) && attempt < config.getRetryCount()) {
```
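The effect of the new condition is that every non-2xx response triggers a retry until the configured attempts are exhausted. A simplified, self-contained sketch of that loop (names are illustrative, not the actual listener code):

```java
import java.util.function.IntSupplier;

class RetrySketch
{
    static boolean isSuccess(int statusCode)
    {
        return statusCode >= 200 && statusCode < 300;
    }

    // Returns true as soon as a 2xx response is received, false once the
    // initial attempt plus retryCount retries are exhausted (the event
    // would then be dropped).
    static boolean sendWithRetry(IntSupplier send, int retryCount)
    {
        for (int attempt = 0; attempt <= retryCount; attempt++) {
            if (isSuccess(send.getAsInt())) {
                return true;
            }
        }
        return false;
    }
}
```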
I'm not sure how retrying any 4xx error would ever succeed. Wouldn't we end up retrying until the retry attempts are exhausted?
Yes, we would retry until attempts are exhausted.
Seems wasteful, especially since the event listener is synchronous and retrying a 4xx error seems guaranteed to fail, except maybe for HTTP 429.
Thanks for clarifying. I believe the intent is to use this mechanism to identify the scenarios we don't handle well currently and then fix them.
I think it has its use cases (which admittedly are edge cases), and it's better to be safe than to drop data. This doesn't run on the query execution threads, so there's no direct slowdown.
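A middle ground raised in this thread would be to retry only statuses where a retry can plausibly succeed, namely server errors and HTTP 429. This is only a sketch of that alternative, not what the PR implements:

```java
class SelectiveRetry
{
    // Alternative policy discussed in review: retry 5xx responses and 429
    // (rate limiting), but treat other 4xx responses as permanent failures.
    static boolean isRetryable(int statusCode)
    {
        return statusCode >= 500 || statusCode == 429;
    }
}
```

The trade-off is exactly the one debated above: this avoids wasting attempts on errors that cannot succeed, at the cost of dropping events for any unanticipated retryable status.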
I have been running into problems while using the event listener. They are usually one of:
1. Broken pipe exceptions, which are irregular and usually occur less than twice per hour. These aren't critical but cause dropped events. I suspect the cause is bad timing between Trino and the receiving server with regard to timeouts; here is an exception:
2. Regular timeouts. There are cases where most of the plugin's attempts to send events end with a timeout (from the http-client, not from the receiving server). I haven't been able to track down the cause of these yet. What I do know is that it's probably not the fault of the ingest server, because while the plugin keeps timing out, other requests to that server go through correctly (even from the same machine as Trino). This usually happens when Trino is under high load.
The changes this PR implements are simple and self-explanatory from the title.
These should fix problem 1 and help with tracking down problem 2.
Any ideas regarding problem 2 are greatly appreciated!