Graph: fix race condition in timeout #88946

nik9000 · 2022-07-29T14:57:47Z

Previously graph checked if the request had timed out, then spent some
time doing work, then passed the remaining timeout on to the next
request, once per hop. The request may not have timed out at the first
check but could have by the time the remaining timeout was computed.
This manifests as a negative number of milliseconds being sent as the
timeout for the next hop. We don't allow that.
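A minimal sketch of the racy check-then-compute pattern described above, assuming a hypothetical `nextHopTimeoutMillis` helper with an injectable clock (`LongSupplier`) so the race can be reproduced deterministically. None of these names come from the Elasticsearch source.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of the racy pattern; not the actual Elasticsearch code.
class RacyHopTimeout {
    // Returns the timeout to send to the next hop, or empty if the request
    // has already timed out. The clock is read twice, which is the bug.
    static OptionalLong nextHopTimeoutMillis(long startMillis, long timeoutMillis, LongSupplier clock) {
        // First clock read: the timeout check.
        if (clock.getAsLong() - startMillis >= timeoutMillis) {
            return OptionalLong.empty();
        }
        // ... preparing the next hop's request takes some time here ...
        // Second clock read: the deadline may have passed since the check,
        // so the "remaining" time can be a negative number of milliseconds.
        long remaining = timeoutMillis - (clock.getAsLong() - startMillis);
        return OptionalLong.of(remaining);
    }
}
```

With `startMillis = 0`, `timeoutMillis = 1000`, and a clock that reads 900 and then 1001, the check passes but the timeout handed to the next hop is -1 ms.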

This fixes that by moving the timeout check to the spot where the
remaining time is read to set the timeout on the next request: we just
check whether it's > 0 to detect the timeout.
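A minimal sketch of the fixed pattern, assuming a hypothetical `nextHopTimeoutMillis` helper with an injectable clock (not the actual `TransportGraphExploreAction` code): the clock is read once, after the next request is prepared, and the `> 0` check on the remaining time doubles as the timeout check.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of the fixed pattern; not the actual Elasticsearch code.
class SingleReadHopTimeout {
    // Reads the clock exactly once and uses that single value both to
    // detect a timeout and to compute the timeout for the next hop.
    static OptionalLong nextHopTimeoutMillis(long startMillis, long timeoutMillis, LongSupplier clock) {
        long remaining = timeoutMillis - (clock.getAsLong() - startMillis);
        // A non-positive budget means the request has timed out, so a
        // negative value can never be passed on to the next hop.
        return remaining > 0 ? OptionalLong.of(remaining) : OptionalLong.empty();
    }
}
```

With a clock reading 1001 (start 0, timeout 1000) this reports a timeout instead of producing -1 ms; with a clock reading 900 it returns 100 ms.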

This does keep the request running slightly past its official timeout,
but only long enough to prepare the next layer of requests. Usually
that's microseconds, which should be fine.

Closes #55396

elasticsearchmachine · 2022-07-29T14:58:10Z

Hi @nik9000, I've created a changelog YAML for you.

elasticsearchmachine · 2022-07-29T14:58:10Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 · 2022-07-29T14:58:47Z

...plugin/graph/src/internalClusterTest/java/org/elasticsearch/xpack/graph/test/GraphTests.java

        );
        grb.createNextHop(timeoutQuery).addVertexRequest("people").size(100).minDocCount(1);
+        // 00s friends of beatles
+        grb.createNextHop(QueryBuilders.termQuery("decade", "00s")).addVertexRequest("people").size(100).minDocCount(1);


I had to move the script query that causes the timeout to the hop before the last one because we no longer check the timeout on the final response. If the query returns a full response, we return it even if we're past the timeout.

I figured that was fine because it should be quite fast.

nik9000 · 2022-07-29T14:59:16Z

...in/graph/src/main/java/org/elasticsearch/xpack/graph/action/TransportGraphExploreAction.java

 * connected terms in a single index.
 */
 public class TransportGraphExploreAction extends HandledTransportAction<GraphExploreRequest, GraphExploreResponse> {
+    private static final Logger logger = LogManager.getLogger(TransportGraphExploreAction.class);


The logger this one was using was deprecated.

nik9000 · 2022-07-29T14:59:45Z

...in/graph/src/main/java/org/elasticsearch/xpack/graph/action/TransportGraphExploreAction.java

        private final ActionListener<GraphExploreResponse> listener;

        private final long startTime;
-        private final AtomicBoolean timedOut;


This was pretty unnecessary. We only ever set it one time so I just pass it in.
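To illustrate passing the flag in rather than keeping it as a field, here is a minimal sketch with hypothetical stand-ins `ExploreResponse` and `AsyncExplore` (they are not the real `GraphExploreResponse` or the class that held `timedOut`).

```java
import java.util.List;

// Hypothetical, simplified stand-in for the response type.
record ExploreResponse(List<String> vertices, boolean timedOut) {}

// Hypothetical stand-in for the class that previously held an AtomicBoolean.
class AsyncExplore {
    private final List<String> vertices;

    AsyncExplore(List<String> vertices) {
        this.vertices = List.copyOf(vertices);
    }

    // The caller decides once whether the request timed out and passes
    // that in, instead of writing a mutable field that is read here later.
    ExploreResponse buildResponse(boolean timedOut) {
        return new ExploreResponse(vertices, timedOut);
    }
}
```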

nik9000 · 2022-07-29T15:00:14Z

...in/graph/src/main/java/org/elasticsearch/xpack/graph/action/TransportGraphExploreAction.java

-                listener.onResponse(buildResponse());
+                listener.onResponse(buildResponse(false));
                return;
            }


Here's the bit where we return the response even if we're over time. If we're done, we may as well return anyway, I say.

This is fine for me
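A minimal sketch of the decision discussed in this thread, using a hypothetical `HopDecision.decide` helper (names are illustrative, not from the Elasticsearch source): once the last hop has finished, the response is returned as complete regardless of the clock; otherwise the remaining time decides whether another hop runs.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; names don't come from the Elasticsearch source.
class HopDecision {
    enum Action { RETURN_COMPLETE, RETURN_TIMED_OUT, RUN_NEXT_HOP }

    static Action decide(boolean lastHop, long startMillis, long timeoutMillis, LongSupplier clock) {
        if (lastHop) {
            // All hops are done: return the full response even if we are
            // already past the deadline.
            return Action.RETURN_COMPLETE;
        }
        // Single clock read; a non-positive budget means timed out.
        long remaining = timeoutMillis - (clock.getAsLong() - startMillis);
        return remaining > 0 ? Action.RUN_NEXT_HOP : Action.RETURN_TIMED_OUT;
    }
}
```

In this sketch the timed-out branch is only reachable for intermediate hops, which matches why the test moves the timeout-causing query off the final hop.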

nik9000 · 2022-07-29T15:00:43Z

...in/graph/src/main/java/org/elasticsearch/xpack/graph/action/TransportGraphExploreAction.java

            client.search(searchRequest, new ActionListener.Delegating<>(listener) {
                @Override
                public void onResponse(SearchResponse searchResponse) {
-                    // System.out.println(searchResponse);


These just seemed like leftovers. I figure we'd dump these into the trace logs if we want them.


nik9000 · 2022-08-01T12:14:03Z

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample

iverase

LGTM

nik9000 added >bug :Analytics/Graph v8.5.0 labels Jul 29, 2022

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 29, 2022

nik9000 commented Jul 29, 2022


nik9000 added 2 commits July 29, 2022 11:05

Update docs/changelog/88946.yaml

5c25b33

iverase self-requested a review August 3, 2022 15:12

iverase approved these changes Aug 9, 2022


nik9000 merged commit 09d0025 into elastic:main Aug 17, 2022

tlrx mentioned this pull request Sep 23, 2022

[CI] GraphTests testTimedoutQueryCrawl failing #90286

Closed
