Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Jul 29, 2022

Previously graph checked if the request timed out, then spent some
time doing work, then passed the timeout on to the next request. Over
and over again. It's quite possible that the response may not have timed
out for the first check but would have timed out for the second check.
This manifests as the timeout being sent to the next hop being a
negative number of milliseconds. We don't allow this sort of thing.

This fixes this by moving the timeout check to the same spot it is read
for setting the timeout on the next request - we just check if its > 0
to find the timeouts.

This does keep the request running slightly longer after it's officially
timed out - but it's just long enough to prepare the next layer of
request. Usually microseconds. Which should be fine.

Closes #55396

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 29, 2022
);
grb.createNextHop(timeoutQuery).addVertexRequest("people").size(100).minDocCount(1);
// 00s friends of beatles
grb.createNextHop(QueryBuilders.termQuery("decade", "00s")).addVertexRequest("people").size(100).minDocCount(1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move the script query causing the timeout to the hop before the last hop because we no longer check the timeout on the final response. If we get a full response from the query we return it even if we're above the timeout time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that was fine because it should be quite fast.

* connected terms in a single index.
*/
public class TransportGraphExploreAction extends HandledTransportAction<GraphExploreRequest, GraphExploreResponse> {
private static final Logger logger = LogManager.getLogger(TransportGraphExploreAction.class);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger this one was using was deprecated.

private final ActionListener<GraphExploreResponse> listener;

private final long startTime;
private final AtomicBoolean timedOut;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was pretty unnecessary. We only ever set it one time so I just pass it in.

listener.onResponse(buildResponse());
listener.onResponse(buildResponse(false));
return;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the bit where we return the response even if we're over time. If we're done we may as well return anyway, I say.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for me

client.search(searchRequest, new ActionListener.Delegating<>(listener) {
@Override
public void onResponse(SearchResponse searchResponse) {
// System.out.println(searchResponse);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These just seemed like leftovers. I figure we'd dump these into the trace logs if we want them.

nik9000 added 2 commits July 29, 2022 11:05
Previously `graph` checked if the request timed out, then spent some
time doing work, then passed the timeout on to the next request. Over
and over again. It's quite possible that the response may not have timed
out for the first check but would have timed out for the second check.
This manifests as the timeout being sent to the next hop being a
negative number of milliseconds. We don't allow this sort of thing.

This fixes this by moving the timeout check to the same spot it is read
for setting the timeout on the next request - we just check if its `> 0`
to find the timeouts.

This does keep the request running slightly longer after it's officially
timed out - but it's just long enough to prepare the next layer of
request. Usually microseconds. Which should be fine.

Closes elastic#55396
@nik9000
Copy link
Member Author

nik9000 commented Aug 1, 2022

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample

@iverase iverase self-requested a review August 3, 2022 15:12
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nik9000 nik9000 merged commit 09d0025 into elastic:main Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] GraphTests.testTimedoutQueryCrawl fails on CI

3 participants