Skip to content

Address intermittent failures of testQueryLoggingCount#9409

Merged
losipiuk merged 3 commits intoprestodb:masterfrom
losipiuk:lo/fix-testQueryLoggingCount-intermittent
Nov 27, 2017
Merged

Address intermittent failures of testQueryLoggingCount#9409
losipiuk merged 3 commits intoprestodb:masterfrom
losipiuk:lo/fix-testQueryLoggingCount-intermittent

Conversation

@losipiuk
Copy link
Contributor

@losipiuk losipiuk commented Nov 24, 2017

The change addresses intermittent failures of AbstractTestDistributedQueries.testQueryLoggingCount.
The failure happened because of how beforeCompletedQueriesCount and
beforeSubmittedQueriesCount were determined at the beginning of tests.
Code assumed that it is safe to get those values as soon as all the
queries in queryManager are completed.

Yet it does not mean that stats counters were already updated, as those
two changes are not synchronized. See SqlQueryManager.createQuery:

Here is the code which updates stats counter for completed queries.

  QueryInfo info = queryExecution.getQueryInfo();
  stats.queryFinished(info);
  queryMonitor.queryCompletedEvent(info);

Yet according to info.isFinalQueryInfo() taken from
queryManager.getAllQueryInfo() the query is already completed.

fixes #8953

@losipiuk losipiuk force-pushed the lo/fix-testQueryLoggingCount-intermittent branch from 9d77489 to 24f45d6 Compare November 24, 2017 16:30
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear reviewer: The effect of this change is very similar to just putting

sleepUninterruptibly(100, MILLISECONDS)

before this line.
I may change it, whichever you prefer.

Copy link
Contributor

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could

lastValue = fun.get()

and then sleep before getting next value. This allow you to live without Optional

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fun/valueSupplier/ or //computation

@losipiuk losipiuk force-pushed the lo/fix-testQueryLoggingCount-intermittent branch from 24f45d6 to da96ce2 Compare November 27, 2017 17:39
@losipiuk
Copy link
Contributor Author

@findepi updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waitUtilStable -> waitUntilStable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method is only used for getting the before completed queries count, should we make it more specific like getBeforeCompletedQueriesCount? This is currently pretty generic and currently we don't need it to be.

I also feel like we should add a comment here that summarizes what's available in the git commit message. It's usually easier for the reader to just see it in the code rather than digging into the commit message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comment. Not convinced to rename the method. It is fairly simple. And it's use in test is pretty self explanatory without need to look into body.
If something I would return it at all and replaced with sleep. See my comment #9409 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we throw a TimeoutException here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used UncheckedTimeoutException from Guava.

@losipiuk
Copy link
Contributor Author

losipiuk commented Nov 27, 2017

@nezihyigitbasi, I addressed comments.

Copy link
Contributor

@nezihyigitbasi nezihyigitbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this, it has been bothering me for a while :) I just left a minor comment regarding the comment we added to the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing, because I understand from this comment that it's not safe to get the value of beforeSubmittedQueriesCount immediately after queries finish, however we don't use waitUntilStable for it below like we do for the other counter (in other words, if it's not safe why don't we wrap it with waitUntilStable). In fact, when I look at the code I think it's safe to get the number of submitted queries as that counter is updated synchronously (from createQuery() -> addStatsListeners() -- not in a final query info listener unlike the other counter).

How about we add this comment instead?

// We cannot simply get the number of completed queries as soon as all the queries are completed, because this counter may not be up-to-date at that point.
// The completed queries counter is updated in a final query info listener, which is called eventually.
// Therefore, here we wait until the value of this counter gets stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch! I looked at just one counter and wrongly generalised how it is set it to both. Yet the reason why it is not wrapped with waitUntilStable is that I assumed that when one stabilises, then the other should be good too.

Anyway, whole logic is wrong for other counter so I will change the comment and commit message.

The change addresses intermittent failures of AbstractTestDistributedQueries.testQueryLoggingCount.
The failure happened because of how `beforeCompletedQueriesCount` is  determined at the beginning of test.
Code assumed that it is safe to get value of this counter as soon as all the
queries in queryManager are completed.

Yet it does not mean that stats counter was already updated, as those
two changes are not synchronized. See `SqlQueryManager.createQuery`:

Here is the code which updates stats counter for completed queries.
```
  QueryInfo info = queryExecution.getQueryInfo();
  stats.queryFinished(info);
  queryMonitor.queryCompletedEvent(info);
```

Yet according to `info.isFinalQueryInfo()` taken from
`queryManager.getAllQueryInfo()` the query is already completed.
@losipiuk losipiuk force-pushed the lo/fix-testQueryLoggingCount-intermittent branch from 1c66909 to 5e66999 Compare November 27, 2017 21:13
@losipiuk losipiuk merged commit c0186a7 into prestodb:master Nov 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test testQueryLoggingCount

4 participants