Various Kudu cleanups and fixes related to schema emulation races by grantatspothero · Pull Request #11264 · trinodb/trino

grantatspothero · 2022-03-01T19:22:16Z

Description

Fixes lots of different problems with flaky kudu tests. Let me know if I should split some of these commits into separate PRs

Fixes a problem with the connector code itself racing to create the $schemas table
Fixes problems with tests asserting on the number of tables that exist, which races with the $schemas table
Workaround a change in the kudu client introduced in this commit that caused opening a table immediately after creating it to sometimes fail: apache/kudu@4bf7e45
Flush kudu operations in batches instead of one at a time, this helps reduce CI runtime. This PR https://github.com/trinodb/trino/pull/10940/files changed the behavior from AUTO_FLUSH_BACKGROUND to AUTO_FLUSH_SYNC to improve correctness, but unfortunately it caused a performance degradation due to flushing each operation one at a time. This commit keeps correctness while getting better write performance.

Related issues, pull requests, and links

#11203

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Kudu
* Fixes race in creating schemas table if schema emulation is enabled. ({issue}`11264`)
* Improve write performance by flushing operations in batches. ({issue}`11264`)

ebyhr · 2022-03-02T01:05:27Z

Please don't remove PR template.

.github/workflows/ci.yml

hashhar

Thank you, lots of improvements here for Kudu users.

Some comments. nits are really nits, feel free to ignore.

.github/workflows/ci.yml

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithDisabledInferSchemaConnectorTest.java

plugin/trino-kudu/src/test/java/io/trino/plugin/kudu/TestingKuduServer.java

plugin/trino-kudu/src/main/java/org/apache/kudu/client/KuduOperationApplier.java

.github/bin/build-matrix-from-impacted.py

...ino-kudu/src/main/java/io/trino/plugin/kudu/schema/SchemaEmulationByTableNameConvention.java

Praveen2112 · 2022-03-15T06:48:39Z

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithStandardInferSchemaConnectorTest.java

When does the table is actually created ? When we try to create a new table ?

It is created when trying to listSchemas: https://github.com/trinodb/trino/blob/1773ce393abd4eecf0de26e0909df78f6f34621c/plugin/trino-kudu/src/main/java/io/trino/plugin/kudu/schema/SchemaEmulationByTableNameConvention.java#L112-L118

I think the purpose of the table is listing every table from every tablet is expensive, so instead do that listing once and cache the results in a kudu table and then update the kudu table whenever a new schema is created/deleted.

I think the purpose of the table is listing every table from every tablet is expensive, so instead do that listing once and cache the results in a kudu table and then update the kudu table whenever a new schema is created/deleted.

I think we don't use it for caching - Kudu doesn't have the concept of namespace i.e schema so we virtually create a table to store the schemas and add them as a prefix for the tables under this virtual schema.
https://trino.io/docs/current/connector/kudu.html#behavior-with-schema-emulation

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithDisabledInferSchemaConnectorTest.java

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithStandardInferSchemaConnectorTest.java

Praveen2112 · 2022-03-15T06:59:41Z

plugin/trino-kudu/src/test/java/io/trino/plugin/kudu/TestingKuduServer.java

So in the real world also it causes an issue this flakiness ?

Yes, if a kudu master has multiple resolvable ip addresses for a dns record then this read your own writes flakiness will occur [like creating then immediately opening a table]. This is unconfirmed, but it appears to be an issue with the kudu client thinking multiple masters exist even though there is a single master, which results in this problem manifesting: https://issues.apache.org/jira/browse/KUDU-3266

I reached out to the kudu maintainers on slack and they were surprised about this behavior:
https://getkudu.slack.com/archives/C0CPXJ3CH/p1646330492501219

Praveen2112 · 2022-03-15T07:25:12Z

plugin/trino-kudu/src/main/java/org/apache/kudu/client/KuduOperationApplier.java

Can we configure the kuduSession to have manual_flush as its default flush mode.

It is tricky to do without introducing extra abstraction in ForwardingKuduClient:
https://github.com/trinodb/trino/blob/master/plugin/trino-kudu/src/main/java/io/trino/plugin/kudu/ForwardingKuduClient.java#L98-L101

And we guard against the case of incorrectly using the wrong flush mode here. I'm inclined to leave the configured default flush mode.

There was a check if exists then create and populate table if not exists pattern in the kudu schema emulation layer. This races when two concurrent queries happen. Instead, always try to create and upsert data into the kudu schemas table idempotently.

Depending on test ordering, the kudu schemas table may or may not exist

Include the throwable kuduexception cause

BaseConnectorTest creates tables in the default schema. This races in listing of the default schema

This commit (in kudu client v1.12.0): apache/kudu@4bf7e45 Causes multiple ip addresses to be resolved for each kudu master. This interacts poorly with kudu's multi-master support which does not provide read your own writes, leading to flaky tests when creating then immediatly opening a table. The workaround is explicitly connectng to the kudu master at the ipv4 loopback address (127.0.0.1) instead of localhost

This helps performance of loading data into kudu. However, still verify operations are successful and fail fast when errors occur.

They are now reliably passing without flakes after this PR: trinodb#11264

leveyja · 2022-03-17T14:50:44Z

plugin/trino-kudu/src/main/java/io/trino/plugin/kudu/KuduPageSink.java

+                PartialRow row = upsert.getRow();
+                int start = 0;
+                if (generateUUID) {
+                    String id = format("%s-%08x", uuid, nextSubId++);


Q: Should nextSubId be marked volatile (it's a member being updated in a CompletableFuture)?

The completablefuture NOT_BLOCKED is just a placeholder to signal the page sink is not blocked. NOT_BLOCKED does not hold a reference to nextSubId, so I do not think there are any issues here.

Just to be clear, non-volatile means that any other thread may not see the new value of nextSubId++ (it's not volatile, so there's no guarantee a separate thread will ever see the updated value, if it had ever read the value before).
If that's not important, then feel free to ignore! (But as an aside, I think it's generally better to be explicit about these scenarios, and I would mark it volatile to be correct - rather than have code which could be "incorrect" under certain circumstances, and just depending on those situations being "impossible" - someone may always re-use this code and fall foul of the invisible update!).

Concrete example:

executor.submit(() -> kuduPageSink.appendPage(page)); executor.submit(() -> kuduPageSink.appendPage(anotherPage));

If that were a multi-threaded Exector, this line could yield the same value in both threads:

String id = format("%s-%08x", uuid, nextSubId++);

(both threads read kuduPageSink.nextSubId , both increment it, neither see the value of the other, so it's both a race and a lost update - but even if only 1 thread did updates, because it's not volatile, there's no guarantee anyone who has a reference to kuduPageSink instance will see the update nextSubId++, hence the problem).

On reflection, I'd use AtomicInteger.getAndIncrement() for simplicity sake (and to avoid the race I mentioned - volatile wouldn't protect against that...).

But anyways - just wanted to throw that out there for your consideration!

Correct me if I'm wrong, but page sinks are not expected to be thread safe.

Nowhere in the javadoc for ConnectorPageSink does it list thread safety as a requirement. The closest thing I could find was:

/** * Returns a future that will be completed when the page sink can accept * more pages. If the page sink can accept more pages immediately, * this method should return {@code NOT_BLOCKED}. */ CompletableFuture<?> appendPage(Page page);

Here we only return NOT_BLOCKED after the critical section, so there is no risk of concurrent access to nextSubId

leveyja · 2022-03-17T14:55:04Z

...ino-kudu/src/main/java/io/trino/plugin/kudu/schema/SchemaEmulationByTableNameConvention.java

+        catch (KuduException e) {
+            if (e.getStatus().isAlreadyPresent()) {
+                // Table already exists
+                return client.openTable(name);


Q: (For complete correctness) should we do a comparison on the existing table's options to be sure they're effectively "the same"?
(I know this is fixing a race in test code, but in Prod code if someone attempted to create a table w/incompatible options at the same time as someone else, they may (unexpectedly) get back a KuduTable which doesn't match the create options they were trying to apply?)

(Looking a little more closely, it seems the only usage is from createAndFillSchemasTable, which means the case I suggested can't manifest - so ignore!).

The whole approach of emulating schemas in kudu using a kudu table is not safe given kudu's consistency semantics and trino does not propagate client timestamps in kudu.

I talked with some kudu maintainers, impala uses the hive metastore to emulate schemas in kudu, that introduces a big dependency however.

Praveen2112 · 2022-03-23T07:19:02Z

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithStandardInferSchemaConnectorTest.java

I think the purpose of the table is listing every table from every tablet is expensive, so instead do that listing once and cache the results in a kudu table and then update the kudu table whenever a new schema is created/deleted.

I think we don't use it for caching - Kudu doesn't have the concept of namespace i.e schema so we virtually create a table to store the schemas and add them as a prefix for the tables under this virtual schema.
https://trino.io/docs/current/connector/kudu.html#behavior-with-schema-emulation

Praveen2112 · 2022-03-23T07:22:28Z

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithDisabledInferSchemaConnectorTest.java

+        assertThat(rows).containsAll(
+                REQUIRED_TPCH_TABLES.stream()
+                        .map(TpchTable::getTableName)
+                        .collect(Collectors.toList()));


nit: Any specific reason on not using similar toUnmodifiableListhere ?

Doesn't seem like it matters given the list clearly cannot be modified.

...udu/src/test/java/io/trino/plugin/kudu/AbstractKuduWithDisabledInferSchemaConnectorTest.java

grantatspothero · 2022-03-23T15:10:32Z

@Praveen2112 WRT to #11264 (comment)

Yes, but you could imagine instead of creating a $schemas table to emulate schemas you could list all kudu tables and from the naming convention use that information to emulate schemas. The $schemas table is purely for performance reasons.

Praveen2112 · 2022-03-23T15:32:45Z

Yeah !! I agree schema operations would be a lot faster with the $schema tables.

hashhar · 2022-03-25T07:37:49Z

Thanks a lot for digging to the bottom of this flakiness @grantatspothero . Wasn't easy for sure. And the performance improvements for writes is also nice.

cla-bot bot added the cla-signed label Mar 1, 2022

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch from da56d10 to dab4609 Compare March 1, 2022 19:28

grantatspothero mentioned this pull request Mar 1, 2022

Remove usage of static test classes in kuduconnectortests #11185

Merged

This was referenced Mar 3, 2022

Fix flaky kudu tests: schema/table not found and enable CI #11203

Closed

Temporarily disable kudu tests from running in CI #11313

Merged

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch 13 times, most recently from 8d0f4de to 3872062 Compare March 10, 2022 15:43

grantatspothero requested review from ebyhr, findepi and hashhar March 11, 2022 15:38

grantatspothero commented Mar 11, 2022

View reviewed changes

.github/workflows/ci.yml Outdated Show resolved Hide resolved

hashhar reviewed Mar 11, 2022

View reviewed changes

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch 6 times, most recently from fa5d5af to 0994f87 Compare March 11, 2022 21:32

hashhar reviewed Mar 12, 2022

View reviewed changes

.github/bin/build-matrix-from-impacted.py Outdated Show resolved Hide resolved

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch from 74fba32 to d8f9b04 Compare March 14, 2022 14:44

Praveen2112 reviewed Mar 15, 2022

View reviewed changes

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch from 1773ce3 to 2e909d6 Compare March 15, 2022 16:56

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch from 2e909d6 to 5d53302 Compare March 15, 2022 20:13

grantatspothero added 6 commits March 15, 2022 15:23

Fix race relating to the kudu schema emulation schemas table

4deebd0

Depending on test ordering, the kudu schemas table may or may not exist

Improve kudu schema/table not found error message

db8c708

Include the throwable kuduexception cause

Fix test race in AbstractKuduWithDisabledInferSchemaConnectorTest

ee0f62e

BaseConnectorTest creates tables in the default schema. This races in listing of the default schema

Flush kudu operations in bulk instead of one at a time

5397928

This helps performance of loading data into kudu. However, still verify operations are successful and fail fast when errors occur.

Re-enable kudu tests after fixing flakes

d174114

They are now reliably passing without flakes after this PR: trinodb#11264

grantatspothero force-pushed the gn/fixKuduSchemasTableRace branch from 5d53302 to d174114 Compare March 15, 2022 20:23

leveyja reviewed Mar 17, 2022

View reviewed changes

hashhar requested a review from Praveen2112 March 17, 2022 17:56

Praveen2112 approved these changes Mar 23, 2022

View reviewed changes

hashhar approved these changes Mar 25, 2022

View reviewed changes

hashhar changed the title ~~Gn/fix kudu schemas table race~~ Various Kudu cleanups and fixes related to schema emulation races Mar 25, 2022

hashhar merged commit 2c2b3ec into trinodb:master Mar 25, 2022

hashhar mentioned this pull request Mar 25, 2022

Release notes for 375 #11527

Closed

github-actions bot added this to the 375 milestone Mar 25, 2022

mosabua mentioned this pull request Mar 25, 2022

Add Trino 375 release notes #11528

Merged

Conversation

grantatspothero commented Mar 1, 2022 • edited by hashhar Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related issues, pull requests, and links

Documentation

Release notes

Uh oh!

ebyhr commented Mar 2, 2022

Uh oh!

Uh oh!

hashhar left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

grantatspothero Mar 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

grantatspothero Mar 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

grantatspothero commented Mar 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Praveen2112 commented Mar 23, 2022

Uh oh!

hashhar commented Mar 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

grantatspothero commented Mar 1, 2022 •

edited by hashhar

Loading

hashhar left a comment •

edited

Loading

grantatspothero Mar 15, 2022 •

edited

Loading

grantatspothero Mar 21, 2022 •

edited

Loading

grantatspothero commented Mar 23, 2022 •

edited

Loading