
Avoid overriding nullability decision#13152

Merged
martint merged 1 commit into trinodb:master from martint:reorder-join-fix
Jul 13, 2022

Conversation

@martint martint commented Jul 12, 2022

mayReturnNullOnNonNullInput currently returns different results for
expressions such as:

CAST(...) = CASE ...

when the terms appear in the opposite order.

This causes problems for rules such as ReorderJoins, which rely on
the result of the check for one form of the expression to make
decisions about the alternative form of the expression.
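
The order sensitivity can be illustrated with a toy model (hypothetical names such as `NullabilityCheckSketch`, `buggy`, and `fixed` are mine, not Trino's actual visitor): a shared flag that each visited child updates. With a plain `set`, the last child visited wins; with `compareAndSet(false, …)`, a `true` decision is never overridden, so term order no longer matters.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of an order-sensitive nullability walk (hypothetical,
// not Trino's actual mayReturnNullOnNonNullInput implementation).
class NullabilityCheckSketch {
    // Buggy variant: each child overwrites the shared flag, so the last
    // child visited wins and the result depends on term order.
    static boolean buggy(List<Boolean> childMayReturnNull) {
        AtomicBoolean result = new AtomicBoolean(false);
        for (boolean child : childMayReturnNull) {
            result.set(child); // later children override an earlier "true"
        }
        return result.get();
    }

    // Fixed variant: once the flag is true it stays true, so the result
    // is the same regardless of the order in which terms are visited.
    static boolean fixed(List<Boolean> childMayReturnNull) {
        AtomicBoolean result = new AtomicBoolean(false);
        for (boolean child : childMayReturnNull) {
            result.compareAndSet(false, child); // only flips false -> true
        }
        return result.get();
    }
}
```

For `CAST(...) = CASE ...`, think of the two terms as contributing `[true, false]`: the buggy variant gives different answers for the two orders, while the fixed variant gives `true` for both.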

Fixes #13145

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# General
* Fix incorrect results for certain join queries containing filters involving explicit or implicit casts. ({issue}`13145`)

@cla-bot cla-bot bot added the cla-signed label Jul 12, 2022
@martint martint requested review from kasiafi and sopel39 July 12, 2022 13:03
dain commented Jul 12, 2022

Oh maybe add a comment that compare and set is critical for this code.

@martint martint merged commit 06e7526 into trinodb:master Jul 13, 2022
@github-actions github-actions bot added this to the 390 milestone Jul 13, 2022
    //
    // Also, try_cast (i.e., safe cast) can return null
    -result.set(node.isSafe() || !node.isTypeOnly());
    +result.compareAndSet(false, node.isSafe() || !node.isTypeOnly());
result.set(result.get() || ...) would be clearer, since we're not using any concurrency here
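
In single-threaded code, `compareAndSet(false, v)` leaves the flag equal to `oldValue || v`, the same as `set(get() || v)`. A minimal sketch of that equivalence (`StickyTrueDemo` and its method names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: in single-threaded use, compareAndSet(false, v) and
// set(get() || v) leave the flag in the same state.
class StickyTrueDemo {
    static boolean afterCompareAndSet(boolean old, boolean v) {
        AtomicBoolean result = new AtomicBoolean(old);
        result.compareAndSet(false, v); // only updates while still false
        return result.get();
    }

    static boolean afterGetOrSet(boolean old, boolean v) {
        AtomicBoolean result = new AtomicBoolean(old);
        result.set(result.get() || v); // equivalent, arguably clearer intent
        return result.get();
    }
}
```

Both forms make a `true` decision sticky; the atomicity of `compareAndSet` is incidental here.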

martint added a commit to martint/trino that referenced this pull request Sep 6, 2022
The bug fixed in trinodb#13152
can be reproduced with a simpler query:

    SELECT x, y
    FROM t1 JOIN t2 ON (t1.id = t2.id)
    WHERE IF(t1.v = 0, 'cc', y) = 'b'
@martint martint mentioned this pull request Sep 6, 2022
martint added a commit that referenced this pull request Sep 8, 2022
The bug fixed in #13152
can be reproduced with a simpler query:

    SELECT x, y
    FROM t1 JOIN t2 ON (t1.id = t2.id)
    WHERE IF(t1.v = 0, 'cc', y) = 'b'
arkadiuszbalcerek added a commit to arkadiuszbalcerek/trino that referenced this pull request Sep 12, 2022
* Upgrade toxiproxy docker container version

The Toxiproxy Docker container in version 2.4.0 has arm64 architecture
support, which allows running tests that use Toxiproxy on a MacBook Pro M1.

* Leverage Optional.orElseThrow to increase code readability

Use of `Optional.orElseThrow` verifies that the optional is not empty
before assigning to a variable. This subsequently removes the need for
`Optional.get()` calls.
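
The refactoring pattern can be sketched as follows (`OrElseThrowDemo` is a hypothetical example, not code from the PR):

```java
import java.util.Optional;

// Sketch of the refactoring: orElseThrow() fails fast on an empty
// Optional at the point of assignment, so no later get() calls are needed.
class OrElseThrowDemo {
    static String unwrap(Optional<String> value) {
        // before: value.get() — no emptiness check visible at the call site
        // after: orElseThrow() throws NoSuchElementException when empty
        return value.orElseThrow();
    }
}
```

`orElseThrow()` (no arguments, since Java 10) behaves exactly like `get()` but makes the failure mode explicit in the name.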

* Remove redundant else

* Move currentRequestStartNanos to UpdateResponseHandler

* Make sendUpdate lock free

Ensure that only a single thread
will execute sendUpdate

* Inline method

This makes further refactorings easier to read.

* Simplify creation of join operator

Since the split into spilling and non-spilling variants, creation of join operators had
some unnecessary casting. This commit cleans it up and makes the
spilling/non-spilling code paths in LocalExecutionPlanner clear.

Changes in this class are strictly mechanical. No logical changes are made.

* Remove trailing slash when building metadata location in Iceberg

* Use ImmutableList.Builder in CheckpointSchemaManager

* Enforce writeStatsAsJson and writeStatsAsStruct option in Delta Lake

* Support all Trino Security Access Control modes in Delta Lake connector

* Randomly distribute spooling directories across buckets

* Bump calcite-core to 1.21.0.152

Bump calcite-core dependency to address

https://nvd.nist.gov/vuln/detail/CVE-2017-18640

* Update pull request template

* Remove redundant keywords

* Fix name of like function for connector expressions

The function is misnamed. It should have been $like all along.
$like_pattern is the name of the internal function to convert
from a varchar to the compiled form of a pattern (i.e., LikeMatcher).

* Improve type mapping documentation for Druid connector

* Clarify connector support for SQL MERGE

* Add ORC UuidColumnReader

Read UUID values into Int128ArrayBlock instead of
VariableWidthBlock to improve performance and
to use only the type preferred by UUIDType.
This is only supported in the Iceberg connector,
as only the Iceberg specification has explicit
UUID support for ORC.

Benchmark                                         (compression)  Mode  Cnt  Score   Error  Units
BenchmarkColumnReaders.readUuidNoNull                      NONE  avgt   20  4.425 ± 0.044  ns/op
BenchmarkColumnReaders.readUuidNoNull                      ZLIB  avgt   20  4.393 ± 0.051  ns/op
BenchmarkColumnReaders.readUuidWithNull                    NONE  avgt   20  6.459 ± 0.127  ns/op
BenchmarkColumnReaders.readUuidWithNull                    ZLIB  avgt   20  6.465 ± 0.103  ns/op
BenchmarkColumnReaders.readVarbinaryUuidNoNull             NONE  avgt   20  4.965 ± 0.077  ns/op
BenchmarkColumnReaders.readVarbinaryUuidNoNull             ZLIB  avgt   20  4.932 ± 0.088  ns/op
BenchmarkColumnReaders.readVarbinaryUuidWithNull           NONE  avgt   20  6.966 ± 0.174  ns/op
BenchmarkColumnReaders.readVarbinaryUuidWithNull           ZLIB  avgt   20  6.895 ± 0.221  ns/op

* Migrate interval year-month literal tests to new assertions

* Migrate interval day-time literal tests to new assertions

* Remove unnecessary method

The method was a left-over from before the utf8Slice()
convenience method existed.

* Add Optional predicate to ParquetReader constructor used in Iceberg

Add requireNonNull on parquetPredicate and columnIndexStore in ParquetReader,
and pass Optional.empty() from Iceberg for the parquetPredicate parameter.

* Fix CanonicalizeExpressions processing of date() argument

A recursive rewrite call on the function argument was missing, so the
result was potentially not canonical.

* Put getter before setter

Per code style rule

* Analyze Iceberg tables

Support `ANALYZE` in the Iceberg connector. This collects the number of distinct
values (NDV) of selected columns and stores them in table properties.
This is an interim solution until the Iceberg library has first-class
statistics file support.

* Remove deprecated statistics collection ConnectorMetadata methods

Remove `ConnectorMetadata.getStatisticsCollectionMetadata`
and `ConnectorMetadata.getTableHandleForStatisticsCollection` that were
deprecated in cbe2dca.

* Remove redundant cast from TestingClickHouseServer

* Add missing final modifier in CassandraServer

* Remove unused throws Exception clauses

* Remove usage of deprecated ContainerState#getContainerIpAddress

* Reveal enforced constraint in EXPLAIN for Iceberg table scan

Include `enforcedPredicate`'s columns in `IcebergTableHandle.toString` so that
it's visible in `EXPLAIN` output.

* Remove unnecessary null check

* Refactor DriverSplitRunnerFactory

Remove DriverSplitRunnerFactory#Status

* Transition task to FINISHED after noMoreOperators is set

Currently dynamic filters are delivered via DynamicFilterFetcher.
When a TaskStatus contains an updated dynamic filter version, the
DynamicFilterFetcher is responsible for fetching the new dynamic
filters from a worker. However, tasks were being transitioned
to FINISHED before LocalDynamicFilterConsumer#setPartitionCount
was called and the dynamic filters updated. When a task is transitioned
to FINISHED, the TaskStatusFetcher no longer tries to update the TaskStatus,
so the coordinator cannot learn that there are dynamic filters to
be fetched.

* Update task holder before starting task execution

Otherwise there is a chance that the task may finish before the task holder
is updated, resulting in an incomplete final TaskStatus being created.
Since the TaskStatus is used to signal the DynamicFilterFetcher to fetch
dynamic filters, the coordinator may never learn that there are dynamic
filters to be retrieved.

* Add recording distribution of outputBuffer utilization

Add an outputBufferUtilization field to OutputBufferInfo to
expose the distribution of output buffer utilization.

* Convert ClientSession to follow builder pattern

* Simplify TestDeltaLakeRenameToWithGlueMetastore

* Simplify Delta Lake query runner

* Add Trino 395 release notes

* [maven-release-plugin] prepare release 395

* [maven-release-plugin] prepare for next development iteration

* Truncate month names in release note headers

* Simplify test case

The bug fixed in trinodb#13152
can be reproduced with a simpler query:

    SELECT x, y
    FROM t1 JOIN t2 ON (t1.id = t2.id)
    WHERE IF(t1.v = 0, 'cc', y) = 'b'

* Rename DefLevelIterables to DefLevelWriterProviders

* Refactor DefLevelWriterProviders to improve optimized parquet writer

Avoid iterators, streams and optionals when writing definition levels
to improve performance.

Before
Benchmark HiveFileFormat#write (compression NONE, benchmarkFileFormat TRINO_PARQUET)
MAP_VARCHAR_DOUBLE              80.6MB/s ±  1964.5kB/s ( 2.38%) (N = 45, α = 99.9%)
LARGE_MAP_VARCHAR_DOUBLE       108.4MB/s ±  3725.3kB/s ( 3.36%) (N = 45, α = 99.9%)
MAP_INT_DOUBLE                  90.6MB/s ±  1461.7kB/s ( 1.58%) (N = 45, α = 99.9%)
LARGE_MAP_INT_DOUBLE            94.4MB/s ±  1490.0kB/s ( 1.54%) (N = 45, α = 99.9%)
LARGE_ARRAY_VARCHAR             91.9MB/s ±  1458.6kB/s ( 1.55%) (N = 45, α = 99.9%)

After
MAP_VARCHAR_DOUBLE             114.9MB/s ±  5665.1kB/s ( 4.82%) (N = 45, α = 99.9%)
LARGE_MAP_VARCHAR_DOUBLE       136.8MB/s ±  3532.4kB/s ( 2.52%) (N = 45, α = 99.9%)
MAP_INT_DOUBLE                 114.9MB/s ±  3012.9kB/s ( 2.56%) (N = 45, α = 99.9%)
LARGE_MAP_INT_DOUBLE           124.3MB/s ±  3292.7kB/s ( 2.59%) (N = 45, α = 99.9%)
LARGE_ARRAY_VARCHAR            102.9MB/s ±  2475.0kB/s ( 2.35%) (N = 45, α = 99.9%)

* Make checkCanExecuteFunction checks future proof

Previously they had special handling for `TABLE` function kind. They are
now changed to no longer implicitly allow unknown function kinds.

* Add access control for views with table functions

Adds the necessary AccessControl interfaces to support views with table
functions. The key change is the introduction of
`checkCanGrantExecuteFunctionPrivilege` that takes a `FunctionKind` and a
qualified function name, as an overload of the existing
`checkCanGrantExecuteFunctionPrivilege` that takes a plain function name.
This follows the existing overloads for `checkCanExecuteFunction`.

* Extract method to get table location in Iceberg product test

* Add test for write.metadata.compression-codec property in Iceberg

* Use correct identity to initiate delete in HDFS

* Update link to issue in release notes

* Add hasNonNullValue support to ArrayBlockBuilder

Produce RunLengthEncodedBlock in ArrayBlockBuilder
when all values are null

* Add public getRawSlice and getRawSliceOffset to VariableWidthBlock

* Improve SlicePositionsAppender performance

The improvement comes from not allocating a Slice
per position and using System.arraycopy directly
instead of Slice.getBytes.

Before
Benchmark                                   (channelCount)  (enableCompression)  (nullRate)  (partitionCount)  (positionCount)   (type)  Mode  Cnt     Score     Error  Units
BenchmarkPartitionedOutputOperator.addPage               1                false           0                16             8192  VARCHAR  avgt   20  2137.367 ± 166.834  ms/op
BenchmarkPartitionedOutputOperator.addPage               1                false         0.2                16             8192  VARCHAR  avgt   20  1660.550 ±  24.274  ms/op

After
BenchmarkPartitionedOutputOperator.addPage               1                false           0                16             8192  VARCHAR  avgt   20  1399.476 ± 181.301  ms/op
BenchmarkPartitionedOutputOperator.addPage               1                false         0.2                16             8192  VARCHAR  avgt   20  1194.077 ± 160.496  ms/op
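
The technique behind the improvement can be sketched as follows (`BatchCopySketch` is a hypothetical simplification, not the actual SlicePositionsAppender): copy a whole byte range with one `System.arraycopy` call instead of materializing an intermediate object per position.

```java
import java.util.Arrays;

// Hypothetical sketch: appending a byte range with a single bulk copy,
// growing the destination as needed, with no per-position allocation.
class BatchCopySketch {
    static byte[] append(byte[] destination, int destinationLength, byte[] source, int offset, int length) {
        // grow the destination if the appended range does not fit
        if (destinationLength + length > destination.length) {
            destination = Arrays.copyOf(destination, Math.max(destination.length * 2, destinationLength + length));
        }
        // one bulk copy for the whole range
        System.arraycopy(source, offset, destination, destinationLength, length);
        return destination;
    }
}
```

Bulk `System.arraycopy` is an intrinsic on common JVMs, which is why it typically beats per-element or per-object copying loops.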

* Extract method to prepare Pinot testing tables

* Extend BaseConnectorSmokeTest in Pinot tests

Changed AbstractPinotIntegrationSmokeTest to extend BaseConnectorSmokeTest,
and renamed it to BasePinotIntegrationConnectorSmokeTest.

* Extract quote method to BigQueryUtil

* Support commenting on table and column in BigQuery

* Update Antlr to 4.11.1

* Update jdbi to 3.32.0

* Make ExpressionAssert evaluation lazy

In preparation for adding support for binding values. Adds a
new evaluate() method to force evaluation, which is needed
for usages in the context of assertThatThrownBy().

* Run expression assertions via constant folder and engine

* Add operator() and function() to QueryAssertions

Also, deprecate various methods in FunctionAssertions
and AbstractTestFunctions in favor of these.

* Add isNull(type) shorthand to expression assertion

Syntactic sugar for:

    assertThat(...)
        .hasType(type)
        .isNull()

* Migrate TestArrayFunctions to new assertions

* Migrate TestConditions to new asserts

* Migrate TestArrayContainsSequence to new assertions

* Migrate TestArrayExceptFunction to new assertions

* Migrate TestArrayFilterFunction to new assertions

* Migrate TestArrayMatchFunctions to new assertions

* Use operator assertions in TestColor

* Migrate TestDataSizeFunctions to new assertions

* Drop useless test

* Ensure test table contains multiple active files

* Increase default domain-compaction-threshold for phoenix

* Increase default domain-compaction-threshold for clickhouse

* Refactor jdbc domain-compaction-threshold documentation

This allows specifying the default for each connector separately

* Slightly change the exception message

* Disallow writes on materialized views

* Reveal @experimental in javadoc

* Fix "No bucket node map" failure for Iceberg inserts

* Unbind MetastoreTypeConfig in Hive tests

This removes `MetastoreTypeConfig` binding added in
7229738 and replaces it with an
annotated boolean, decoupling the `HiveMetadata` logic from metastore
implementations.

* Deliver RemoteSplit exactly once

* Ensure noMoreInputs is not called too early

Call ExchangeDataSource#noMoreInputs only after all ExchangeOperator's
have been created and have received noMoreSplits from the engine

* Allow only single QueryOutputInfo listener

To further allow releasing query inputs as they are being consumed

* Ensure query results inputs are delivered exactly once

In addition, make sure the inputs are not retained in memory
longer than necessary. While it is unlikely that the number of inputs
will be large with pipelined execution, this may no longer be true with
fault-tolerant execution.

* Do not allow exchange inputs to be added more than once

* Verify table just after adding column in testAddColumn

* Make StandardColumnMappings::longTimestampReadFunction public

* Rename TestClickHouseTypeMapping to BaseClickHouseTypeMapping

* Make TestClickHouseTypeMapping extend BaseClickHouseTypeMapping

* Rename method addTimestampRoundTrips to timestampTest

* Introduce `TrinoToClickHouseWriteChecker`

Different versions of ClickHouse may support different min/max values
for the same data type; see the table below:

| version | column type | min value           | max value            |
|---------|-------------|---------------------|----------------------|
| any     | UInt8       | 0                   | 255                  |
| any     | UInt16      | 0                   | 65535                |
| any     | UInt32      | 0                   | 4294967295           |
| any     | UInt64      | 0                   | 18446744073709551615 |
| < 21.4  | Date        | 1970-01-01          | 2106-02-07           |
| < 21.4  | DateTime    | 1970-01-01 00:00:00 | 2106-02-06 06:28:15  |
| >= 21.4 | Date        | 1970-01-01          | 2149-06-06           |
| >= 21.4 | DateTime    | 1970-01-01 00:00:00 | 2106-02-07 06:28:15  |

When a value written to ClickHouse is out of range, ClickHouse
will store an incorrect result, so `TrinoToClickHouseWriteChecker` is
introduced to check the range of written values and
prevent ClickHouse from storing incorrect values.

Introducing `TrinoToClickHouseWriteChecker` is also a preparation for
supporting `DateTime[timezone]` and `DateTime64(precision, [timezone])`.

The next several commits will use `TrinoToClickHouseWriteChecker` to
verify the values written to ClickHouse.
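
A range check of this shape can be sketched as follows (`WriteRangeCheckSketch` is a hypothetical simplification; the actual checker is version- and type-aware):

```java
import java.math.BigInteger;

// Hypothetical sketch of a write-range checker: reject values outside the
// [min, max] range supported by the target ClickHouse column type, here
// UInt8 with the range 0..255 from the table above.
class WriteRangeCheckSketch {
    static final BigInteger UINT8_MIN = BigInteger.ZERO;
    static final BigInteger UINT8_MAX = BigInteger.valueOf(255);

    static void checkUInt8(BigInteger value) {
        if (value.compareTo(UINT8_MIN) < 0 || value.compareTo(UINT8_MAX) > 0) {
            throw new IllegalArgumentException("value out of range for UInt8: " + value);
        }
    }
}
```

Failing the write with an error is preferable to ClickHouse silently storing a wrapped-around or clamped value.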

* Use `TrinoToClickHouseWriteChecker` to validate UInt8

* Use `TrinoToClickHouseWriteChecker` to validate UInt16

* Use `TrinoToClickHouseWriteChecker` to validate UInt32

* Use `TrinoToClickHouseWriteChecker` to validate UInt64

* Make ClickHouse Date min/max/unsupportedDate tests customizable

This is a preparatory commit to use `TrinoToClickHouseWriteChecker` to
validate Date.

* Extract errorMessageForDateYearOfEraPredicate method

* Use `TrinoToClickHouseWriteChecker` to validate Date

* Make ClickHouse DateTime min/max/unsupportedDateTime tests customizable

This is a preparatory commit to use `TrinoToClickHouseWriteChecker` to
validate DateTime.

* Use `TrinoToClickHouseWriteChecker` to validate DateTime

Co-authored-by: Patryk Owczarek <owczarekpatryk97@gmail.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Karol Sobczak <napewnotrafi@gmail.com>
Co-authored-by: skrzypo987 <krzysztof.skrzypczynski@starburstdata.com>
Co-authored-by: Yuya Ebihara <ebyhry@gmail.com>
Co-authored-by: Michiel De Smet <mdesmet@gmail.com>
Co-authored-by: Zebing Lin <linzebing1995@gmail.com>
Co-authored-by: Marius Grama <findinpath@gmail.com>
Co-authored-by: Colebow <cole.bowden@starburstdata.com>
Co-authored-by: Martin Traverso <mtraverso@gmail.com>
Co-authored-by: Terry Blessing <terry.blessing@starburstdata.com>
Co-authored-by: Joe Lodin <joe.lodin@starburstdata.com>
Co-authored-by: Lukasz Stec <lukasz.stec@starburstdata.com>
Co-authored-by: Charles Morgan <charles.morgan@starburstdata.com>
Co-authored-by: tangjiangling <surpass.simple@gmail.com>
Co-authored-by: Andrii Rosa <andriyrosa@gmail.com>
Co-authored-by: rkondziolka <radoslaw.kondziolka@starburstdata.com>
Co-authored-by: Akshay Rai <akrai@linkedin.com>
Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
Co-authored-by: Wei Liu <wliufle@gmail.com>
Co-authored-by: Nagaraj Tantri <nagarajtantri@gmail.com>
Co-authored-by: David Phillips <david@acz.org>

Development

Successfully merging this pull request may close these issues.

v374 - Following query is ignoring one part of the filter and giving incorrect results

3 participants