
Update#1

Merged
MirrorChu merged 146 commits into MirrorChu:master from trinodb:master
Apr 17, 2021

Conversation

@MirrorChu
Owner

No description provided.

sopel39 and others added 30 commits March 30, 2021 12:29
Before the change, each invocation of `assertMetadataCalls` opened a new
connection, even though the test setup provides one.
A number of `DatabaseMetaData` methods accept name patterns (e.g.
`schemaPattern`, `tableNamePattern`, `columnNamePattern`, etc.). Values
that ought to be treated as literals should have underscores
`_` and percent signs `%` escaped with
`DatabaseMetaData.getSearchStringEscape()`. Sadly, many client
applications do not do that, which results in metadata queries that
cannot be answered efficiently by the server.

This commit adds
`assumeLiteralNamesInMetadataCallsForNonConformingClients`, a
compatibility flag as a workaround for such misbehaving client
applications.
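As a minimal sketch of the escaping a conforming client is expected to do (this helper is illustrative, not Trino code; the escape string is assumed to be a backslash here, but a real client should take it from `DatabaseMetaData.getSearchStringEscape()` at runtime):

```java
// Illustrative helper: escape a literal value before passing it to a
// DatabaseMetaData name-pattern parameter such as tableNamePattern.
public class MetadataPatternEscaper
{
    public static String escapeLiteral(String value, String escape)
    {
        // Escape the escape character itself first, then the LIKE wildcards
        return value
                .replace(escape, escape + escape)
                .replace("_", escape + "_")
                .replace("%", escape + "%");
    }

    public static void main(String[] args)
    {
        // A table literally named "my_table%" must not be treated as a pattern
        System.out.println(escapeLiteral("my_table%", "\\"));
    }
}
```

Clients that skip this step send `my_table%` as a pattern, which is what the compatibility flag above works around on the server side.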
Wording copied from session property.
This captures the columns created/inserted/updated in a given query.
For consistency: TEXTFILE is already used by other tests in that class.
PushJoinIntoTableScan must run:

- after ReorderJoins to ensure optimal join ordering is preserved
- after DetermineJoinDistributionType to ensure table statistics are
  available to determine join distribution
- before DetermineTableScanNodePartitioning because the new table handle
  returned from PushJoinIntoTableScan doesn't have
  useConnectorNodePartitioning set, which would cause a planner failure
Inline empty block of code
After the timestamp semantics fix, the behavior of the variant of the
function that takes timestamp(p) became inconsistent with the
timestamp(p) w/ time zone variant. Unix time represents a point in time
(vs a wall-clock time), so, conceptually, a timestamp(p) first needs to
be converted to a timestamp(p) with time zone before the transformation
is applied.

Currently, such timestamps are interpreted as being UTC, which is confusing
and counter-intuitive, especially in comparison with an invocation that uses
an explicit cast:

    SELECT
      to_unixtime(localtimestamp),
      to_unixtime(cast(localtimestamp as timestamp with time zone))

          _col0       |      _col1
    ------------------+------------------
     1.617009524129E9 | 1.617034724129E9

This fixes the semantics of the function by relying on the standard coercion
rules to convert from timestamp(p) to timestamp(p) with time zone.
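The discrepancy can be reproduced outside SQL. The sketch below (plain `java.time`, not Trino code) maps the same wall-clock value to unix time twice: once interpreted as UTC, once with a fixed UTC-7 offset, which is inferred from the 25200-second gap between `_col0` and `_col1` above and is only an assumption here:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// A wall-clock timestamp has no unix time of its own: the epoch value
// depends on which zone/offset is attached before the conversion.
public class UnixTimeDemo
{
    public static void main(String[] args)
    {
        LocalDateTime wallClock = LocalDateTime.of(2021, 3, 29, 10, 0, 0);

        long interpretedAsUtc = wallClock.toEpochSecond(ZoneOffset.UTC);
        long interpretedInSession = wallClock.toEpochSecond(ZoneOffset.ofHours(-7));

        // 7 hours = 25200 seconds, the same gap as between _col0 and _col1
        System.out.println(interpretedInSession - interpretedAsUtc);
    }
}
```

This is why the fix routes timestamp(p) through the coercion to timestamp(p) with time zone instead of silently assuming UTC.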
Unix time represents a point in time, so it is natural for this
function to return a timestamp with time zone instead of a
plain timestamp.
losipiuk and others added 29 commits April 14, 2021 09:08
sendUpdate should not be called before the task has started, and the
needsUpdate flag in RemoteHttpTask is not enough to ensure this. The
case occurs on all non-leaf stages, as seen around lines 450-470 of
SqlStageExecution.java.
Although the task will start soon after noMoreSplits, we should still
avoid the early update, which is unnecessary and not strictly correct in principle.
 - order fields
 - remove extra line
 - make things private and static where possible
This brings no user-visible changes.
If the _orc_acid_version file is not present for a table, we no longer
fail the read flow immediately. Instead, we validate whether the ORC ACID
version is supported by Trino using the ORC data files' user metadata.
Recent Hive versions record the ORC ACID version in the data file
metadata under hive.acid.version.
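The fallback described above can be sketched as follows (names and shapes are illustrative, not Trino's actual implementation; the user metadata is modeled as a plain map):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

// Prefer the _orc_acid_version marker file; if it is absent, fall back to
// the "hive.acid.version" key written into the ORC file's user metadata
// by recent Hive versions, instead of failing the read outright.
public class AcidVersionResolver
{
    static Optional<Integer> resolveAcidVersion(Path tableDir, Map<String, String> orcUserMetadata)
    {
        Path marker = tableDir.resolve("_orc_acid_version");
        try {
            if (Files.exists(marker)) {
                return Optional.of(Integer.parseInt(Files.readString(marker).trim()));
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Marker file absent: consult per-file user metadata
        return Optional.ofNullable(orcUserMetadata.get("hive.acid.version"))
                .map(Integer::parseInt);
    }
}
```

An empty result would then mean the version is genuinely unknown, which is the only case left to reject.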
MetadataManager.resolveOperator() and MetadataManager.getCoercion() are frequently invoked during planning.
Adding a cache for them significantly reduces planning time for IN predicates with a large list.

Before the change:

    Benchmark                          (stage)    Mode  Cnt      Score     Error  Units
    BenchmarkPlanner.planLargeInQuery  optimized  avgt   20  17808.872 ± 326.616  ms/op
    BenchmarkPlanner.planLargeInQuery  created    avgt   20     52.415 ±   2.171  ms/op

After the change:

    Benchmark                          (stage)    Mode  Cnt      Score     Error  Units
    BenchmarkPlanner.planLargeInQuery  optimized  avgt   20   5110.045 ±  88.355  ms/op
    BenchmarkPlanner.planLargeInQuery  created    avgt   20     50.761 ±   1.200  ms/op
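The caching idea can be sketched generically (this is illustrative memoization, not the actual MetadataManager change): repeated resolutions of the same operator or coercion, e.g. one per element of a large IN list, should hit a cache instead of redoing the lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Generic memoizing wrapper: the expensive resolver runs once per distinct
// key; subsequent lookups are served from the cache.
public class ResolutionCache<K, V>
{
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> resolver;

    public ResolutionCache(Function<K, V> resolver)
    {
        this.resolver = resolver;
    }

    public V resolve(K key)
    {
        return cache.computeIfAbsent(key, resolver);
    }

    public static void main(String[] args)
    {
        AtomicInteger calls = new AtomicInteger();
        ResolutionCache<String, String> coercions = new ResolutionCache<>(type -> {
            calls.incrementAndGet();
            return "coercion:" + type;  // stand-in for the expensive lookup
        });
        for (int i = 0; i < 1000; i++) {
            coercions.resolve("varchar->bigint");
        }
        System.out.println(calls.get());  // prints 1: resolver ran once for 1000 lookups
    }
}
```

The benchmark above shows exactly this effect: the `optimized` stage, which performs the repeated resolutions, drops from ~17.8s to ~5.1s, while the `created` stage is unaffected.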
They fail pretty much in every build.
The fix PR is ready, but still needs some paperwork.
Plugin directory is configured by `plugin.dir` and not `catalog.config-dir`
With the previous query, the assertion was failing for connectors that expose totalprice as
a double value, due to different double rounding behaviour in Trino and H2.
See the referenced issue for more info.
@MirrorChu MirrorChu merged commit b49e1bb into MirrorChu:master Apr 17, 2021