Skip to content

Conversation

@SemionPar
Copy link
Contributor

@SemionPar SemionPar commented Apr 11, 2024

Description

  • extend automatic type coercion support in Iceberg table creation to char type
  • refactor existing tests to reduce duplication (these were essentially the same)

Additional context and related issues

#13981
#20611

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Iceberg
* Automatically use `varchar` as type during table creation when 
  `char` is specified. ({issue}`19336`). ({issue}`21515`)

@cla-bot cla-bot bot added the cla-signed label Apr 11, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Apr 11, 2024
@findinpath
Copy link
Contributor

https://github.com/trinodb/trino/actions/runs/8646870078/job/23709683367?pr=21515
Failure sounds related to your changes:

TestIcebergParquetConnectorTest>BaseIcebergConnectorTest.testCharVarcharComparison:304 
Expecting message to be:
  "Type not supported for Iceberg: char(3)"
but was:
  "For query 20240411_142226_03881_g6k7z: 
 SELECT k, v FROM test_char_varcharjwzwbc93wu WHERE v = CAST('  ' AS varchar(2))
not equal
Actual rows (up to 100 of 0 extra rows shown, 0 rows in total):
    
Expected rows (up to 100 of 1 missing rows shown, 1 rows in total):
    [3,    ]
"

@SemionPar SemionPar force-pushed the semionpar/char-in-iceberg-ctas branch from e3447a8 to b52bd02 Compare April 15, 2024 12:29
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testDataMappingSmokeTest.DataMappingTestSetup("char(3)", "'ab'", "'zzz'") is failing due to the fact that char is padded upon coercion:

CREATE TABLE test_data_mapping_smoke_char_3_63hssk8sp8 
    AS SELECT CAST(row_id AS varchar(50)) row_id, CAST(value AS char(3)) value, CAST(value AS char(3)) another_column 
    FROM (VALUES   ('null value', NULL),   ('sample value', 'ab'),   ('high value', 'zzz'))  t(row_id, value);

DESCRIBE test_data_mapping_smoke_char_3_63hssk8sp8;
-- MaterializedResult{rows=[[row_id, varchar, , ], [value, varchar, , ], [another_column, varchar, , ]]}

SELECT value FROM test_data_mapping_smoke_char_3_63hssk8sp8;
-- value='ab '

-- from assertion:
SELECT row_id FROM test_data_mapping_smoke_char_3_63hssk8sp8 WHERE rand() = 42 OR value = 'ab'
-- and expect VALUES 'sample value' -> failure as 'ab' != 'ab '

Changing it here breaks it for other connectors:

Error:    TestClickHouseConnectorTest>BaseConnectorTest.testDataMappingSmokeTest:5556->BaseConnectorTest.testDataMapping:5590->AbstractTestQueryFramework.assertQuery:350 For query 20240415_123625_02027_285wm: 
 SELECT row_id FROM test_data_mapping_smoke_char_3_mhz2ny6ak4 WHERE rand() = 42 OR value = 'ab '
not equal
Actual rows (up to 100 of 0 extra rows shown, 0 rows in total):
    
Expected rows (up to 100 of 1 missing rows shown, 1 rows in total):
    [sample value]

I will use filterDataMappingSmokeTestData and change 'ab' -> 'ab ' just for Iceberg tests.

@SemionPar SemionPar force-pushed the semionpar/char-in-iceberg-ctas branch from b52bd02 to c9020c2 Compare April 15, 2024 14:27
@SemionPar SemionPar force-pushed the semionpar/char-in-iceberg-ctas branch 2 times, most recently from 8dcd6d9 to 7df5801 Compare April 18, 2024 08:07
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint @findepi Is it safe to use getSupportedType method for converting char → varchar type? Asking this question because this conversion is little different from the existing mapping (temporal types for the most part).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@SemionPar SemionPar force-pushed the semionpar/char-in-iceberg-ctas branch from 7df5801 to 855056f Compare April 23, 2024 09:31
@ebyhr ebyhr merged commit 6e104f2 into trinodb:master Apr 23, 2024
@github-actions github-actions bot added this to the 446 milestone Apr 23, 2024
SemionPar added a commit to SemionPar/trino that referenced this pull request May 21, 2024
Similar to trinodb#21515, extend type coercion support to char type.
SemionPar added a commit to SemionPar/trino that referenced this pull request Jun 4, 2024
Similar to trinodb#21515, extend type coercion support to char type.
ebyhr pushed a commit that referenced this pull request Jun 5, 2024
Similar to #21515, extend type coercion support to char type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

5 participants