feat(plugin-iceberg): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type#27461
Conversation
Reviewer's GuideMaps Presto SMALLINT and TINYINT types to Iceberg INTEGER in the Iceberg connector and adds end-to-end tests validating table DDL/DML, partitioning, casting, and arithmetic behavior for these types. Sequence diagram for mapping SMALLINT and TINYINT to Iceberg INTEGER during table creationsequenceDiagram
actor User
participant PrestoCLI
participant PrestoEngine
participant IcebergConnector
participant TypeConverter
participant IcebergCatalog
User->>PrestoCLI: submit CREATE TABLE with tinyint_col, smallint_col
PrestoCLI->>PrestoEngine: send SQL
PrestoEngine->>IcebergConnector: plan table creation
IcebergConnector->>TypeConverter: toIcebergType(tinyint_col_type)
TypeConverter-->>IcebergConnector: Iceberg IntegerType
IcebergConnector->>TypeConverter: toIcebergType(smallint_col_type)
TypeConverter-->>IcebergConnector: Iceberg IntegerType
IcebergConnector->>IcebergCatalog: create table with INTEGER columns
IcebergCatalog-->>IcebergConnector: table created
IcebergConnector-->>PrestoEngine: table metadata
PrestoEngine-->>PrestoCLI: CREATE TABLE success
PrestoCLI-->>User: table created with tinyint_col, smallint_col
Updated class diagram for Iceberg TypeConverter mappingsclassDiagram
class Type
class IntegerType
class BigintType
class SmallintType
class TinyintType
class IcebergIntegerType
class IcebergLongType
class TypeConverter {
+static org_apache_iceberg_types_Type toIcebergType(Type type)
}
Type <|-- IntegerType
Type <|-- BigintType
Type <|-- SmallintType
Type <|-- TinyintType
TypeConverter ..> IntegerType : checks_instanceof
TypeConverter ..> BigintType : checks_instanceof
TypeConverter ..> SmallintType : checks_equality
TypeConverter ..> TinyintType : checks_equality
TypeConverter ..> IcebergIntegerType : returns_for_Integer_Smallint_Tinyint
TypeConverter ..> IcebergLongType : returns_for_Bigint
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
f2a6d4e to
ebdb957
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The SMALLINT/TINYINT mapping in TypeConverter duplicates the existing IntegerType handling; consider folding these into a single branch (e.g., by treating SMALLINT/TINYINT as IntegerType-compatible in one place) to keep the mapping logic centralized and easier to maintain.
- The new tests repeatedly create and drop hard-coded table names inside each method; consider extracting a helper or using randomized/unique table names with a shared cleanup (e.g., in @AfterMethod) to reduce duplication and avoid potential name collisions when tests are run in parallel.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The SMALLINT/TINYINT mapping in TypeConverter duplicates the existing IntegerType handling; consider folding these into a single branch (e.g., by treating SMALLINT/TINYINT as IntegerType-compatible in one place) to keep the mapping logic centralized and easier to maintain.
- The new tests repeatedly create and drop hard-coded table names inside each method; consider extracting a helper or using randomized/unique table names with a shared cleanup (e.g., in @AfterMethod) to reduce duplication and avoid potential name collisions when tests are run in parallel.
## Individual Comments
### Comment 1
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergSmallintTinyintTypes.java" line_range="125-50" />
<code_context>
+ public void testPartitionedTableWithSmallintTinyint()
</code_context>
<issue_to_address>
**suggestion (testing):** Extend partitioning test to cover TINYINT as a partition column or predicate target
Since this feature also adds TINYINT support, this test should exercise it directly. Either partition by `tiny_col` (or both `small_col` and `tiny_col`) and assert pruning via a TINYINT predicate, or at least add a predicate on `tiny_col` to the existing test so we verify partitioning behavior for both types.
Suggested implementation:
```java
String tableName = "test_partitioned_smallint_tinyint";
assertUpdate("DROP TABLE IF EXISTS " + tableName);
// Create partitioned table with SMALLINT and TINYINT partition columns
assertUpdate("CREATE TABLE " + tableName + " (" +
"id INTEGER, " +
"small_col SMALLINT, " +
"tiny_col TINYINT, " +
"data VARCHAR) " +
"WITH (PARTITIONING = ARRAY['small_col', 'tiny_col'])");
```
To fully implement the suggestion, the rest of `testPartitionedTableWithSmallintTinyint` should:
1. Insert multiple rows that differ in both `small_col` and `tiny_col` so that pruning can be observed on each dimension.
2. Include at least one query that filters *only* on `tiny_col` (e.g., `WHERE tiny_col = <value>`), and assert the expected row count and values.
3. Optionally, add a query that filters on both `small_col` and `tiny_col` together to confirm partition pruning works when both partition columns are constrained.
4. If the test currently only has predicates on `small_col`, extend or duplicate those assertions to use predicates on `tiny_col` analogously, ensuring both types are covered by partition pruning checks.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| assertEquals(result.getMaterializedRows().get(0).getField(1), 100); | ||
| assertEquals(result.getMaterializedRows().get(1).getField(1), 32767); | ||
| assertEquals(result.getMaterializedRows().get(2).getField(1), -32768); | ||
| assertUpdate("DROP TABLE " + tableName); |
There was a problem hiding this comment.
suggestion (testing): Extend partitioning test to cover TINYINT as a partition column or predicate target
Since this feature also adds TINYINT support, this test should exercise it directly. Either partition by tiny_col (or both small_col and tiny_col) and assert pruning via a TINYINT predicate, or at least add a predicate on tiny_col to the existing test so we verify partitioning behavior for both types.
Suggested implementation:
String tableName = "test_partitioned_smallint_tinyint";
assertUpdate("DROP TABLE IF EXISTS " + tableName);
// Create partitioned table with SMALLINT and TINYINT partition columns
assertUpdate("CREATE TABLE " + tableName + " (" +
"id INTEGER, " +
"small_col SMALLINT, " +
"tiny_col TINYINT, " +
"data VARCHAR) " +
"WITH (PARTITIONING = ARRAY['small_col', 'tiny_col'])");To fully implement the suggestion, the rest of testPartitionedTableWithSmallintTinyint should:
- Insert multiple rows that differ in both
small_colandtiny_colso that pruning can be observed on each dimension. - Include at least one query that filters only on
tiny_col(e.g.,WHERE tiny_col = <value>), and assert the expected row count and values. - Optionally, add a query that filters on both
small_colandtiny_coltogether to confirm partition pruning works when both partition columns are constrained. - If the test currently only has predicates on
small_col, extend or duplicate those assertions to use predicates ontiny_colanalogously, ensuring both types are covered by partition pruning checks.
sumi-mathew
left a comment
There was a problem hiding this comment.
Thanks @nishithakbhaskaran. LGTM!
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @nishithakbhaskaran for this change, overall looks good to me! Could you also add the the type mapping message to Iceberg documentation?
Trino seems to only address TINYINT and SMALLINT in the iceberg.system.migrate procedure, without fully supporting them across the Iceberg connector. However, after checking the documentation of Spark, Flink, and Hive regarding Iceberg type compatibility, I found that they all directly map TINYINT and SMALLINT to INTEGER. Therefore, aligning with them is a good approach, and it also follows the expected behavior of the Iceberg community.
ca88c82 to
a7c7e90
Compare
@hantangwangd Updated the docs. CPTL? Thanks!! |
NivinCS
left a comment
There was a problem hiding this comment.
Thanks for the change, LGTM.
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @nishithakbhaskaran
Description
Add support for SMALLINT and TINYINT columns in presto-iceberg by mapping them to Iceberg INTEGER type.
Motivation and Context
#27444
Impact
Test Plan
Added Unit Tests to cover the scenarios.
Contributor checklist
Summary by Sourcery
Add Iceberg connector support for Presto SMALLINT and TINYINT types by mapping them to Iceberg INTEGER and validating behavior end-to-end.
New Features:
Tests:
Summary by Sourcery
Add Iceberg connector support for Presto SMALLINT and TINYINT types by mapping them to the Iceberg INTEGER type and validating end-to-end behavior.
New Features:
Tests:
Release Notes
Please follow release notes guidelines and fill in the release notes below.