feat(plugin-iceberg): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type #27461

Merged
nishithakbhaskaran merged 1 commit into prestodb:master from nishithakbhaskaran:iceberg-tinyint-smallint
Apr 1, 2026

@nishithakbhaskaran nishithakbhaskaran commented Mar 30, 2026

Description

Add support for SMALLINT and TINYINT columns in presto-iceberg by mapping them to Iceberg INTEGER type.

Motivation and Context

#27444

Impact

Test Plan

Added unit tests to cover these scenarios, and verified the behavior manually in the Presto CLI:

presto> create table iceberg.sales.testtable (id int, name varchar, tinyint_col tinyint, smallint_col smallint);
CREATE TABLE

presto> show create table iceberg.sales.testtable ;
                                    Create Table                                     
-------------------------------------------------------------------------------------
 CREATE TABLE iceberg.sales.testtable (                                              
    "id" integer,                                                                    
    "name" varchar,                                                                  
    "tinyint_col" integer,                                                           
    "smallint_col" integer                                                           
 )                                                                                   
 WITH (                                                                              
    "format-version" = '2',                                                          
    location = 'file:/Users/nishithakbhaskaran/Documents/minio/sales/testtable', 
    "read.split.target-size" = 134217728,                                            
    "write.delete.mode" = 'merge-on-read',                                           
    "write.format.default" = 'PARQUET',                                              
    "write.metadata.delete-after-commit.enabled" = false,                            
    "write.metadata.metrics.max-inferred-column-defaults" = 100,                     
    "write.metadata.previous-versions-max" = 100,                                    
    "write.update.mode" = 'merge-on-read'                                            
 )                                                                                   
(1 row)

presto> insert into iceberg.sales.testtable values(1,'sample',2,3);
INSERT: 1 row

presto> select * from iceberg.sales.testtable;
 id |  name  | tinyint_col | smallint_col 
----+--------+-------------+--------------
  1 | sample |           2 |            3 
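
The session above shows both columns reported back as integer. As a minimal sketch of why storing these values as 32-bit integers loses nothing, the plain-Java example below widens the TINYINT and SMALLINT bounds to `int` and prints them unchanged. This is not Presto or Iceberg code, and `WideningSketch` and its methods are hypothetical names used only for illustration.

```java
// Illustrative only: plain Java widening, not Presto or Iceberg code.
final class WideningSketch {
    // Widen an 8-bit value (the TINYINT range) to a 32-bit int.
    static int widenTinyint(byte value) {
        return value; // widening primitive conversion, always exact
    }

    // Widen a 16-bit value (the SMALLINT range) to a 32-bit int.
    static int widenSmallint(short value) {
        return value;
    }

    public static void main(String[] args) {
        // Prints the TINYINT bounds, then the SMALLINT bounds.
        System.out.println(widenTinyint(Byte.MIN_VALUE) + " .. " + widenTinyint(Byte.MAX_VALUE));
        System.out.println(widenSmallint(Short.MIN_VALUE) + " .. " + widenSmallint(Short.MAX_VALUE));
    }
}
```

Narrowing the widened value back to `byte` or `short` returns the original, so values round-trip through the wider type.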

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Summary by Sourcery

Add Iceberg connector support for Presto SMALLINT and TINYINT types by mapping them to the Iceberg INTEGER type and validating end-to-end behavior.

New Features:

  • Support SMALLINT and TINYINT columns in the Iceberg connector by mapping them to the Iceberg INTEGER type.

Tests:

  • Add integration-style tests for SMALLINT and TINYINT in Iceberg tables covering creation, inserts, selects, partitioning, casting, and arithmetic operations.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==
Iceberg Connector Changes
* Add support for SMALLINT and TINYINT columns in presto-iceberg by mapping them to Iceberg INTEGER type

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 30, 2026

sourcery-ai bot commented Mar 30, 2026

Reviewer's Guide

Maps Presto SMALLINT and TINYINT types to Iceberg INTEGER in the Iceberg connector and adds end-to-end tests validating table DDL/DML, partitioning, casting, and arithmetic behavior for these types.

Sequence diagram for mapping SMALLINT and TINYINT to Iceberg INTEGER during table creation

sequenceDiagram
    actor User
    participant PrestoCLI
    participant PrestoEngine
    participant IcebergConnector
    participant TypeConverter
    participant IcebergCatalog

    User->>PrestoCLI: submit CREATE TABLE with tinyint_col, smallint_col
    PrestoCLI->>PrestoEngine: send SQL
    PrestoEngine->>IcebergConnector: plan table creation
    IcebergConnector->>TypeConverter: toIcebergType(tinyint_col_type)
    TypeConverter-->>IcebergConnector: Iceberg IntegerType
    IcebergConnector->>TypeConverter: toIcebergType(smallint_col_type)
    TypeConverter-->>IcebergConnector: Iceberg IntegerType
    IcebergConnector->>IcebergCatalog: create table with INTEGER columns
    IcebergCatalog-->>IcebergConnector: table created
    IcebergConnector-->>PrestoEngine: table metadata
    PrestoEngine-->>PrestoCLI: CREATE TABLE success
    PrestoCLI-->>User: table created with tinyint_col, smallint_col

Updated class diagram for Iceberg TypeConverter mappings

classDiagram
    class Type
    class IntegerType
    class BigintType
    class SmallintType
    class TinyintType

    class IcebergIntegerType
    class IcebergLongType

    class TypeConverter {
        +static org_apache_iceberg_types_Type toIcebergType(Type type)
    }

    Type <|-- IntegerType
    Type <|-- BigintType
    Type <|-- SmallintType
    Type <|-- TinyintType

    TypeConverter ..> IntegerType : checks_instanceof
    TypeConverter ..> BigintType : checks_instanceof
    TypeConverter ..> SmallintType : checks_equality
    TypeConverter ..> TinyintType : checks_equality

    TypeConverter ..> IcebergIntegerType : returns_for_Integer_Smallint_Tinyint
    TypeConverter ..> IcebergLongType : returns_for_Bigint

File-Level Changes

Change: Map Presto SMALLINT and TINYINT connector types to Iceberg INTEGER.
Details:
  • Extend type conversion to recognize SMALLINT and TINYINT as integer-compatible types.
  • Return Iceberg Types.IntegerType for SMALLINT and TINYINT, consistent with existing INTEGER mapping.
  • Keep existing BIGINT-to-Long and other mappings unchanged.
Files: presto-iceberg/src/main/java/com/facebook/presto/iceberg/TypeConverter.java

Change: Add integration tests covering SMALLINT and TINYINT behavior in the Iceberg connector.
Details:
  • Introduce a new query framework test class using IcebergQueryRunner to exercise SMALLINT and TINYINT support.
  • Test create/insert/select for standalone SMALLINT and TINYINT columns, including min/max bounds.
  • Test mixed integer columns (TINYINT, SMALLINT, INTEGER, BIGINT) to verify interoperability.
  • Test INSERT ... SELECT between tables with SMALLINT and TINYINT to ensure round-trip correctness.
  • Test partitioned tables using SMALLINT as a partition column with filters.
  • Test explicit casts from SMALLINT/TINYINT to INTEGER and arithmetic expressions combining them.
Files: presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergSmallintTinyintTypes.java
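
A minimal sketch of the mapping described in the first change, under simplifying assumptions: `TypeMappingSketch`, `PrestoType`, and `IcebergType` are hypothetical stand-ins, not the connector's classes. The actual TypeConverter works on Presto Type objects and returns Iceberg types, and its code is not reproduced here. The sketch handles the three narrow integer types in one branch.

```java
// Hypothetical stand-ins for illustration; not the connector's real classes.
final class TypeMappingSketch {
    enum PrestoType { TINYINT, SMALLINT, INTEGER, BIGINT }

    enum IcebergType { INTEGER, LONG }

    static IcebergType toIcebergType(PrestoType type) {
        switch (type) {
            case TINYINT:
            case SMALLINT:
            case INTEGER:
                // TINYINT and SMALLINT share the existing INTEGER mapping.
                return IcebergType.INTEGER;
            case BIGINT:
                // BIGINT keeps its separate mapping.
                return IcebergType.LONG;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        for (PrestoType type : PrestoType.values()) {
            System.out.println(type + " -> " + toIcebergType(type));
        }
    }
}
```

Keeping the three cases together means a future change to the INTEGER mapping applies to TINYINT and SMALLINT automatically.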

Possibly linked issues

  • #SMALLINT and TINYINT data types not supported for iceberg: The PR implements exactly what the issue requests, mapping SMALLINT/TINYINT to Iceberg INTEGER and verifying the mapping with tests.

@nishithakbhaskaran nishithakbhaskaran changed the title from "feat(presto-iceberg): Add support for tiny int and small int datatypes in presto-iceberg" to "feat(presto-iceberg): Add support for tinyint and smallint datatypes" Mar 30, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "feat(presto-iceberg): Add support for tinyint and smallint datatypes" to "feat(presto-iceberg): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type." Mar 31, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "feat(presto-iceberg): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type." to "feat(connector): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type." Mar 31, 2026
@nishithakbhaskaran nishithakbhaskaran changed the title from "feat(connector): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type." to "feat(connector): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type" Mar 31, 2026
@nishithakbhaskaran nishithakbhaskaran force-pushed the iceberg-tinyint-smallint branch 3 times, most recently from f2a6d4e to ebdb957 March 31, 2026 07:01
@nishithakbhaskaran nishithakbhaskaran marked this pull request as ready for review March 31, 2026 07:05
@prestodb-ci prestodb-ci requested review from a team, Shreya-ibm and faizdani-ibm and removed request for a team March 31, 2026 07:05

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The SMALLINT/TINYINT mapping in TypeConverter duplicates the existing IntegerType handling; consider folding these into a single branch (e.g., by treating SMALLINT/TINYINT as IntegerType-compatible in one place) to keep the mapping logic centralized and easier to maintain.
  • The new tests repeatedly create and drop hard-coded table names inside each method; consider extracting a helper or using randomized/unique table names with a shared cleanup (e.g., in @AfterMethod) to reduce duplication and avoid potential name collisions when tests are run in parallel.
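
One way the second suggestion could look, as a sketch only: `uniqueTableName` is a hypothetical helper written for this example, not an existing Presto test utility. It appends a random suffix so that tests running in parallel do not share a table name.

```java
import java.util.Locale;
import java.util.UUID;

final class TableNameSketch {
    // Hypothetical helper: prefix plus 12 random hex characters.
    static String uniqueTableName(String prefix) {
        String suffix = UUID.randomUUID().toString().replace("-", "").substring(0, 12);
        return (prefix + "_" + suffix).toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String tableName = uniqueTableName("test_smallint_tinyint");
        // A test would create the table under this name and drop it in shared cleanup.
        System.out.println("CREATE TABLE " + tableName + " (id INTEGER, small_col SMALLINT, tiny_col TINYINT)");
        System.out.println("DROP TABLE IF EXISTS " + tableName);
    }
}
```

A test class could record each generated name and drop the recorded tables in a single cleanup method, which removes the repeated create/drop code from individual tests.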

assertEquals(result.getMaterializedRows().get(0).getField(1), 100);
assertEquals(result.getMaterializedRows().get(1).getField(1), 32767);
assertEquals(result.getMaterializedRows().get(2).getField(1), -32768);
assertUpdate("DROP TABLE " + tableName);

suggestion (testing): Extend partitioning test to cover TINYINT as a partition column or predicate target

Since this feature also adds TINYINT support, this test should exercise it directly. Either partition by tiny_col (or both small_col and tiny_col) and assert pruning via a TINYINT predicate, or at least add a predicate on tiny_col to the existing test so we verify partitioning behavior for both types.

Suggested implementation:

        String tableName = "test_partitioned_smallint_tinyint";
        assertUpdate("DROP TABLE IF EXISTS " + tableName);
        // Create partitioned table with SMALLINT and TINYINT partition columns
        assertUpdate("CREATE TABLE " + tableName + " (" +
                "id INTEGER, " +
                "small_col SMALLINT, " +
                "tiny_col TINYINT, " +
                "data VARCHAR) " +
                "WITH (PARTITIONING = ARRAY['small_col', 'tiny_col'])");

To fully implement the suggestion, the rest of testPartitionedTableWithSmallintTinyint should:

  1. Insert multiple rows that differ in both small_col and tiny_col so that pruning can be observed on each dimension.
  2. Include at least one query that filters only on tiny_col (e.g., WHERE tiny_col = <value>), and assert the expected row count and values.
  3. Optionally, add a query that filters on both small_col and tiny_col together to confirm partition pruning works when both partition columns are constrained.
  4. If the test currently only has predicates on small_col, extend or duplicate those assertions to use predicates on tiny_col analogously, ensuring both types are covered by partition pruning checks.
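
The intent of steps 1 and 2 can be modeled in plain Java. This is a toy model of identity partitioning on (small_col, tiny_col), not how Presto or Iceberg prune partitions internally, and every name in it (`PartitionPruningSketch`, `Row`, `pruneByTinyCol`, `sampleRows`) is hypothetical. It shows that rows differing on both columns let a tiny_col-only predicate select a strict subset of partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; not Presto or Iceberg code.
final class PartitionPruningSketch {
    static final class Row {
        final int id;
        final short smallCol;
        final byte tinyCol;

        Row(int id, short smallCol, byte tinyCol) {
            this.id = id;
            this.smallCol = smallCol;
            this.tinyCol = tinyCol;
        }
    }

    // Group rows under a "small_col/tiny_col" key, one entry per partition.
    static Map<String, List<Row>> partition(List<Row> rows) {
        Map<String, List<Row>> partitions = new LinkedHashMap<>();
        for (Row row : rows) {
            String key = row.smallCol + "/" + row.tinyCol;
            partitions.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return partitions;
    }

    // Keep only partitions whose tiny_col component equals the predicate value.
    static List<String> pruneByTinyCol(Map<String, List<Row>> partitions, byte value) {
        List<String> kept = new ArrayList<>();
        for (String key : partitions.keySet()) {
            byte tinyPart = Byte.parseByte(key.substring(key.indexOf('/') + 1));
            if (tinyPart == value) {
                kept.add(key);
            }
        }
        return kept;
    }

    // Rows that differ in both small_col and tiny_col (step 1).
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, (short) 100, (byte) 1));
        rows.add(new Row(2, (short) 100, (byte) 2));
        rows.add(new Row(3, (short) 200, (byte) 1));
        rows.add(new Row(4, (short) 200, (byte) 2));
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<Row>> partitions = partition(sampleRows());
        // Models a filter on tiny_col only (step 2): WHERE tiny_col = 1
        System.out.println(pruneByTinyCol(partitions, (byte) 1));
    }
}
```

With the four sample rows, the tiny_col = 1 predicate keeps partitions 100/1 and 200/1 out of four, which is the kind of result the extended test would assert through its row counts.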

@nishithakbhaskaran nishithakbhaskaran changed the title from "feat(connector): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type" to "feat(plugin-iceberg): Add support for tinyint and smallint datatypes by mapping them to Iceberg INTEGER type" Mar 31, 2026
sumi-mathew previously approved these changes Mar 31, 2026

@sumi-mathew sumi-mathew left a comment

Thanks @nishithakbhaskaran. LGTM!

@hantangwangd hantangwangd left a comment

Thanks @nishithakbhaskaran for this change, overall looks good to me! Could you also add the type mapping message to the Iceberg documentation?

Trino seems to only address TINYINT and SMALLINT in the iceberg.system.migrate procedure, without fully supporting them across the Iceberg connector. However, after checking the documentation of Spark, Flink, and Hive regarding Iceberg type compatibility, I found that they all directly map TINYINT and SMALLINT to INTEGER. Therefore, aligning with them is a good approach, and it also follows the expected behavior of the Iceberg community.

nishithakbhaskaran commented Apr 1, 2026

@hantangwangd Updated the docs. CPTL? Thanks!!

@NivinCS NivinCS left a comment

Thanks for the change, LGTM.

@hantangwangd hantangwangd left a comment

@nishithakbhaskaran nishithakbhaskaran merged commit 607ebdd into prestodb:master Apr 1, 2026
116 of 117 checks passed
@nishithakbhaskaran nishithakbhaskaran deleted the iceberg-tinyint-smallint branch April 1, 2026 08:34