Description
Presto currently allows creating tables with column names that contain leading and/or trailing spaces, but an INSERT into such a table fails due to a column name mismatch.
This behavior is inconsistent and creates usability and compatibility issues.
Your Environment
- Presto version used: 0.295-SNAPSHOT
- Storage (HDFS/S3/GCS..): N.A.
- Data source and connector used: Hive
- Deployment (Cloud or On-prem): N.A.
Expected Behavior
Either:
- Presto should not allow creating tables with leading/trailing spaces in column names, or
- If such columns are supported, INSERT and other DML operations should work consistently with quoted identifiers.
Current Behavior
- Table creation succeeds.
- Querying the table using quoted column names doesn't throw any error:
SELECT " c1", "c2 " FROM ...
- But INSERT (and likely UPDATE/DELETE) fails due to a column name mismatch.
Steps to Reproduce
- Create a table with column names containing spaces:
CREATE TABLE column_with_leading_trailing_spaces (
" c1" INT,
"c2 " INT
);
- Attempt to insert data:
INSERT INTO column_with_leading_trailing_spaces (" c1")
VALUES (1);
- The query fails with:
Query 20250909_024822_00016_9j458 failed: Table {columns.types=int:int, numRows=0, rawDataSize=0, columns= c1,c2 , transient_lastDdlTime=1757416283, columns.comments=, bucket_count=0, presto_version=testversion, serialization.ddl=struct column_with_leading_trailing_spaces { i32 c1, i32 c2 }, presto_query_id=20250909_024658_00013_9j458, totalSize=0, file.outputformat=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, numFiles=0, name=column_with_leading_trailing_spaces, serialization.lib=org.apache.hadoop.hive.ql.io.orc.OrcSerde, location=s3a://imjalpreet-db/column_with_leading_trailing_spaces, file.inputformat=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat}.column_with_leading_trailing_spaces does not have columns [c2 , c1]
Gist with stack trace: https://gist.github.com/imjalpreet/9f9d5d0841fb1cdb2e78cb0cdc5f961f
Proposed Discussion Points
- Should Presto normalize column names (trim spaces) during table creation?
- Or should Presto disallow such columns altogether to prevent unexpected behavior?
- If allowed, should DML operations be enhanced to handle such cases gracefully?
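To make the first two options concrete, here is a minimal sketch of what a "reject" policy and a "trim" policy could look like. It is illustrative Python only, not Presto code: the names `has_edge_whitespace` and `validate_column_names` and the `mode` parameter are hypothetical, and where such a check would live in the analyzer or the Hive connector is left open.

```python
def has_edge_whitespace(name: str) -> bool:
    """Return True if the name starts or ends with whitespace."""
    return name != name.strip()


def validate_column_names(names, mode="reject"):
    """Apply one of two hypothetical policies to a list of column names.

    mode="reject": raise ValueError if any name has leading/trailing whitespace.
    mode="trim":   strip the whitespace, raising if that creates duplicates.
    """
    if mode == "reject":
        bad = [n for n in names if has_edge_whitespace(n)]
        if bad:
            raise ValueError(
                f"column names with leading/trailing whitespace: {bad!r}"
            )
        return list(names)
    if mode == "trim":
        trimmed = [n.strip() for n in names]
        # Trimming can make two distinct names collide, e.g. " c1" and "c1".
        if len(set(trimmed)) != len(trimmed):
            raise ValueError(
                f"trimming produces duplicate column names: {trimmed!r}"
            )
        return trimmed
    raise ValueError(f"unknown mode: {mode!r}")
```

With the column names from the reproduction, `validate_column_names([" c1", "c2 "], mode="trim")` returns `["c1", "c2"]`, while the default `"reject"` mode raises a `ValueError`. One thing the sketch highlights is that trimming is not always safe: a table declaring both `" c1"` and `"c1"` would collide after normalization, so a trim policy would still need an error path.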