-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[core] format table: support file compression and custom default for … #6500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces automatic file compression configuration for format tables based on their file format type. Instead of requiring users to explicitly set 'file.compression'='none' in table properties, the system now automatically determines appropriate compression defaults: none for CSV/JSON, snappy for Parquet, and zstd for ORC.
Key changes:
- Added
FORMAT_TABLE_FILE_COMPRESSIONconfig option with fallback toFILE_COMPRESSION - Implemented
getFormatTableFileCompression()utility method to determine compression based on format - Updated catalog implementations (HiveCatalog, RESTCatalog) to auto-populate file compression during table creation
- Removed explicit
'file.compression'='none'declarations from test cases
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| paimon-api/src/main/java/org/apache/paimon/CoreOptions.java | Added FORMAT_TABLE_FILE_COMPRESSION config option and formatTableFileImplementation() method |
| paimon-core/src/main/java/org/apache/paimon/catalog/CatalogUtils.java | Implemented getFormatTableFileCompression() with format-specific defaults |
| paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveCatalog.java | Auto-set file compression during format table creation |
| paimon-core/src/main/java/org/apache/paimon/rest/RESTCatalog.java | Auto-set file compression during format table creation |
| paimon-core/src/main/java/org/apache/paimon/table/format/FormatTableFileWriter.java | Updated to use formatTableFileImplementation() instead of fileCompression() |
| paimon-core/src/test/java/org/apache/paimon/catalog/CatalogTestBase.java | Added test coverage for format-specific file compression defaults |
| paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/table/PaimonFormatTableTest.scala | Removed explicit file compression settings from test cases |
| paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/sql/FormatTableTestBase.scala | Removed explicit file compression settings from test cases |
| paimon-spark/paimon-spark-ut/src/test/java/org/apache/paimon/spark/SparkCatalogWithHiveTest.java | Removed explicit file compression settings from test cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } else if (options.containsKey(FORMAT_TABLE_FILE_COMPRESSION.key())) { | ||
| return options.get(FORMAT_TABLE_FILE_COMPRESSION); | ||
| } else { | ||
| return fileCompression(); |
Copilot
AI
Oct 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback logic at line 2311 calls fileCompression() which returns options.get(FILE_COMPRESSION) and can be null. This creates a circular logic issue: if FILE_COMPRESSION key is not present (checked at line 2306), calling fileCompression() will still return null. This method can return null, causing potential NullPointerException when used. The method should either return a non-null default value based on format type (like CatalogUtils.getFormatTableFileCompression() does) or be annotated as @Nullable.
| return fileCompression(); | |
| // Fallback to a sensible default value if neither key is present. | |
| // For example, "none" or another appropriate default. | |
| return "none"; |
Purpose
Linked issue: close #xxx
Tests
API and Format
Documentation