@XiaoHongbo-Hope
Contributor

@XiaoHongbo-Hope XiaoHongbo-Hope commented Oct 31, 2025

[core][rest] Add schema validation and inference for REST catalog external tables

Purpose

Currently, the REST catalog supports external tables but does not validate schema consistency between the filesystem and the server-side schema. External tables also always require an explicit schema definition, even when a schema already exists in the filesystem.

This PR

Enhances REST catalog external table creation with schema inference and validation:

  • Validates client-provided schema against filesystem schema before creating table metadata
  • Enables creating external tables without an explicit schema when a schema
    already exists at the location

Examples:
CREATE TABLE t2 (id INT, name STRING) USING paimon LOCATION 'path' (explicit schema)
CREATE TABLE t2 USING paimon LOCATION 'path' (schema inference from filesystem, newly supported)
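
The two creation paths can be pictured as a small decision procedure. The Python sketch below is illustrative only and is not the code in this PR: `resolve_external_schema`, `SchemaMismatchError`, and the representation of a schema as a list of `(name, type)` pairs are all assumptions made for the example. The PR description does not say what happens when a schema is given but none exists at the location, so the sketch assumes the client schema is simply used in that case.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: ordered (column name, column type) pairs.
Schema = List[Tuple[str, str]]


class SchemaMismatchError(ValueError):
    """Raised when the client schema disagrees with the filesystem schema."""


def resolve_external_schema(
    client_schema: Optional[Schema],
    filesystem_schema: Optional[Schema],
) -> Schema:
    """Pick the schema to register for an external table."""
    if client_schema is None:
        # Inference path: no explicit schema, so one must exist at the location.
        if filesystem_schema is None:
            raise ValueError("no schema provided and none found at the location")
        return filesystem_schema

    # Validation path: an explicit schema must match what is on the filesystem.
    if filesystem_schema is not None and client_schema != filesystem_schema:
        raise SchemaMismatchError(
            f"client schema {client_schema} does not match "
            f"filesystem schema {filesystem_schema}"
        )

    # Assumption: with nothing on the filesystem to compare against,
    # the explicit schema is accepted as given.
    return client_schema
```

Under these assumptions, the first `CREATE TABLE` statement above corresponds to the validation path and the second to the inference path.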

Other fix for external tables

  • Ignore NotImplementedException when listing all tables by loading sys.tables, to avoid failures caused by external Paimon tables
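
One plausible reading of this fix is a "skip what cannot be loaded" loop. The sketch below is only an illustration of that pattern in Python, not the change made in this PR: `collect_table_rows` and the `load_table` callback are hypothetical names, and Python's built-in `NotImplementedError` stands in for the Java `NotImplementedException`.

```python
from typing import Callable, Iterable, List


def collect_table_rows(
    table_names: Iterable[str],
    load_table: Callable[[str], dict],
) -> List[dict]:
    """Build a table listing, skipping tables that cannot be loaded."""
    rows = []
    for name in table_names:
        try:
            rows.append(load_table(name))
        except NotImplementedError:
            # Assumption: a table that raises here (for example an external
            # table) is left out instead of failing the whole listing.
            continue
    return rows
```

The design trade-off in such a loop is that the listing succeeds but may silently omit the tables that raised.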

Tests

PaimonExternalTableTest
RESTCatalogTest.testCreateExternalTableWithSchemaInference
RESTCatalogTest.testReadSystemTablesWithExternalTable

API and Format

Documentation

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft October 31, 2025 13:52
@XiaoHongbo-Hope XiaoHongbo-Hope changed the title support schema validation and infer for external paimon table [core][rest] support schema validation and infer for external paimon table Oct 31, 2025
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 2, 2025 10:03
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft November 2, 2025 14:00
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review November 3, 2025 02:06
@liujiayi771
Contributor

liujiayi771 commented Nov 3, 2025

Is my understanding correct? There will be support for two ways to create paimon external tables:

  1. Explicitly specifying the schema, which must be consistent with the schema on the file system.
  2. Having no prior knowledge of the schema and directly using the schema from the file system.

@XiaoHongbo-Hope
Contributor Author

> Is my understanding correct? There will be support for two ways to create paimon external tables:
>
>   1. Explicitly specifying the schema, which must be consistent with the schema on the file system.
>   2. Having no prior knowledge of the schema and directly using the schema from the file system.

Totally correct. It should be similar to HiveCatalog.

Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 0f70166 into apache:master Nov 3, 2025
36 of 44 checks passed
gmdfalk added a commit to gmdfalk/paimon that referenced this pull request Nov 5, 2025
* master: (162 commits)
  [Python] Rename to BATCH_COMMIT_IDENTIFIER in snapshot.py
  [Python] Suppport multi prepare commit in the same TableWrite  (apache#6526)
  [spark] Fix drop temporary view (apache#6529)
  [core] skip validate main branch before orphan files cleaning (apache#6524)
  [core][spark] Introduce upper transform (apache#6521)
  [Python] Keep the variable names of Identifier consistent with Java (apache#6520)
  [core] Remove hash lookup to simplify interface (apache#6519)
  [core][format] Format Table plan partitions should ignore hidden & illegal dirs (apache#6522)
  [hotfix] Print partition spec and type when error in InternalRowPartitionComputer
  [hotfix] Add more informat to check partition spec in InternalRowPartitionComputer
  [hotfix] Use deleteDirectoryQuietly in TempFileCommitter.clean
  [core] format table: support write file in _temporary at first (apache#6510)
  [core] Support non null column with write type (apache#6513)
  [core][fix] Blob with rolling file failed (apache#6518)
  [core][rest] Support schema validation and infer for external paimon table (apache#6501)
  [hotfix] Correct visitors for TransformPredicate
  [hotfix] Rename to copy from withNewInputs in TransformPredicate
  [core][spark] Support push down transform predicate (apache#6506)
  [spark] Implement SupportsReportStatistics for PaimonFormatTableBaseScan (apache#6515)
  [docs] add docs for auto-clustering of historical partitions (apache#6516)
  ...

3 participants