
Conversation

@minihippo
Contributor

What is the purpose of the pull request

A new RFC for the Hudi metastore server

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot
Collaborator

CI report:

@hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@leesf leesf self-assigned this Feb 14, 2022
@nsivabalan nsivabalan removed their assignment Feb 16, 2022
@vinothchandar vinothchandar added the rfc Request for comments label Feb 17, 2022
Member

@vinothchandar vinothchandar left a comment

Thanks for the detailed proposal @minihippo. Reviewing!

Member

@vinothchandar vinothchandar left a comment

I think the direction here is great. Can we also flesh out some of the details? I am happy to push a commit with my edits to the design; you can let me know how you like it, or revert if you don't. Let me know if you are okay with that.


**How Hudi metadata is stored**

Hudi's metadata includes the table location, configuration, and schema; the timeline generated by instants; and the metadata of each commit/instant, which records the files created or updated, the number of new records, and so on for that commit. Besides, the information about the files in a Hudi table is also part of the Hudi metadata.
Member

Would also love to get file listings and column ranges for each file as part of the metastore.

Contributor Author

@minihippo minihippo Feb 18, 2022

File listing is covered by the metastore in the alpha version, supported by the snapshot service part. Do column ranges mean statistics like min/max at the column level? There is a plan to do that in the next version.

Member

Yes. To be precise, I would like to see if the metastore can serve FILES and COL_STATS partitions of the metadata table.


Hive Metastore server is widely used as a metadata center in data warehouses on Hadoop. It stores the metadata of Hive tables, such as their schema, location, and partitions. Currently, almost all storage and computing engines support registering table information to it and discovering and retrieving metadata from it. Meanwhile, cloud service providers like AWS Glue, HUAWEI Cloud, Google Cloud Dataproc, Alibaba Cloud, and ByteDance Volcano Engine all provide an Apache Hive Metastore compatible catalog. Hive Metastore has effectively become a standard in the data warehouse world.

Unlike a traditional table format such as a Hive table, a data lake table not only has schema, partitions, and other Hive metadata, but also has a timeline and snapshots, which are unconventional. Hence, data lake metadata cannot be managed by HMS directly.
Member

I would also add a lock provider mechanism to this list

Contributor Author

@minihippo minihippo Feb 18, 2022

I have a proposal for the lock provider, but it is not in the alpha version. I will add the detailed design of this part to the RFC.

- **Pluggable storage**
- The storage layer is only responsible for metadata persistence. Therefore, it does not matter which storage engine is used to store the data; it can be an RDBMS, a KV store, or a file system.

- **Easy to expand**
Member

Plus one. If we can make it horizontally scalable and highly available like all the standard microservices out there, it will be amazing.
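To make the "pluggable storage" bullet quoted above a bit more concrete, here is a minimal, hypothetical Java sketch of such a storage abstraction; the interface and class names are illustrative only and are not from the RFC or the Hudi codebase.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical persistence abstraction: the metastore service code would only
// talk to this interface, so the backing store (RDBMS, KV store, file system)
// can be swapped without touching the service layer.
interface MetadataStorage {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
}

// Trivial in-memory implementation, useful for tests; a MySQL- or
// RocksDB-backed implementation would plug in the same way.
class InMemoryMetadataStorage implements MetadataStorage {
    private final ConcurrentHashMap<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] value) {
        data.put(key, value);
    }

    @Override
    public Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key));
    }
}
```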

- creating or updating APIs cannot be invoked directly; only the completion of a new commit can trigger them.
- dropping a partition not only deletes the partition and its files at the metadata level, but also triggers a clean action that physically deletes the data on the file system.

- **timeline service**
Member

As you know, we have a timeline server already. Can we merge the existing functionality?

Contributor Author

@minihippo minihippo Feb 18, 2022

Yes, agree with you. About the timeline service, there are two options:

  1. Replace the timeline service started at the driver/coordinator of each job with the service in the metastore. All tasks would then connect to the metastore server directly.
  2. Run a timeline service in the metastore server and an embedded timeline server (embedded TS) in each job; tasks connect to the embedded TS started at the job driver, and the embedded TS connects to the metastore server.

Considering concurrent writing scenarios, option 1 would bring consistency problems. I prefer option 2; its architecture is easier to scale out and it reduces access pressure on the metastore server side.
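For illustration, a rough Java sketch of option 2 (all names are hypothetical, not actual Hudi classes): executors talk only to the embedded timeline server in the driver, and only that embedded server talks to the central metastore, which keeps the fan-out on the metastore low.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Common read-only view of the timeline.
interface TimelineView {
    List<String> getInstantsAfter(String tableId, String instantTime);
}

// Client for the central metastore server; in a real system this would be an RPC stub.
class MetastoreTimelineClient implements TimelineView {
    @Override
    public List<String> getInstantsAfter(String tableId, String instantTime) {
        // Placeholder for a remote call to the metastore server.
        return List.of();
    }
}

// Embedded timeline server running in the job driver: caches metastore
// responses so that individual tasks never hit the metastore directly.
class EmbeddedTimelineServer implements TimelineView {
    private final TimelineView metastore;
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    EmbeddedTimelineServer(TimelineView metastore) {
        this.metastore = metastore;
    }

    @Override
    public List<String> getInstantsAfter(String tableId, String instantTime) {
        return cache.computeIfAbsent(tableId + "@" + instantTime,
                k -> metastore.getInstantsAfter(tableId, instantTime));
    }
}
```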

Member

Agree with option 2. It's more scalable. Let's discuss details during implementation.

@vinothchandar
Member

@minihippo Picking this back up again. What are the next steps in our plan here?

@minihippo
Contributor Author

minihippo commented Mar 12, 2022

> @minihippo Picking this back up again. What are the next steps in our plan here?

@vinothchandar Thanks for the review.

  1. Add more details to the RFC.
  2. I will submit a PR for the initial hudi-metastore module, supporting the basic functions, next week.

@vinothchandar
Member

@minihippo Sounds good! We can revisit once you have the basic PR out

@boneanxs
Contributor

@minihippo This is great work 👍. I think it can also solve a problem I recently met: HUDI-3634, as we keep commit instants consistent in the Hudi metastore server.

But I'm curious how the Spark side gets metadata for a Hudi table (stored in the Hudi metastore server) and a Hive table (stored in the HMS) in one query (like a Hudi table joined with a Hive table)? Will we handle this in the HudiCatalog, getting Hudi table metadata from the Hudi metastore server and Hive table metadata from HMS, or will we provide a unified view in the Hudi metastore server and let the Hudi metastore request the HMS server if it's a Hive table?

@zhangyue19921010
Contributor

zhangyue19921010 commented Apr 18, 2022

Very valuable idea!

Further, maybe we can do more interesting things based on this very valuable Hudi metastore server. It would help realize a Hudi Lake Manager that decouples Hudi ingestion from Hudi table services, including cleaning, archival, clustering, compaction, and any table services added in the future.

This lake manager could unify and automatically invoke services such as clean/clustering/compaction/archive (multi-writer and async) based on this metastore server.

Users would only need to care about their own ingestion pipelines and leave all the table services to the manager, which automatically discovers and manages Hudi tables, thereby greatly reducing the operation and maintenance burden and the onboarding cost.

Maybe we could expand this RFC, or raise a new RFC and take this metastore server as an input?

CC @yihua and @nsivabalan

@minihippo
Contributor Author

minihippo commented Apr 25, 2022

> @minihippo This is great work 👍. I think it can also solve a problem I recently met: HUDI-3634, as we keep commit instants consistent in the Hudi metastore server.
>
> But I'm curious how the Spark side gets metadata for a Hudi table (stored in the Hudi metastore server) and a Hive table (stored in the HMS) in one query (like a Hudi table joined with a Hive table)? Will we handle this in the HudiCatalog, getting Hudi table metadata from the Hudi metastore server and Hive table metadata from HMS, or will we provide a unified view in the Hudi metastore server and let the Hudi metastore request the HMS server if it's a Hive table?

@boneanxs In ByteDance's in-house implementation, we do it more like the second way. There is a proxy in front of the Hudi metastore server and the Hive metastore server. The proxy routes requests to the corresponding server according to the table type.
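A minimal sketch of that routing idea, assuming a hypothetical catalog interface; the names below are made up for illustration and do not reflect the actual in-house implementation.

```java
// Hypothetical unified catalog facade that routes each request either to the
// Hudi metastore server or to the Hive Metastore, based on the table type.
interface TableCatalog {
    String getTableLocation(String db, String table);
}

class RoutingCatalogProxy implements TableCatalog {
    private final TableCatalog hudiMetastore;
    private final TableCatalog hiveMetastore;
    private final java.util.function.BiPredicate<String, String> isHudiTable;

    RoutingCatalogProxy(TableCatalog hudiMetastore,
                        TableCatalog hiveMetastore,
                        java.util.function.BiPredicate<String, String> isHudiTable) {
        this.hudiMetastore = hudiMetastore;
        this.hiveMetastore = hiveMetastore;
        this.isHudiTable = isHudiTable;
    }

    @Override
    public String getTableLocation(String db, String table) {
        // Route by table type; the check itself could be a cached lookup
        // of the table's registered format.
        return isHudiTable.test(db, table)
                ? hudiMetastore.getTableLocation(db, table)
                : hiveMetastore.getTableLocation(db, table);
    }
}
```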

@minihippo
Contributor Author

minihippo commented Apr 25, 2022

> Very valuable idea!
>
> Further, maybe we can do more interesting things based on this very valuable Hudi metastore server. It would help realize a Hudi Lake Manager that decouples Hudi ingestion from Hudi table services, including cleaning, archival, clustering, compaction, and any table services added in the future.
>
> This lake manager could unify and automatically invoke services such as clean/clustering/compaction/archive (multi-writer and async) based on this metastore server.
>
> Users would only need to care about their own ingestion pipelines and leave all the table services to the manager, which automatically discovers and manages Hudi tables, thereby greatly reducing the operation and maintenance burden and the onboarding cost.
>
> Maybe we could expand this RFC, or raise a new RFC and take this metastore server as an input?
>
> CC @yihua and @nsivabalan

@zhangyue19921010 #4309 here it is.

@zhangyue19921010
Contributor

Yep, I read the #4309 RFC. What I am thinking is whether we could expand this scope, maybe into a more common infrastructure covering not only clustering/compaction but also clean, archive, and any other service in the future :)

@minihippo
Contributor Author

@zhangyue19921010 Yes, it's on the list. Hi @yuzhaojing, could you supply this part in the RFC?

@vinothchandar
Member

On this RFC, I think the main thing is to decide the first phase scope. IMO, it can be limited to just Hudi tables for now, and depending on whether a hudi.metastore.uris is configured, queries will either use this metaserver or not.
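For illustration only, a tiny sketch of that gating idea, assuming the key is spelled hudi.metastore.uris as in the comment above (the final config name may differ):

```java
import java.util.Properties;

public class MetastoreGateExample {
    public static void main(String[] args) {
        Properties writerProps = new Properties();
        // Hypothetical config key taken from the discussion; if unset, readers and
        // writers would keep using file-system based metadata (the .hoodie folder).
        writerProps.setProperty("hudi.metastore.uris", "thrift://metaserver-host:9090");

        boolean useMetaserver = !writerProps.getProperty("hudi.metastore.uris", "").isEmpty();
        System.out.println(useMetaserver
                ? "Resolve table/timeline metadata via the Hudi metastore server"
                : "Fall back to file-system based metadata");
    }
}
```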

Does the RFC address high availability/sharding of metadata? Have you thought about these? If the metastore will also deal with locks, then the servers will become stateful. Maybe we can phase them as well? @minihippo thoughts?

@minihippo
Contributor Author

@vinothchandar Sorry for replying to the comments so late. When designing the storage schema of the metadata store, tbl_id was put in each storage table so that metadata can be sharded by tbl_id, with all metadata of a table living in one shard. There is no problem with joins across shards.
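A small illustrative sketch of the shard-key idea, assuming a numeric tbl_id column is present in every storage table as described above; the routing function itself is hypothetical.

```java
public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        this.numShards = numShards;
    }

    // All rows carrying the same tbl_id map to the same shard, so every join
    // needed to assemble a table's metadata stays within a single shard.
    public int shardFor(long tblId) {
        return (int) Math.floorMod(tblId, (long) numShards);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(8);
        System.out.println("tbl_id 42 -> shard " + router.shardFor(42L));
    }
}
```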

@minihippo
Contributor Author

Short-term plan (target 1.0)

Phase 1

Implement the basic functions:

  1. Database and table store
  2. All actions (e.g. commit, compaction) and operations (e.g. upsert, compact, cluster)
  3. Timeline and instant metadata store
  4. Partition and snapshot store
  5. Spark/Flink read/write based on the metastore
  6. Persistence of table/partition-level parameters (e.g. table config)

Phase 2

Extensions:

  1. Schema store and schema evolution support
  2. Concurrency support (will submit a new RFC)
  3. Hudi catalog


**How Hudi metadata is stored**

Hudi's metadata includes the table location, configuration, and schema; the timeline generated by instants; and the metadata of each commit/instant, which records the files created or updated, the number of new records, and so on for that commit. Besides, the information about the files in a Hudi table is also part of the Hudi metadata.
Contributor

It would be good to list all the metadata stored in the file system today and explicitly call out the metadata that will be in the metadata server as per this design and the ones we still plan to read from the file system. Eventually I think it is good to evolve the design in the direction of abstracting ALL the details from the file system and then having multiple implementations.

List of metadata I can think of

  1. Dataset Partitioning
  2. Partition to File Group IDs mapping
  3. File Group Id to Data files mapping (Base file, log files for a specific version)
  4. Transaction timeline
  5. Schema registry
  6. Physical data statistics (column stats)
  7. Index (bloom, etc)
  8. Lock Manager

Contributor Author

Good suggestions. I will add them and explain how they are stored in the RFC.

Contributor

@pratyakshsharma pratyakshsharma Jul 29, 2022

This discussion brings me to a high-level question. Today, column stats are already stored at the file level in the metadata table. So do we intend to completely replace the metadata table with this new metastore server?
Or do we intend to use the metastore server only to store table-level stats, similar to how the Hive metastore does?

Another possibility I can think of is just exposing endpoints via the metastore service to interact with different partitions of the metadata table, as Vinoth pointed out in another comment.
@minihippo


The RFC-15 metadata table is a proposal that can solve these problems. However, it only manages the metadata of a single table; there is a lack of a unified view.

**The integration of Hive metastore and Hudi metadata lacks a single source of truth.**
Contributor

Agree with this generally. Moving a Hudi user away from Hive Metastore to a Hudi-specific metastore requires a lot more work on establishing trust and working out details. We have to remember that a lot of these companies have existing support contracts with engines, and those engines support Hive Metastore or a cloud-specific metastore; I don't see it as practical for us to push towards replacing this.

In my opinion, the practical way is to abstract all metadata interactions. Design a metaserver that serves metadata much more efficiently than doing it through the file system, plug that into these catalogs (HMS, Glue, etc.), and for any non-standard API provide a thin client that abstracts remote API calls into the Hudi meta server if configured. The Hudi meta server then becomes an invisible implementation that, if configured, makes the Hudi write and read paths much more efficient.

Contributor Author

@minihippo minihippo Jul 20, 2022

For the first part, I think it's a question of how we provide a lake metastore that is Hive compatible, so that the metastore can connect to engines with no extra effort. There is a discussion / simple idea left in the Hive Metastore Adapter part at the end of the RFC page. Maybe the lake metastore is another store that adapts to the existing Hive metastore, with the lake one providing a superset of functions.

Contributor Author

For the second part, I totally agree with you. Those are the features the metastore most urgently needs to support, and they have a higher priority than the Hive metastore adapter.


There are specific requirements for the metastore server in different scenarios. Though the server's storage is pluggable, considering the typical use of disk storage, good read and write performance, and convenience of development, an RDBMS may be the better choice.

#### Storage Schema
Contributor

We need to think about the storage schema version at the highest level, and every evolution of the storage schema needs to ensure strict forward and backward compatibility. Think about scenarios where Hudi is upgraded and then downgraded to an older version in production.

Contributor Author

@minihippo minihippo Jul 20, 2022

It's a good question about compatibility. In my opinion, scalability is what would bring big changes to the storage schema. So at first we thought DB sharding could make the metastore storage scalable; tbl_id is suitable as the shard key, because sharding limits joins so that only relations belonging to the same tbl_id are joined together.

During the metastore's evolution at ByteDance, every other reason for schema change could be handled by adding a new column with a default value to the table, so there is no compatibility problem.

| action | tinyint | instant action, commit, deltacommit and etc. |
| state | tinyint | instant state, requested, inflight and etc. |
| duration | int | for heartbeat |
| start_ts | timestamp | for heartbeat |
Contributor

What is the reason behind having 2 different fields for heartbeat? Can you please elaborate?

Contributor Author

Whether the heartbeat has failed is judged by whether start_ts + duration < current_time.
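A small sketch of that liveness check in Java; the field and method names are illustrative.

```java
import java.time.Instant;

public class HeartbeatCheck {
    // start_ts and duration mirror the columns described above: a writer refreshes
    // start_ts while it is alive, and the heartbeat is considered expired once
    // start_ts + duration has passed.
    static boolean isExpired(long startTsEpochSec, long durationSec, long nowEpochSec) {
        return startTsEpochSec + durationSec < nowEpochSec;
    }

    public static void main(String[] args) {
        long now = Instant.now().getEpochSecond();
        System.out.println(isExpired(now - 120, 60, now));  // true: last beat too old
        System.out.println(isExpired(now - 30, 60, now));   // false: still alive
    }
}
```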

- **Easy to expand**
- The service is stateless, so it can be scaled horizontally to support higher QPS. The storage can be split vertically to store more data.

- **Compatible with multiple computing engines**
Member

You meant compatible with multiple catalog services? HMS is not a computing engine, precisely speaking.

- To clients, it exposes APIs that

- get the latest snapshot of a partition without multiple file versions
- get the incremental files after a specified timestamp, for incremental reads
Member

To be precise, are these incremental files the files for the current incremental query results? In RFC-51, CDC support will enrich the CDC results with the complete changed data. Maybe this API should belong to a "CDC service" that supports both the current basic incremental files and CDC files? The name "snapshot" usually does not imply CDC capabilities.
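To make the two APIs quoted above concrete, here is a hypothetical Java interface sketch; the RFC does not prescribe these names or signatures.

```java
import java.util.List;

// Hypothetical snapshot-service client API, mirroring the two bullets quoted above.
interface SnapshotService {
    // Latest committed file slice per file group in a partition, with no
    // superseded file versions included.
    List<String> getLatestSnapshotFiles(String tableId, String partitionPath);

    // Files added by commits completed strictly after the given instant time,
    // intended for incremental reads.
    List<String> getIncrementalFiles(String tableId, String partitionPath, String afterInstantTime);
}
```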


- No external components are introduced and maintained.

crons:
Member

Suggested change: `crons:` → `cons:`


- **tbl_params**

- is used to record table-level statistics like file count and total size.
Member

If it's meant for statistics, then tbl_stats is more precise; params may lead to confusion with table properties.

| update_time | timestamp | partition updated time |
| is_deleted | tinyint | whether the partition is deleted |

- **partition_params**
Member

ditto.


- **partition_key_val**

- is used to support partition pruning.
Member

A bit confused by "key_val" vs "params". I suppose in "params" we store statistics just like "table params", but don't you also store min/max stats in partition params? If so, then partition pruning should leverage "params" too. Not sure what is planned for "key_val".


[TBD]

## Implementation
Member

Would love to see some plans on when/how to incorporate existing sync features like hive-sync, glue-sync, datahub-sync, and bigquery/snowflake-sync. It would be a good idea to consolidate the sync features.

Member

@minihippo cross-posting a related comment here: #5064 (comment)
clarifying the RFC plans will certainly help expedite the implementation's review process

@xushiyan xushiyan changed the title [HUDI-3345][RFC-36] Proposal for hudi metastore server. [HUDI-3345][RFC-36] Hudi metastore server Aug 10, 2022
@yihua yihua added the priority:critical Production degraded; pipelines stalled label Sep 13, 2022
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Feb 26, 2024
@osipovgit

Hi! This RFC has not been merged. Can you explain whether the metaserver is still going to be developed, or has it been decided to abandon this idea? @vinothchandar @xushiyan @yihua

@xushiyan
Member

> Hi! This RFC has not been merged. Can you explain whether the metaserver is still going to be developed, or has it been decided to abandon this idea? @vinothchandar @xushiyan @yihua

@osipovgit metaserver was implemented - see https://github.com/apache/hudi/tree/master/hudi-platform-service/hudi-metaserver

The RFC was not merged as there are discussion points to be finalized, and the current implementation does not include every feature proposed or to-be-proposed.

@vinothchandar vinothchandar removed their assignment May 23, 2025