[HUDI-3016][RFC-43] Proposal to implement Table Service Manager #4309
Conversation
cc @prashantwason @nbalajee and folks at uber, who are looking into similar things.
@yuzhaojing Can we fold this under the #4718 proposal? It would be awesome to move into a single, highly available metadata layer! This is very exciting, for many reasons, that I can elaborate on.
cc @minihippo what do you think?
rfc/rfc-43/rfc-43.md
Outdated
| See the License for the specific language governing permissions and
| limitations under the License.
| -->
| # RFC-43: Implement Compaction/Clustering Service for Hudi
Since this may cover other HUDI operations in the future (async indexing of metadata table partitions, clean, etc.), maybe rename this to "HUDI Table Management Service".
rfc/rfc-43/rfc-43.md
Outdated
| ### Meta Table
| Meta table has been implemented by using an internal HUDI MOR Table to store the required metadata for the compaction/clustering message. This table will be internal to a dataset and will not be exposed directly to the user to write/modify.
Using the metadata table for this has some issues:
- Scale: each write to the metadata table will be a deltacommit, which may take many seconds (my tests show 5 to 10 seconds at a minimum), so this limits the maximum number of tables we can support.
- Job state tracking (start job, complete job, job error) means further writes into the metadata table. This will further lower the number of requests/sec that can be served.
I feel there are better approaches for saving this information:
- Using RocksDB
- Using simple "job files" in YAML format in the folder ".hoodie/service". Each job scheduling writes a new job file, and on completion we can simply delete the file or archive it. This would be more work than RocksDB.
- Using an external store like MySQL or MongoDB (the downside is the external dependency, of course).
We have a scale of 10000+ tables with ingestion running at least every 30 mins.
In any case, this could be an interface with multiple implementations.
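A minimal sketch of what such a pluggable interface might look like (all class and method names here are hypothetical, not part of the current implementation):

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical storage abstraction for table-service job metadata.
 * Implementations could be backed by an internal Hudi MOR table, an embedded
 * RocksDB instance, ".hoodie/service" job files, or an external MySQL/MongoDB store.
 */
public interface ServiceJobStore {

  /** Persist a newly scheduled job (action type, base path, configs) and return its id. */
  String saveJob(String action, String basePath, Map<String, String> params);

  /** Record state transitions such as SCHEDULED, RUNNING, COMPLETED, FAILED. */
  void updateState(String jobId, String newState);

  /** Fetch ids of all jobs for a table that are still pending execution. */
  List<String> getPendingJobIds(String basePath);
}
```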
Thanks for the review.
I will update this document later; the current implementation is already an interface with multiple implementations.
+1. A real-time DB / key-value store like RocksDB or MySQL might be a good fit, as proposed.
This is a great idea. We can use a HUDI table by default and provide a MySQL-based implementation selectable via configuration.
rfc/rfc-43/rfc-43.md
Outdated
| ### Schema
| Being an actual Hoodie Table, the Meta Table needs to have a schema to store the compaction/clustering message.
The service itself will be a standalone Java program which will be running all the time, so it does not have to be a HUDI table, and hence there is no metadata table.
I feel that since this is a long-running Java program, it should be "fast" and hence use an embedded database like RocksDB.
The hoodie table is only one implementation of this; I think we can add other databases for the service, like MySQL / RocksDB.
rfc/rfc-43/rfc-43.md
Outdated
| ## Interface design
| ### Register
What is the difference between register and schedule?
I feel we can merge the two into just register: when you register a job, it will be added to the list of jobs and scheduled as necessary. Some jobs may be periodic (e.g. clean and compaction) and some one-shot (e.g. clustering).
From my understanding, the actual scheduling (the strategy of when to compact and which file groups to compact) is not delegated to this service; that still lies with the original hudi table. Once the compaction plan is finalized, we delegate it to this service.
So I'm not sure we need to overload this service with actual scheduling as well. But I do see scope for improvement there and for adding that functionality to the service. Maybe in the first cut we can leave it to the original hudi table.
Registration means that the table is scheduled by the service. This interface reserves the scheme for subsequent calls through the service, which will be a follow-up. In the current first version we can leave the scheduling to the original HUDI table.
@prashantwason @nsivabalan What do you think?
| "base_path":"/hoodie/base_path", | ||
| "compaction_config":"compaction_config", | ||
| "clustering_config":"clustering_config" | ||
| } |
Many other configs will also be required:
- retry_on_error: true/false (whether to auto retry on error)
- periodic: true/false (whether it should be run periodically)
- timeout: timeout for job execution
- priority: some priority
- context: some config which the user can add and retrieve. This helps us add "tier", "SLA", "owner", "yarn_queue", etc., which are useful for the execution engine to schedule the job.
We have over 10000+ tables which belong to different teams, so jobs scheduled on each table should be executed in the table owner's yarn_queue.
+1 for the above. yarn_queue might definitely be useful.
Don't we need the entire write configs for each table as well? For e.g., lock provider configurations?
Yes, these parameters are necessary.
| ## Proposal
| ### Proposal For V1
Maybe have a Python / Go CLI too for the following:
- List all jobs
- Add a job (useful for testing as well as manually adding operations for tables)
- Remove a job
- Clear jobs for a table (etc.)
Can we put the implementation of the client in the next stage? I hope the goal of the first step is to be able to complete task execution for compaction & clustering jobs.
@yuzhaojing please go ahead and incorporate/list out these future items in the RFC accordingly; then you may resolve the comment.
rfc/rfc-43/rfc-43.md
Outdated
| ### Proposal For V2
| - Scalable service
What does scalable service imply? More than one node? More than one web server node? Primary-secondary replication?
I want to implement stateless multi-instance nodes.
@yuzhaojing please add the details in the RFC to clarify these questions, and resolve the comment after updating.
rfc/rfc-43/rfc-43.md
Outdated
| ### Scheduler
| Periodically scan the Meta Table and submit compaction/clustering job according to user-specified rules, and need to plug-in the execution engine.
The scheduler will need to consider various things:
- Priority of jobs
- Resources available to schedule jobs (e.g. a bin-packing algorithm to execute the maximum number of jobs within the given resources)
We might have to introduce a scheduling strategy here. FIFO could be one; priority-based could be another. For e.g., compaction could have a higher priority than cleaning, and cleaning a higher priority than clustering.
This is just an example; users should be able to configure these properties.
This is a great addition. I think it is necessary to provide a configurable scheduling strategy.
We can implement FIFO and priority scheduling strategies first, and leave an interface to improve the scheduling strategy in a follow-up.
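A rough sketch of such a pluggable strategy, assuming hypothetical class names (the actual interface will be defined in the implementation):

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical pluggable strategy deciding which pending job to submit next. */
interface SchedulingStrategy {
  /** Pick the next job to run from the pending set, or null if none should run yet. */
  PendingJob selectNext(List<PendingJob> pending);
}

/** Minimal job descriptor used by the sketches below. */
class PendingJob {
  final String basePath;
  final String action;      // COMPACTION, CLUSTERING, CLEAN, ...
  final int priority;       // higher value = more urgent
  final long enqueueTimeMs;

  PendingJob(String basePath, String action, int priority, long enqueueTimeMs) {
    this.basePath = basePath;
    this.action = action;
    this.priority = priority;
    this.enqueueTimeMs = enqueueTimeMs;
  }
}

/** FIFO: oldest enqueued job first. */
class FifoStrategy implements SchedulingStrategy {
  public PendingJob selectNext(List<PendingJob> pending) {
    return pending.stream().min(Comparator.comparingLong(j -> j.enqueueTimeMs)).orElse(null);
  }
}

/** Priority-based: e.g. compaction > clean > clustering, configurable per deployment. */
class PriorityStrategy implements SchedulingStrategy {
  public PendingJob selectNext(List<PendingJob> pending) {
    return pending.stream().max(Comparator.comparingInt(j -> j.priority)).orElse(null);
  }
}
```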
rfc/rfc-43/rfc-43.md
Outdated
| 2. Using Async compaction/clustering, in this mode the job execute async but also sharing the resource with HUDI to write a job that may affect the stability of job writing, which is not what the user wants to see.
| 3. Using independent compaction/clustering job is a better way to schedule the job, in this mode the job execute async and do not sharing resources with writing job, but also has some questions:
Please do add a line here stating that users have to enable lock service providers so that there is no data loss. Especially when compaction/clustering is being scheduled, no other writes should proceed concurrently, and hence a lock is required.
Yes, it is very critical!
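For reference, enabling multi-writer concurrency control in Hudi looks roughly like the following (config keys as commonly documented for recent Hudi versions; please verify against the version in use):

```properties
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=my_table
hoodie.write.lock.zookeeper.base_path=/hudi/locks
```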
| 4. Follows all the conventions of a HUDI Table
| ### Scheduler
So, we are introducing two long-running services here, a request handler and a scheduler?
Any particular reason to de-couple them? Why not have one service for both? I understand it's easy to logically de-couple them, but then we need to ensure the availability of both services.
Not too strong on this; just wanted to hear your thoughts.
In my concept, the scheduler is not a separate service; it is just a thread within the service.
rfc/rfc-43/rfc-43.md
Outdated
| ### Scheduler
| Periodically scan the Meta Table and submit compaction/clustering job according to user-specified rules, and need to plug-in the execution engine.
How do we ensure the availability of these services? What if the service goes down, do devs have to manually restart it? Until then, these jobs are never going to be tried, right?
I think as long as job operations are handled appropriately, the availability of the service should be relatively high, because it just accepts requests and writes them to storage.
Even if there are 10000+ hudi jobs, this will not exceed the QPS that the service can handle. This has been verified on the timeline service.
We can provide a solution to monitor the service and automatically restart it when the service goes down, or provide users with a way to run it under a supervisor. I would like to hear your thoughts.
rfc/rfc-43/rfc-43.md
Outdated
| Register to Service with CompactionConfig/ClusteringConfig when the job starts. When generating a compaction/clustering plan, report to the compaction/clustering Service address specified in HoodieConfig and request the Service to stop the corresponding compaction/clustering job when the Service is not required or rollback compaction/clustering instant.
| ### Request Handler
If I understand correctly, the request handler is very lightweight and may not need any compute resources as such, whereas the scheduler will be using the cluster's compute resources.
Because the meta table as such may not have more than a million entries, even for a large deployment (with a large number of tables).
Yes, I think the request handler is very lightweight and the meta table is also not large.
| "base_path":"/hoodie/base_path", | ||
| "compaction_config":"compaction_config", | ||
| "clustering_config":"clustering_config" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for above.
yarn_queue might def be useful.
| "base_path":"/hoodie/base_path", | ||
| "compaction_config":"compaction_config", | ||
| "clustering_config":"clustering_config" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need entire write configs for each table as well.for eg, lock provider configurations?
| 2. RequestHandler received request but the commit is not completed.
| 3. Client rollback plan after request to Compaction/Clustering Service.
By client we mean a regular writer to the original hudi table, right? Let's try to keep the overhead as low as possible.
I guess here we are employing a push-based mechanism. Alternatively, can we think about a pull-based one? Each hudi table would register with this service for the table services it would like to delegate; for e.g., tableA can delegate just cleaning, tableB can delegate cleaning and compaction.
Once registered, regular writers don't make any further communication with the service. The request handler will periodically poll the data table timeline of these tables, check for any requested instants for table services, and work on them, i.e. update the internal meta table, etc.
This way, the client does not need to worry about whether the request handler is available or not, and if it is not available, whether it should abort and continue inline, etc.
This is a good proposal; such a design would be more flexible. I previously considered that regular polling by the service may concentrate the pressure on the service and cause delays. Performance optimization can be placed in a follow-up.
The pull-based mechanism works for a small number of tables. Scanning 1000s of tables for possible services is going to induce a lot of listing load.
The push-based model is actually very elegant. Let me explain how I envision it:
Assume a regular table which has some cleaning settings. In the current code, we schedule the clean and then execute the clean in the same Spark application. When hoodie.service.enable=true, the scheduling of clean will be as before, but the execution part (HoodieWriteClient.clean(...)) will detect that TableServices are enabled and, instead of running the clean, it will schedule it with the TableService by calling its API endpoint.
Benefits:
- Easy integration with HUDI's current functionality
- No extra load to list datasets to find operations to run
- Conforms to all HUDI write config settings (e.g. when to clean, when to compact)
- In case of downtime of TableServices, one can simply disable hoodie.service.enable and revert to inline functionality.
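A sketch of how this delegation check might look inside the write client; apart from hoodie.service.enable and the clean flow mentioned above, all names here are hypothetical:

```java
import java.util.Map;

/** Hypothetical sketch of the push-based delegation described above. */
class CleanDelegationSketch {

  private final Map<String, String> writeConfig;       // stand-in for the writer's resolved config
  private final TableServiceEndpoint tableService;     // stand-in for the TMS API client

  CleanDelegationSketch(Map<String, String> writeConfig, TableServiceEndpoint tableService) {
    this.writeConfig = writeConfig;
    this.tableService = tableService;
  }

  void clean(String basePath) {
    // Scheduling stays with the writer, exactly as today.
    String cleanInstant = scheduleCleanLocally(basePath);

    if (Boolean.parseBoolean(writeConfig.getOrDefault("hoodie.service.enable", "false"))) {
      // Execution is pushed to the Table Management Service via its API endpoint.
      tableService.submit("CLEAN", basePath, cleanInstant);
    } else {
      // Fallback: inline execution, so disabling the flag reverts to current behaviour.
      executeCleanInline(basePath, cleanInstant);
    }
  }

  private String scheduleCleanLocally(String basePath) { return "20240101000000"; /* placeholder */ }

  private void executeCleanInline(String basePath, String instant) { /* existing inline clean */ }

  /** Hypothetical client-side stub for the TMS REST/RPC API. */
  interface TableServiceEndpoint {
    void submit(String action, String basePath, String instantTime);
  }
}
```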
Furthermore, we can implement a Java client for table services, HoodieTableServiceClient, which can enable scheduling any type of service from within the logic of a table operation.
E.g. assume the HoodieWriteConfig enables record level index. Since the index does not exist yet, it needs to be bootstrapped. There are two possibilities here:
- The user manually runs async-indexing to initialize the record index
- The metadata table code detects that record index is enabled but not present and calls HoodieTableServiceClient.scheduleService(...) to enable async indexing automatically.
The two scenarios above can be extended to various intelligent cases:
- The ingestion side code decides when it is time to compact the MOR table (based on some metric like read time / write time) rather than a fixed schedule like after 10 deltacommits.
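A possible shape for such a client, purely illustrative (the class is only proposed in this comment and does not exist yet; all signatures are assumptions):

```java
import java.util.Map;

/**
 * Illustrative-only sketch of the proposed HoodieTableServiceClient.
 * Internal code (e.g. the metadata table bootstrap path) could call
 * scheduleService(...) instead of requiring a manual async-indexing run.
 */
interface HoodieTableServiceClientSketch {

  enum ServiceType { COMPACTION, CLUSTERING, CLEAN, INDEXING }

  /**
   * Schedule a service of the given type for the table at basePath.
   * Returns an identifier that can later be used to poll status.
   */
  String scheduleService(ServiceType type, String basePath, Map<String, String> params);

  /** Poll the status of a previously scheduled service. */
  String getStatus(String serviceId);
}
```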
@prashantwason @nsivabalan Thanks for the review, I'll be updating the RFC next week, looking forward to more comments from you.
prashantwason
left a comment
Left some comments as per our requirements.
| - Independent compaction/clustering job, execute an async compaction/clustering job of another application.
| With the increase in the number of HUDI tables, due to a lack of management capabilities, maintenance costs will become
| higher. This proposal is to implement an independent compaction/clustering Service to manage the Hudi
I think we should generalize this PR to include all types of "services" on HUDI tables.
A service on a HUDI table can be defined as any operation which needs to be run async. This includes compaction, clustering, async-indexing, cleaning, validation, etc.
A service definition can take various key-value parameters like:
- basepath
- priority
- mainClass (the Java class to execute to run the service)
- context (key=value params specific to the service execution)
This allows adding more services in the future and running various other management operations on HUDI datasets; an illustrative definition is sketched below.
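For example, a generalized service definition might look like this (key names are purely illustrative, and the mainClass value is just an example of a Spark job entry point):

```json
{
  "basepath": "/hoodie/base_path",
  "service": "clustering",
  "priority": 3,
  "mainClass": "org.apache.hudi.utilities.HoodieClusteringJob",
  "context": {
    "tier": "gold",
    "SLA": "1h",
    "owner": "team-a",
    "yarn_queue": "team-a-queue"
  }
}
```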
+1
Totally agree, we will evolve in this direction!
Trying to understand the comment "maintenance cost will become higher". Can you explain?
rfc/rfc-43/rfc-43.md
Outdated
| ## Implementation
| 
The following interfaces would be required:
- API (REST / gRPC): we would need gRPC
- The Request Handler would be common
- Execution Engine: the component which executes a Spark job (e.g. via spark-submit) and returns the result
- Metrics / alerts may also need to be added
We can implement the REST API for phase 1 and have a common Request Handler.
Points 2 and 3 are very necessary, which is also reflected in the RFC.
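As a rough illustration of the execution-engine piece mentioned above (component and method names are hypothetical), a spark-submit based engine could look like:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical engine that runs a table-service job via spark-submit and reports the result. */
class SparkSubmitExecutionEngine {

  /** Launch the job and block until it finishes; returns true on a zero exit code. */
  boolean execute(String mainClass, String jarPath, List<String> args, String yarnQueue)
      throws IOException, InterruptedException {
    List<String> cmd = new ArrayList<>();
    cmd.add("spark-submit");
    cmd.add("--class");
    cmd.add(mainClass);
    cmd.add("--queue");
    cmd.add(yarnQueue);
    cmd.add(jarPath);
    cmd.addAll(args);

    Process process = new ProcessBuilder(cmd)
        .inheritIO()                 // stream driver logs to the service logs
        .start();
    int exitCode = process.waitFor();
    // Metrics/alerts hooks would be emitted here based on exitCode.
    return exitCode == 0;
  }
}
```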
In a production environment with limited resources, the concept of priority is required, and different priorities are required for different tables and different actions.
@vinothchandar @prashantwason Would like to hear your thoughts on this!
Taking this over, given @prashantwason is now on a break.
| - Perfect metrics and reuse HoodieMetric expose to the outside.
| - Provide automatic failure retry for compaction/clustering job.
In addition to this, we should also think about pluggable triggering strategies for any table management service to run. Eventually we can be more intelligent about when we trigger a service, I guess.
> Trying to understand the comment "maintenance cost will become higher". Can you explain?

If we have more than 1000 tables, each configured with a compaction and a clustering task, then we need to manage more than 2000 table service tasks, and that management cost is very high, including but not limited to retrying failed tasks, troubleshooting the causes of exceptions, and resource management.

> In addition to this - we should also think about pluggable triggering strategies for any table management service to run.

We plan to support triggering service execution via API, is that what you mean?
> pluggable triggering strategies

@yuzhaojing this is about the scheduling strategy, like priority-based, FIFO, etc., and making it customizable via a pluggable interface.
This is expected to provide pluggable interfaces to support various scheduling strategies.
@yuzhaojing to resolve this comment, please add the details for "pluggable triggering strategies" in the doc
| ## Implementation
| ### Processing mode
| Different processing modes depending on whether the meta server is enabled
I don't think the Table Management Service polling the datasets and determining what service to run is a scalable design; we should never do this. We need a scalable persistent queue implementation to schedule "Table Service Jobs". This could be file-system based or RPC based.
> We need a scalable persistent queue implementation to schedule

I think this is the idea below: whether with the metaserver or not, TMS is to be triggered by pushed events. The metaserver contains an event bus to dispatch events. In the case of no metaserver, TMS can internally keep a queue to buffer the incoming requests, as with async requests. This can be clarified in the section below.
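A minimal sketch of that internal buffer, assuming hypothetical names (in practice the request would also be persisted so it survives a restart):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical in-memory buffer for incoming table-service requests when no metaserver/event bus is present. */
class RequestBuffer {

  private final BlockingQueue<String> pendingRequests = new LinkedBlockingQueue<>();

  /** Called by the request handler; returns immediately so the writer is not blocked. */
  void enqueue(String requestJson) {
    // A real implementation would first persist the request (Hudi table / RocksDB / MySQL).
    pendingRequests.offer(requestJson);
  }

  /** Called by the scheduler thread; blocks until a request is available. */
  String take() throws InterruptedException {
    return pendingRequests.take();
  }
}
```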
Overall, I think more design details should be illustrated in the RFC document. For example, I would like to see an overview of the inter-component interactions/workflows for phase 1, phase 2 (and more future phases?), ideally illustrated via diagrams. Please also resolve previous comments wherever applicable.
An important aspect missing from the RFC is how users can package and deploy TMS. Are people expected to run a bundle jar? Or are we providing scripts to run the server? The deployment mode is crucial to promoting this feature.
rfc/rfc-43/rfc-43.md
Outdated
| - The meta server provides a listener that takes as input the uris of the Table Management Service and triggers a callback through the hook at each instant commit, thereby calling the Table Management Service to do the scheduling/execution for the table.
| 
what is the design choice here? RPC or REST?
The current design choice is REST.
Please make it explicit.
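For illustration only, the metaserver-side hook quoted above could be as simple as the following interface (names hypothetical), with a REST-based implementation posting the event to the configured TMS uris:

```java
/** Hypothetical listener invoked by the metaserver on each instant commit. */
interface InstantCommitListener {

  /**
   * Called after an instant is committed for a registered table; a REST
   * implementation would POST this event to the configured TMS uris so the
   * Table Management Service can schedule/execute the relevant services.
   */
  void onInstantCommitted(String basePath, String instantTime, String action);
}
```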
rfc/rfc-43/rfc-43.md
Outdated
| - Do not enable meta server
| - for every write/commit on the table, the table management server is notified.
| We can set a heartbeat timeout for each hoodie table, and if it exceeds it, we will actively pull it once to prevent the commit request from being lost
| 
What is the default heartbeat timeout? What does the configuration look like? How is TMS registered with this heartbeat info? Let's be explicit with design details.
At present, this design has been modified so that the client brings all pending instants each time it requests TMS; the RFC document will be updated later.
| - | name | type | comment |
| | ------------ | ------ | --------------------- |
please fix the markdown table. it's not showing properly
rfc/rfc-43/rfc-43.md
Outdated
| hoodie.table.service.clustering.enable=true
| hoodie.table.service.clean.enable=true
| ## Proposal
This is not a proposal section; this is the implementation plan. The whole document is the proposal.
rfc/rfc-43/rfc-43.md
Outdated
| 5. API(only REST)
| 6. Writer
| **Landing plan: 0.12**
Suggested change:
- **Landing plan: 0.12**
+ **Target: 0.12**
rfc/rfc-43/rfc-43.md
Outdated
| This RFC aims to implement a new Service to manager the Hudi table compaction and clustering action, to test this
| feature, there will be some test tables trigger compaction and clustering action to Service with unit tests for the
| code. Since this is an entirely new feature, I am confident that this will not cause any regressions during and after
| roll out.
The test plan is not only about regressions or impact on existing services. For this feature itself, what kinds of tests can be carried out? What scenarios will be covered in the OSS CI? Please list the relevant items accordingly.
| - The pull-based mechanism works for fewer tables. Scanning 1000s of tables for possible services is going to induce
| lots of a load of listing.
So we're not doing pull; please make that explicit, as this wording confuses people and makes it sound like you propose pull-based for fewer tables. You can say "As a pull-based mechanism won't scale to 1000s of tables, we ...".
| ### Processing flow
| - If hudi metaserver is used, after receiving the request, the table management server schedules the relevant table
| service to the table's timeline
> schedules the relevant table service to the table's timeline

Need to make it explicit: this is the table timeline managed in the metaserver, right? It can be confused with the table timeline on storage. You should also mention how the metaserver interacts with storage in this case.
| - Persist each table service into an instance table of Table Management Service
| - notify a separate execution component/thread can start executing it
| - Monitor task execution status, update table information, and retry failed table services up to the maximum number of
| times
Why not combine this "Processing flow" section with the "Processing mode" section above? They are basically talking about the same thing. We need the end-to-end flow for both cases, and the no-metaserver case is missing from the "Processing flow" section.
| 5. API(only REST)
| 6. Writer
| **Target: 0.12**
this needs update
One high-level comment/ask, not sure if it's doable already: can regular writers take care of scheduling table services by themselves and only delegate execution to TMS? I am thinking from a Deltastreamer standpoint. As of now, Hudi can achieve async non-blocking compaction/clustering because the regular writer takes care of scheduling (while the table is frozen, with no other writes in flight) and delegates the execution to a separate thread. If we delegate the scheduling to TMS as well, we can't guarantee that there won't be any other inflight operation while scheduling compaction (in which case scheduling could fail). So, instead of a separate thread executing (as on master), we can let TMS handle the execution alone. This will keep the Deltastreamer job lean (fewer resources), while TMS takes up the heavy compute job of executing the table service. From what I glean, only if scheduling is done by TMS does it go into TMS's backing storage, which the scheduler polls; if scheduling is done by the regular writer, I'm not sure how this will be handled. Sorry, I haven't taken a look at the impl yet. Is this functionality supported? If not, we should look to support it.
In fact, this could be a problem with other writers as well. When scheduling compaction, Hudi expects no other delta commits to be in flight. So it is better to let one of the writers take care of scheduling and delegate the execution to TMS. Probably this is specific to compaction; just wanted to flag it in case.
@nsivabalan TMS is a downstream listener to the metaserver (or to whatever timeline server is used if there is no metaserver), so TMS is aware of all inflight commits on the registered tables, and we should use that info to generate plans.
Let's clarify: TMS is responsible for plan generation and scheduling, while execution is sent to a separate cluster and TMS monitors the execution. You're proposing basically delegating only execution; that's possible, but it doesn't really make use of TMS's capabilities. I'd suggest full delegation to TMS; we're starting a server, so making full use of it will justify the design and the cost.
@xushiyan @yuzhaojing @danny0405
@yuzhaojing are you still interested in pushing this? There is some interest from @suryaprasanna and the Uber team to contribute. I will wait for a few days and close this if there is no response; we can open a new RFC.