Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support dedicating threads to specific types of Fate transactions #5130

Open
dlmarion opened this issue Dec 3, 2024 · 19 comments
Open

Support dedicating threads to specific types of Fate transactions #5130

dlmarion opened this issue Dec 3, 2024 · 19 comments
Assignees
Milestone

Comments

@dlmarion
Copy link
Contributor

dlmarion commented Dec 3, 2024

In main the Manager instantiates two instances of Fate, one for the MetaFateStore and one for the UserFateStore. Each instance of Fate contains a single thread pool of Property.MANAGER_FATE_THREADPOOL_SIZE size which is used to process Fate transactions. The threads in the Fate thread pool try to process transactions in a fair manner, by spending time on a different transaction in the queue when done with the current transactions' step (see #3852 and FateInterleavingIT for more information). The default size of MANAGER_FATE_THREADPOOL_SIZE is 4, which may need to be increased in main, could allow for some transactions to go a long time without making progress (It's possible that this could be the cause of the long delete table transaction times we are seeing in integration tests).

It may be advisable to dedicate some number of threads to specific transaction types. For example, maybe 1 thread dedicated to creating and deleting tables, and a larger thread pool dedicated to bulk imports and compactions. I'm initially thinking of a change to the MANAGER_FATE_THREADPOOL_SIZE property such that instead of it being a count of the number of threads, the property is renamed to MANAGER_FATE_THREADPOOL_CONFIGURATION and contains a json map where the key is a comma separated list of Fate transaction names and the value is the number of threads, like:

{
  "SET_AVAILABILITY,CREATE_TABLE,DELETE_TABLE,CREATE_NAMESPACE,DELETE_NAMESPACE,RENAME_NAMESPACE": 1,
  "BULK_IMPORT": 2,
  "COMPACT": 4,
  "MERGE,SPLIT,IMPORT_TABLE,EXPORT_TABLE": 2
}

The Manager would parse this configuration and start one Fate instance for each Key,Value pair in the map. The Fate instance would be responsible for managing only the types that it's given. Currently MANAGER_FATE_THREADPOOL_SIZE is watched and the size of the threadpool changes when the property changes. That may be harder to do with this idea. I think the name of the transaction is currently serialized into the transaction and I think the transactions can be filtered by type, so this would build upon that capability.

@dlmarion dlmarion added this to the 4.0.0 milestone Dec 3, 2024
@dlmarion
Copy link
Contributor Author

dlmarion commented Dec 3, 2024

It might be useful for COMPACTION_COMMIT to be it's own type with it's own dedicated resources.

@cshannon
Copy link
Contributor

cshannon commented Dec 4, 2024

One other idea I thought of is if we could configure the fate thread pool be to use a PriorityBlockingQueue and then write a comparator to bump the priority of certain fate operation types higher than others so they run first.

This option may not work though because it is not going to be as good as having separate dedicated thread pools to run fate operations by type because if the thread pool is saturated and tied up with long running fate jobs then it still blocks jobs from running, but it at least would bump certain things to the front to run as soon as there are free threads.

@cshannon
Copy link
Contributor

cshannon commented Dec 5, 2024

@kevinrr888 - Is this something you wanted to take a look at? I figure we were involved with the recent FATE changes and updates so one of us could probably work on this and feel free to grab it if you want to but if you were not interested or working on something else I could take a look as well.

@kevinrr888
Copy link
Member

kevinrr888 commented Dec 5, 2024

@cshannon - Yeah, I would be interested in looking at this. Currently working on some end of year stuff right now, so if you're free feel free to pick this up if you would like. I probably wouldn't be able to get to this until next week

@cshannon
Copy link
Contributor

cshannon commented Dec 5, 2024

@cshannon - Yeah, I would be interested in looking at this. Currently working on some end of year stuff right now, so if you're free feel free to pick this up if you would like. I probably wouldn't be able to get to this until next week

@kevinrr888 - Alright, I am planning to work on #5117 and to take a look into #5139 this week (and i will also look to see if there are any other issues that need to be worked on) so not sure yet if i I will have time either this week but i'll let you know if i start looking at it first

@kevinrr888
Copy link
Member

@cshannon - Cool, sounds good

@keith-turner
Copy link
Contributor

One other idea I thought of is if we could configure the fate thread pool be to use a PriorityBlockingQueue and then write a comparator to bump the priority of certain fate operation types higher than others so they run first.

A prioq would not work w/ current code structure. Currently fate has a single thread that scans looking for work. When it finds work it use a transfer queue to directly pass what it found to any of the idle fate threads that are looking for work. The code is structured this way to avoid queuing the same fate op over and over as the work finder thread keeps repeatedly scanning looking for work.

@cshannon
Copy link
Contributor

cshannon commented Dec 6, 2024

A prioq would not work w/ current code structure. Currently fate has a single thread that scans looking for work. When it finds work it use a transfer queue to directly pass what it found to any of the idle fate threads that are looking for work. The code is structured this way to avoid queuing the same fate op over and over as the work finder thread keeps repeatedly scanning looking for work.

I completely forgot that we changed that code to use a transfer queue so yeah that would definitely not work.

@cshannon
Copy link
Contributor

cshannon commented Dec 8, 2024

@kevinrr888 - I didn't get a chance to look at this so you can go ahead and grab it this week, I'll be working on #5014

@kevinrr888
Copy link
Member

@cshannon sounds good, thanks

@keith-turner
Copy link
Contributor

The current fate code creates two thread pools one for meta and one for user. Both thread pools give threads the same name in jstack. Would be nice if work around this gave any new threads pools different names in jstack.

@kevinrr888
Copy link
Member

The current fate code creates two thread pools one for meta and one for user. Both thread pools give threads the same name in jstack. Would be nice if work around this gave any new threads pools different names in jstack.

Yeah, I noticed that as well. I plan to change it here

@kevinrr888
Copy link
Member

@keith-turner and I met yesterday to discuss the overall structure of the new changes to be made. Here is a summary of what we concluded.

The current implementation in 4.0:

1 FATE instance for USER transactions, 1 FATE instance for META transactions.
Each FATE instance has its own set of transaction runners which work on all fate operations in a single thread pool, a work finder (scans the underlying storage for FATE transactions to work on and assigns the work to the runners), a pool watcher (watches for property change and dead threads, resizing pool and submitting more transaction runners if needed), and a dead reservation cleaner (finds and unreserves transactions which are reserved by a now-dead manager).

The changes to be made:

1 FATE instance for USER transactions, 1 FATE instance for META transactions.
Each FATE instance has its own set of executors (new class maybe called FateExecutor). Each FateExecutor works on a subset of fate operations (meaning each has a pool of runners, a work finder, and a pool watcher). Each FATE instance only has one dead reservation cleaner since more than one per instance is unneeded (we are just scanning the storage layer for dead reservations).

There will be two properties, one to configure the USER pools, and one for META pools.

What else we considered:

We considered a set of FATE instances for each transaction type (USER/META), but that was a bit bloated and had some less-desirable functionality as some operations wouldn't care about what FATE instance we operate on and only care about the underlying storage layer. For example, starting a transaction (Fate.startTransaction()) just creates the transaction in the store. With multiple FATE instances, this could be achieved the same through any of them, which makes it confusing for code using FATE.

We considered one work finder per FATE instance but with the current structure of the code using a TransferQueue for the work, this wouldn't be possible. This does have the draw back of having multiple threads scanning the same store for the same things, then just filtering them out based on what work they are assigned to take. Maybe this structure could change in a follow on PR if desired, but as an initial PR, using multiple work finders per FATE instance.

We considered sharing pools between USER/META transactions, but Keith noted that this could lead to dead locks. Best to keep resources for USER and META transactions separate.

I think I covered everything. If anyone has any questions or concerns, please let me know.

@cshannon
Copy link
Contributor

cshannon commented Dec 20, 2024

1 FATE instance for USER transactions, 1 FATE instance for META transactions. Each FATE instance has its own set of executors (new class maybe called FateExecutor). Each FateExecutor works on a subset of fate operations (meaning each has a pool of runners, a work finder, and a pool watcher).

How are the operations being split into different subsets/groups? Type or something else?

@kevinrr888
Copy link
Member

How are the operations being split into different subsets/groups? Type or something else?

Split based on FateOperation type (e.g., TABLE_CREATE, TABLE_COMPACT, TABLE_BULK_IMPORT2, etc.). The FateOperation class is a thrift type and only used for FateServiceHandler. Some transactions are seeded with different operations not included in FateOperation and used outside of FateServiceHandler. Currently, the FateOperations are passed to the Fates as Strings. Will make a small PR to add a concrete type of FateOperation which includes all possible operations and isn't thrift (will probably rename existing FateOperation to TFateOperation). This gets rid of the use of strings for these.

@kevinrr888
Copy link
Member

I realized you may have been asking how is this configurable, so I'll answer that too.
There will be 2 properties (one for configuring the META pools, and one for USER pools). The properties are JSON. They contain key/values the keys being a list of the fate operation names and the values being the pool size for those operations. The properties will have a default value and will be configurable by the user. We will ensure the property is always valid (covers all fate ops, no overlap, etc.)

@cshannon
Copy link
Contributor

@kevinrr888 - Thanks I was asking really both, how we planned to split it and also how we planned to configure it so thanks for the clarification.

@dlmarion
Copy link
Contributor Author

This does have the draw back of having multiple threads scanning the same store for the same things, then just filtering them out based on what work they are assigned to take. Maybe this structure could change in a follow on PR if desired, but as an initial PR, using multiple work finders per FATE instance.

I'm not sure what the table structure looks like for the USER store, but if the fate operation name is in the colf, then we could set up locality groups for each fate operation in the Initialize and Upgrade code.

@kevinrr888
Copy link
Member

I'm not sure what the table structure looks like for the USER store, but if the fate operation name is in the colf, then we could set up locality groups for each fate operation in the Initialize and Upgrade code.

Yes, good point. That is true for the USER store and is another thing we plan to change. I forgot to mention this. #5031 is an open PR regarding this. We can just add the fate operation to the same group.
For META store I don't know of a similar optimization since we store the data in ZooKeeper. Maybe it could be done with refactoring to how we currently store the Fate data, but I don't think it could be done as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

4 participants