Add identifier to split points for automated vs user created split points #5014
Comments
I will take a look at this issue. I think it makes sense to add a new column to the tablet metadata to track the split point type.
@keith-turner - I've started looking into this and prototyping a new column qualifier in the split family that was added. A few questions come to mind about how we want to handle marking the tablets:
The pro for absence==system split is less data in the metadata table for a common case. The con is that absence can mean two things; it's kind of like using null in Java. It can mean there is a bug and the marker was not set when it was intended to be set, or it can mean it's a system split. So maybe we should only go with absence==system split if we feel confident enough in our tests to ensure that the user split marker is always set when intended.
The create table API has a mechanism to set initial splits on a table. Those could be marked as user initially. The default tablet at the end of every table seems like it would always be marked system, but not sure. The default tablet is always there when you have zero or more splits.
I cannot remember what merge does w.r.t. adding splits. Reviewing all of your merge changes, I remember something about deleterows possibly inserting a split point, but I'm not sure. Maybe this was removed in the 4.0 changes? If merge is still adding splits, it was only doing that to make things work, so I'm not sure those should be marked as user.
Would the following be a good way to handle upgrade?
Then, after upgrade, users have a chance to mark splits that should never be automatically removed before enabling the auto merge feature.
I was wondering about the upgrade case. I like the idea of using absence for system splits, as most splits will be system splits. However, one downside is that any existing splits a user created would be incorrectly treated as system splits on upgrade unless manual steps were taken to mark them as user, as mentioned above. I think this is fine, but I'm wondering how we make it easy for a user to mark splits. We might need a new command line tool or something similar to make it simple. In fact, I could see a use case where a user decides there are some tablets they don't want merged (even if originally system created), so the tool could be used not just for upgrades but also for marking those tablets to prevent merging.
Also wondering about unmarking them. The TabletInformation API could be used to get information about the markings. Then we just need to figure out what the APIs and shell commands are for marking and unmarking them. They could be similar to the APIs and shell commands for marking tablet availability.
Maybe instead of marking how a split was created, we mark a split's mergeability. That could have the following three states.
I talked to @keith-turner offline and we think it would be good to have a broader discussion with others before deciding the best way forward. The main use case for this feature in the first place was to handle automatic merges. After talking to Keith, being able to mark mergeability separately, rather than tying it to how splits are created, could have some advantages (but increases complexity). We came up with more potential use cases and features related to merging, such as marking a split as a candidate for auto merge or not regardless of how it was created, marking a tablet as never being able to merge, changing the marking, etc. It would be good to see if anyone else has opinions on this.
Several of us talked and came up with a path forward for now. The tentative plan to try is:
One thing I am still trying to figure out is how automatic merging should work once we reach the threshold we set (say, smaller than 10% of the split threshold), and whether we should only ever merge two tablets together or try to merge as many as possible. @keith-turner - Is there anything I missed or got wrong from the conversation and the conclusions we came to?
I like merging as many tablets as fit into the merge threshold. So if we have five tablets and the sum of their existing sizes is less than the merge threshold, then we can merge all five of them.
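To make the "merge as many as fit" idea concrete, here is a minimal sketch of one possible greedy grouping. It is not Accumulo code: the class name, method name, and the choice to represent tablets by their index in sorted order are all hypothetical, and it assumes only adjacent tablets can be merged and that a group qualifies when its summed size is strictly less than the merge threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Accumulo.
public class MergeGrouping {

  // Walks tablets in sorted order and collects runs of adjacent tablets
  // whose combined size stays strictly below mergeThreshold.
  // Runs of length one are dropped because there is nothing to merge.
  static List<List<Integer>> groupForMerge(long[] sizes, long mergeThreshold) {
    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long currentSum = 0;
    for (int i = 0; i < sizes.length; i++) {
      // Close the current run if adding this tablet would reach the threshold.
      if (!current.isEmpty() && currentSum + sizes[i] >= mergeThreshold) {
        if (current.size() > 1) {
          groups.add(current);
        }
        current = new ArrayList<>();
        currentSum = 0;
      }
      current.add(i);
      currentSum += sizes[i];
    }
    if (current.size() > 1) {
      groups.add(current);
    }
    return groups;
  }
}
```

With five small tablets whose sizes sum to less than the threshold, this returns a single group containing all five, matching the example above. A large tablet in the middle of the range ends one run and starts another, so the small tablets on either side are grouped separately.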
My plan is to break this into multiple PRs. I'm thinking three parts for now:
This adds a new Mergeability column for marking if/when tablets are eligible to be merged by the system based on a threshold. The column stores a duration that is relative to the time the Manager uses, SteadyTime. There are 3 possible states for the value:
1) -1: the tablet will never automatically merge
2) 0: the tablet is eligible to automatically merge immediately
3) positive duration: the tablet is eligible to merge after the given duration relative to the current system SteadyTime, i.e. the tablet can be merged if SteadyTime is later than the given duration
This change only adds the new column itself and populates it. The default is to never merge automatically for all cases except when the system automatically splits tablets. In that case the newly split tablets are marked as being eligible to merge immediately. Future updates will add API enhancements to allow setting/viewing the mergeability setting, as well as enable automatic merging by the system based on this new column value.
When automatic merging is enabled, a user who wants to make a tablet eligible to be merged in the future would do so by adding a period of time to the current SteadyTime. For example, to make a tablet eligible to be merged 3 days from now, the user would read the current SteadyTime value (represented as a duration), add 3 days to it, and then store that new duration in the column. When the current SteadyTime passes that duration, the tablet would be eligible to be merged. See apache#5014 for more details.
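The three column states and the "3 days from now" example can be sketched as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, `java.time.Duration` stands in for whatever type the column and SteadyTime actually use, and encoding -1 as a duration of -1 nanoseconds is an assumption (the text above does not state the unit).

```java
import java.time.Duration;

// Hypothetical sketch, not the Accumulo implementation.
public class MergeabilitySketch {

  // Assumed encodings of the two special states described above.
  static final Duration NEVER = Duration.ofNanos(-1); // unit of -1 is an assumption
  static final Duration ALWAYS = Duration.ZERO;

  // A tablet is mergeable when its value is 0, or when the value is positive
  // and the current steady time is later than it. Negative means never.
  static boolean isMergeable(Duration mergeability, Duration currentSteadyTime) {
    if (mergeability.isNegative()) {
      return false;
    }
    if (mergeability.isZero()) {
      return true;
    }
    return currentSteadyTime.compareTo(mergeability) > 0;
  }

  // Computes the value to store so the tablet becomes eligible after a delay:
  // the current steady time plus the delay.
  static Duration eligibleAfter(Duration currentSteadyTime, Duration delay) {
    return currentSteadyTime.plus(delay);
  }
}
```

For example, if the current steady time is 10 days, `eligibleAfter(Duration.ofDays(10), Duration.ofDays(3))` yields 13 days; the tablet is not mergeable while steady time is at 12 days and becomes mergeable once steady time is later than 13 days.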
Is your feature request related to a problem? Please describe.
When split points are automatically created due to tablet sizes, these split points persist until a user runs a merge.
If a merge is run automatically, then it might remove split points that were purposely set by the user.
Describe the solution you'd like
Split points created automatically by Accumulo should be identified as such so they can easily be removed.