Add config for index parallelism and make clean public #109
cc @siddharthagunda - FYI
This fixes #108
/**
 * Clean up any stale/old files/data lying around (either on file storage or index storage) that is past
 * the typical query timeout. Default is 12 hours.
Given that the cleaner itself has these knobs, why are we reintroducing a default clean time window here?
Same comment on index tagLocation and updateLocation...
It would come in really handy as we design the next generation of indexing.
> given cleaner itself has these knobs
True. I only added more description to the API because we are making it public. I will reword it to refer to the cleaning policy.
> Same comment on index tagLocation and updateLocation...
I suppose you mean adding metrics? That can be a separate diff.
 * Clean up any stale/old files/data lying around (either on file storage or index storage) that is past
 * the typical query timeout. Default is 12 hours.
 */
public void clean() throws HoodieIOException {
Can we add a metric around how long cleaning takes? It might be really useful going forward to keep an eye on
We already have metrics for cleaning, so we should be able to get this today. We also save this in the .clean file in the timeline.
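For readers following the metrics question, here is a minimal sketch of how the duration of a cleaning call could be measured and handed to a metrics sink. It is not the project's actual metrics code: `CleanTimingSketch`, `timed`, and the sink are hypothetical names, and the `Runnable` merely stands in for a call such as `clean()`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of the project: times a task and reports the duration.
class CleanTimingSketch {

    /**
     * Runs the task and passes the elapsed wall-clock time in milliseconds to the sink.
     * The duration is reported even when the task throws; the exception still propagates.
     */
    static void timed(Runnable task, LongConsumer durationMsSink) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            durationMsSink.accept(elapsedMs);
        }
    }

    public static void main(String[] args) {
        // The first lambda stands in for the cleaning call; the sink here only prints.
        timed(() -> { /* cleaning work would go here */ },
              ms -> System.out.println("clean took " + ms + " ms"));
    }
}
```

Reporting from a `finally` block means a failed clean is still timed, which is usually what you want when watching for slow or stuck runs.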
Ship it
…ers (apache#109) * fail job if duplicate data files detected during reconcileAgainstMarkers * adding missing apache License
…ers (apache#109) (apache#169) * fail job if duplicate data files detected during reconcileAgainstMarkers * adding missing apache License Co-authored-by: harshal <[email protected]>