Query Performance: Hard Compactions #582
After some testing the following defects were identified:

1) User feedback about history-breaking changes on derivative dataset inputs

The Epic mentions a subtask for this, but currently a generic error is shown in this scenario. According to @zaychenko-sergei there was a special interpretation of the …

2) It's not possible to recover the derivative dataset and bring it into a good state if one of its inputs was compacted

Consider a scenario where: …

3) "Metadata only" option for root datasets should be better explained

Even I was not entirely sure what to expect from the "metadata-only" option. Currently it is not at all clear that it will delete all data (!!!)

4) Not possible to compact a root and trigger recursive "metadata-only" compactions for derivatives

This scenario was the main reason for introducing recursive compactions, but it is not possible in the current UI.
5) Incorrect info in "Recursive" flag tooltip

As a suggestion to address 2), 3), and 4) I propose the following steps:

- `--keep-metadata-only` flag to compact command #625: `kamu system compact --keep-metadata-only`
- `kamu pull --reset-derivatives-on-diverged-input <derivative dataset>`: make the flag part of the pull flow
- `keep_metadata_only` flag #630: `HardCompactKeepMetadataOnly`, or a configurable `HardCompact(keep_metadata_only=true)` (for derivative AND root datasets)

In other words:

- the flag becomes part of `PullParams` and applies to all datasets matched by pattern / `--recursive` / `--all`
- `PullService` will attempt to run the derivative transform as usual
- if `TransformService` returns an `IntervalError` (??) and our `--reset-derivatives-on-diverged-input` flag was specified, it will call `CompactService::compact_keep_metadata_only()` to reset the derivative dataset to an empty state
- `PullService` will then try to repeat the transform (see the sketch at the end of this thread for how this flow might look)

Future idea: proactively detect when a derivative dataset's inputs have broken history and offer the user to mitigate the problem by triggering a hard compaction.

I think the above can be done with no or minimal changes on the backend; the only thing missing, I think, is the ability to run a normal compaction on the root while triggering metadata-only compactions for its derivatives.

Maybe this should be taken out into a separate flow? We need to discuss this and decide on the best way to proceed.

I just checked this case and everything works: every dataset in the chain has empty data. I can't reproduce this problem.
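To make the proposed pull-time recovery flow more concrete, here is a minimal Rust sketch. The trait shapes, error types, and signatures are simplified stand-ins and do not reflect the actual kamu-cli interfaces; only the names mentioned in the proposal (`PullParams`, `PullService`, `TransformService`, `IntervalError`, `CompactService::compact_keep_metadata_only()`) are reused, everything else is hypothetical.

```rust
// Sketch of the proposed `--reset-derivatives-on-diverged-input` pull flow.
// All types here are simplified stand-ins, not the real kamu-cli interfaces.

#[derive(Debug)]
enum TransformError {
    // Input interval can no longer be resolved, e.g. after a hard compaction of an input.
    IntervalError,
    Other(String),
}

struct PullParams {
    // Proposed flag: on a diverged input, reset the derivative to a
    // metadata-only state and retry the transform.
    reset_derivatives_on_diverged_input: bool,
}

trait TransformService {
    fn transform(&self, dataset: &str) -> Result<(), TransformError>;
}

trait CompactService {
    // Hard compaction that keeps only metadata blocks, leaving the dataset without data.
    fn compact_keep_metadata_only(&self, dataset: &str) -> Result<(), String>;
}

// What PullService could do for a single derivative dataset.
fn pull_derivative(
    params: &PullParams,
    transform: &dyn TransformService,
    compact: &dyn CompactService,
    dataset: &str,
) -> Result<(), String> {
    // 1. Attempt the derivative transform as usual.
    match transform.transform(dataset) {
        Ok(()) => Ok(()),
        // 2. Input history diverged and the opt-in flag was given:
        //    reset the derivative to an empty (metadata-only) state...
        Err(TransformError::IntervalError) if params.reset_derivatives_on_diverged_input => {
            compact.compact_keep_metadata_only(dataset)?;
            // 3. ...and repeat the transform from that empty state.
            transform.transform(dataset).map_err(|e| format!("{e:?}"))
        }
        // Any other failure (or a diverged input without the flag) is surfaced to the user.
        Err(e) => Err(format!("{e:?}")),
    }
}
```

The key point of this shape is that the metadata-only reset only happens when the interval error is combined with an explicit opt-in flag, so a diverged input never silently wipes a derivative's data.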
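Similarly, the two API shapes mentioned for #630 could look roughly like the sketch below. These types are illustrative only and are not the actual kamu schema; the `recursive_metadata_only_for_derivatives` field is a hypothetical way to express the missing piece noted above (a normal compaction of the root that triggers metadata-only compactions of its derivatives).

```rust
// Illustrative shapes for the two options discussed for #630; not the actual kamu API.

// Option A: a dedicated compaction kind.
enum CompactionKind {
    HardCompact,
    HardCompactKeepMetadataOnly,
}

// Option B: a single kind with configuration, which also leaves room for
// compacting the root normally while resetting its derivatives
// (field name is hypothetical).
struct HardCompactConfig {
    // Drop all data slices and keep only the metadata chain.
    keep_metadata_only: bool,
    // Recursively reset downstream derivative datasets to a metadata-only state.
    recursive_metadata_only_for_derivatives: bool,
}
```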