
Query Performance: Hard Compactions #582

Closed
11 tasks done
sergiimk opened this issue Apr 8, 2024 · 4 comments

sergiimk commented Apr 8, 2024

Future idea: Proactively detect when a derivative dataset's inputs have broken history and offer the user a way to mitigate the problem by triggering a hard compaction

sergiimk changed the title Data query performance → Query performance (hard compactions) Jun 26, 2024
sergiimk changed the title Query performance (hard compactions) → Query Performance: Hard Compactions Jun 26, 2024
sergiimk commented Jul 25, 2024

After some testing, the following defects were identified:

1) User feedback about history-breaking changes on derivative dataset inputs

Epic mentions a subtask:
Display task outcome to let user know that derivative dataset cannot be updated because of root compaction

However, currently a generic error is shown in this scenario:
Image

According to @zaychenko-sergei, a special interpretation of the Invalid block interval error was added to the UI to explain it to the user, so it looks like this functionality is no longer working.

2) It's not possible to recover a derivative dataset and bring it into a good state if one of its inputs was compacted.

Consider a scenario where:

  • We run a "full" compaction on the root dataset
  • Attempting to update the derivative dataset returns an error (as per 1.)
  • In settings there is no option to run a compaction or otherwise reset the derivative dataset - we're stuck

3) "Metadata only" option for root datasets should be better explained

Even I was not entirely sure what to expect from the "metadata only" option. Currently it's not at all clear that it will delete all data (!!!)

Image

4) Not possible to compact a root and trigger recursive "metadata-only" compactions for derivatives

This scenario was the main reason for introducing recursive compactions, but it's not possible in the current UI:

  • The recursive flow can be triggered only via the "metadata only" option on a root dataset, which deletes all data
  • If you run a normal compaction that preserves the data - you can no longer compact the derivatives:
    • neither recursively
    • nor non-recursively, because of problem 2)

5) Incorrect info in "Recursive" flag tooltip

Image


As a suggestion to address 2), 3), and 4), I propose the following steps:

  1. Remove the "Metadata Only" option from root compaction settings, leaving only the "full" mode (fixes 3.)
    a. Better document what a regular hard compaction does
  2. Add a "Reset dataset" option in the "General" tab, next to "Delete dataset"
    a. By default, "reset" will remove everything except the Seed block (preserving only the dataset's identity)
    b. If the "Flatten metadata" checkbox is checked - this triggers our "metadata only" compaction
    c. If the "Recursive" checkbox is checked - this runs the "metadata only" compaction recursively
    d. As a separate feature later, "Reset" can also allow resetting to a specified block hash - i.e. we bring reset and "metadata only" compactions under the same UX umbrella
  3. Make "Reset dataset" available for both root and derivative datasets (fixes 2.)
  4. Add a "Reset downstream datasets recursively" checkbox to the "Compaction" tab (fixes 4.)
    a. This allows running a normal compaction on the root while triggering recursive "metadata only" compactions for the derivatives downstream.

In other words:

  • I think it's simpler for users to understand the concept of "reset with flattened metadata" than "metadata-only compaction"
  • Separating reset into its own function group avoids making users choose between two very different compaction modes, one of which actually drops data

I think the above can be done with no or minimal backend changes; the only thing missing, I believe, is the ability to run a normal compaction on the root while triggering metadata-only compactions for the derivatives.
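To make the proposed "Reset" semantics concrete, here is a minimal Python sketch of the behavior described in steps 2-4. Everything here (Block, has_data, reset, reset_recursive) is a hypothetical illustration of the intended UX semantics, not the actual kamu backend API:

```python
# Hypothetical model: a dataset is an ordered chain of metadata blocks,
# where "Seed" is always the first block and data-bearing blocks are
# marked with has_data=True. (Illustration only - not kamu's real types.)
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Block:
    kind: str          # e.g. "Seed", "SetTransform", "AddData"
    has_data: bool = False

def reset(chain, flatten_metadata=False):
    """Proposed 'Reset dataset' behavior.

    - Default: drop everything except the Seed block, preserving only
      the dataset's identity.
    - flatten_metadata=True: behave like a 'metadata only' compaction,
      i.e. keep the metadata blocks but strip all data from them.
    """
    if not flatten_metadata:
        return [chain[0]]  # Seed only
    return [replace(b, has_data=False) for b in chain]

def reset_recursive(datasets, downstream, name, flatten_metadata=False):
    """Reset `name`, then run metadata-only resets on everything downstream."""
    datasets[name] = reset(datasets[name], flatten_metadata)
    for dep in downstream.get(name, []):
        reset_recursive(datasets, downstream, dep, flatten_metadata=True)

# Example: the gps -> gps-deriv-1 -> gps-deriv-2 chain from the test setup
datasets = {
    name: [Block("Seed"), Block("SetTransform"), Block("AddData", True)]
    for name in ("gps", "gps-deriv-1", "gps-deriv-2")
}
downstream = {"gps": ["gps-deriv-1"], "gps-deriv-1": ["gps-deriv-2"]}

reset_recursive(datasets, downstream, "gps", flatten_metadata=True)
assert all(not b.has_data for d in datasets.values() for b in d)
```

Under these assumptions, a plain reset keeps only identity, while "flatten metadata" keeps the metadata chain but drops the data - matching the distinction the proposal wants to surface in the UI.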

sergiimk commented Sep 1, 2024

sergiimk commented Sep 9, 2024

My test setup is a chain of datasets:

gps -> gps-deriv-1 -> gps-deriv-2

The derivative datasets simply do a select * from <input>.

Issues found:

  1. When the root dataset is hard compacted (non-recursively) and I trigger a derivative update manually, I get an error (as expected), but the error shows the name of my derivative dataset instead of the name of the root input
    image

  2. When I do Reset + flatten metadata on a derivative dataset, I see a flow confusingly called "Hard Compaction". I expected to see "Reset"
    image

Please ticket these up as low-priority bugs - they are not blocking this epic's completion.

Aside: I had a stable repro of an issue on the demo environment where running Reset with the "recursive" flag did not affect the downstream datasets - only the current one.
image

I then tried the same on europort and everything worked. I then recreated the datasets on demo and it started working there as well. Just a note that there might be something fishy going on.

sergiimk closed this as completed Sep 9, 2024
dmitriy-borzenko commented Sep 10, 2024

  1. Created a backend ticket: Change input dataset for derived datasets when manual update fails #820
  2. Reset + flatten metadata on a derivative dataset starts a Hard Compaction under the hood - that's why the result looks like this:

image

Maybe this should be split out into a separate flow? We need to discuss what the best way to proceed is.

image

I just checked this case and everything works - every dataset in the chain has empty data. I can't reproduce this problem.
