feat: add hook for dataset health check#11970
feat: add hook for dataset health check#11970graceguo-supercat merged 6 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
i can populate warning message from here, or from superset-ui-chart-controls:
https://github.com/apache-superset/superset-ui/blob/1f8a700b953efbed81d2c9970b514d7fe031ad2e/packages/superset-ui-chart-controls/src/shared-controls/index.tsx#L166
but it will need to create another PR in another code base.@ktmud do you have preference?
There was a problem hiding this comment.
This component ControlPanelsContainer is generic and should stay generic. It should not contain logics related to any specific control field.
|
What if users change datasource or update datasource to point to a physical datasource in the Edit Datasource modal? |
Codecov Report
@@ Coverage Diff @@
## master #11970 +/- ##
==========================================
- Coverage 66.26% 60.24% -6.02%
==========================================
Files 941 912 -29
Lines 45715 45009 -706
Branches 4395 4061 -334
==========================================
- Hits 30293 27116 -3177
- Misses 15287 17893 +2606
+ Partials 135 0 -135
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
I believe most if not all control configs in this file are not in use any more. We should clean them up.
There was a problem hiding this comment.
yes. this line has no use :(
I don't offer any rule in the PR. the goal for this PR is to offer a hook in the codebase. Each company define their own rules. |
|
I mean does the error message update when users switch datasources? It seems the check only happens at the first page load. |
No. save dataset data is on a different api. This health check will only happen when opening explore view. |
5f0a0bf to
5f4f2cc
Compare
5f4f2cc to
88a96b3
Compare
superset/connectors/sqla/models.py
Outdated
There was a problem hiding this comment.
can i add event log this way? want to monitor the duration for health check logic, in case it may affect explore view perf.
There was a problem hiding this comment.
What you care most is the actual check function, the part where we check version and skip the health check (happens for most page views) is known to be pretty trivial. So maybe use this code instead:
with event_logger.log_context(action="datasource_health_check") as log:
message = check(self)
...6b5315f to
04309c5
Compare
04309c5 to
420adcc
Compare
| data_["fetch_values_predicate"] = self.fetch_values_predicate | ||
| data_["template_params"] = self.template_params | ||
| data_["is_sqllab_view"] = self.is_sqllab_view | ||
| data_["health_check_message"] = self.health_check_message |
There was a problem hiding this comment.
instead of output whole "extra", i saw certified data only offer flattened information. I think it's easier for frontend usage.
ktmud
left a comment
There was a problem hiding this comment.
Thanks for the iterations! One blocking comment (you should set health_check even when check returns None), otherwise LGTM.
I'm also curious to hear how others think about this feature, though.
@junlincc @mistercrunch do you think this align well with overall Superset roadmap?
superset/views/core.py
Outdated
| ) | ||
|
|
||
| # if feature enabled, run some health check rules for sqla datasource | ||
| if hasattr(datasource, "health_check") and conf["DATASET_HEALTH_CHECK"]: |
There was a problem hiding this comment.
I think you can skip the conf check here. It will be immediately checked by datasource.health_check() anyway.
superset/connectors/sqla/models.py
Outdated
| extra["health_check"] = { | ||
| "version": check.version, | ||
| "message": message, | ||
| } |
There was a problem hiding this comment.
You might want to set the health_check field even if check message returns None or empty. Because otherwise this will run for every "good" datasources that didn't have any health check warnings.
There was a problem hiding this comment.
I would also recommend making the message more structured for scalability (i.e. let DATASET_HEALTH_CHECK return a dict instead of a string). Because in the future, you may need to localize the warning message, then it would be better to store an ERROR_CODE + some metadata instead of just a message string. This also opens the door for more actionable warning message where users may click on the warning icon and open a popup for more details.
Maybe we can even define it as a new SupersetError type and use @etr2460 's error message registry to render it.
But of course, this could also be reserved for future refactor/extension. (We can always update the health check version when that happens!)
There was a problem hiding this comment.
It's not as strong as SupesetError. Error and Warning should be treated differently.
Let's keep this PR as simple as it is, and we can reiterate when it is really needed. The impact of this feature is still unknown to me, because it depends a lot on the rule implementation. Let's run the core function with some real examples before putting too much efforts on extras.
There was a problem hiding this comment.
Yes, agreed. We don't need to go that route if it's too much work. Just laying out my thoughts here in case we want to double down on this feature. Even if the message depends on specific rule implementation, we can enforce a basic structure so that it's possible the override the message renders.
Error and Warning should be treated differently.
This I must humbly disagree. Both are actually a subclass of Alert in most UI libraries, and we see in a lot of places the concept of "error levels" (error, warning, info). Most of the time, you'd see warnings and errors being rendered by the same component, just different colors or icons. And in case we want to render them differently, the shared infrastructure of custom renderer registry comes handy.
There was a problem hiding this comment.
thanks. i will keep this work in mind. If we see this feature is useful, I will polish it with localization and other re-usable UI component.
ea3b23b to
5146afa
Compare
| it('should render health check message', () => { | ||
| const wrapper = setup(); | ||
| const icons = wrapper.find(Icon); | ||
| expect(icons.first().prop('name')).toBe('alert-solid'); |
There was a problem hiding this comment.
should we also make sure the warning message is shown here?
| <Styles className="DatasourceControl"> | ||
| <ControlHeader {...this.props} /> | ||
| <div> | ||
| <div style={{ display: 'flex' }}> |
There was a problem hiding this comment.
instead of an inline style, should we use a styled.div to apply this style?
| # It's possible to add a dataset health check logic which is specific to your system. | ||
| # It will get executed each time when user open a chart's explore view. | ||
| DATASET_HEALTH_CHECK = None |
There was a problem hiding this comment.
can you add the signature about the health check to the comment here?
Or maybe even better, include a default function that always passes. It's actually a bit unclear to me how this should be used (should I return True if the dataset passes? None?)
There was a problem hiding this comment.
See Jesse's comment above, if the dataset passes, it will still add a health_check section into db record (so that not running check again).
|
|
||
| # if feature enabled, run some health check rules for sqla datasource | ||
| if hasattr(datasource, "health_check"): | ||
| datasource.health_check() |
There was a problem hiding this comment.
@graceguo-supercat why in this instance do we not require to force a check, i.e., datasource.health_check(force=True)?
There was a problem hiding this comment.
We should run check again when user updates dataset, or when health check rule's version is changed. Otherwise the healthy status will not be changed.
This function is triggered at the time explore view is opened from browser, so it doesn't need to force run the check.
SUMMARY
Inside airbnb, we spent a lot of time on Superset dashboard and query performance issues. We create some guidelines for airbnb users, to help them write better queries to create virtual dataset. So we want to add a dataset health check logic when user explore a chart, and show some warning message if health check returns any message.
To decide whether a dataset is good or bad, this logic is probably very specific to each corporation. Some rules applied in airbnb may totally not relevant for Dropbox. So in the open source codebase, we only add an entry point,
DATASET_HEALTH_CHECK = _your function to check dataset quality_which allow each company to implement and hook their own rules.
For example, after implements a rule that checks if the chart is using a virtual dataset:
I will see a warning message like this:
Note
The health_check result will be saved with datasource record
extracolumn. When user update datasource, orDATASET_HEALTH_CHECK.versionis changed, health_check function will be triggered again and update the record in the database. Please see how we discussed different ideas for this feature.TEST PLAN
CI
ADDITIONAL INFORMATION
cc @ktmud @etr2460