perf(charts): improve performance on GET list #9619

dpgaspar · 2020-04-22T16:45:55Z

SUMMARY

This API endpoint was issuing two extra queries for each row: one against ab_user to resolve the created_by field, and another to fetch the correct datasource. Since Druid support outside of SQLAlchemy is deprecated, this PR makes an outer join with SqlaTable instead.

To avoid a query per row: when a @property is exposed as a column and the method itself references another column, make sure that column is declared in list_columns so it is prefetched; otherwise SQLAlchemy will issue the extra query.
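A minimal sketch of the per-row pattern versus a single outer join, using Python's stdlib sqlite3 instead of SQLAlchemy so it runs standalone. The slices/ab_user schema here is simplified and hypothetical, and `set_trace_callback` is used only to count the statements each version sends. In SQLAlchemy, a many-to-one relationship left at its default `lazy="select"` loading behaves roughly like the first function (it can skip a lookup whose target is already in the session's identity map), while a joined eager load behaves like the second.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, str, Optional[Tuple[str, str]]]


def build_db() -> sqlite3.Connection:
    """In-memory DB with a simplified, hypothetical slices/ab_user schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ab_user (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE slices (
            id INTEGER PRIMARY KEY,
            slice_name TEXT,
            created_by_fk INTEGER REFERENCES ab_user(id)
        );
        INSERT INTO ab_user VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
        INSERT INTO slices VALUES (1, 'a', 1), (2, 'b', 2), (3, 'c', 1), (4, 'd', NULL);
        """
    )
    return conn


def count_statements(conn, fn):
    """Run fn(conn); return its result and the number of SQL statements it executed."""
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


def list_per_row(conn) -> List[Row]:
    """One query for the page, then one lookup per chart that has a creator."""
    rows = conn.execute(
        "SELECT id, slice_name, created_by_fk FROM slices ORDER BY id"
    ).fetchall()
    out: List[Row] = []
    for chart_id, name, user_id in rows:
        creator = None
        if user_id is not None:
            creator = conn.execute(
                "SELECT first_name, last_name FROM ab_user WHERE id = ?", (user_id,)
            ).fetchone()
        out.append((chart_id, name, creator))
    return out


def list_joined(conn) -> List[Row]:
    """A single LEFT OUTER JOIN; charts without a creator keep NULL user columns."""
    rows = conn.execute(
        """
        SELECT s.id, s.slice_name, u.first_name, u.last_name
        FROM slices AS s
        LEFT OUTER JOIN ab_user AS u ON u.id = s.created_by_fk
        ORDER BY s.id
        """
    ).fetchall()
    return [
        (chart_id, name, (first, last) if first is not None else None)
        for chart_id, name, first, last in rows
    ]


conn = build_db()
per_row, per_row_count = count_statements(conn, list_per_row)
joined, joined_count = count_statements(conn, list_joined)
assert per_row == joined  # same data either way
print(per_row_count, joined_count)  # 4 1 (1 page query + 3 lookups vs. 1 join)
```

The statement count of the per-row version grows with the page size, while the joined version stays at one, which is the effect the timings below reflect.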

Local times are:
Before:
(timing) ChartRestApi.get_list.time = 130ms - 180ms

After:
(timing) ChartRestApi.get_list.time = 25ms - 50ms

Druid charts are still displayed normally but, if there are any, an extra query is issued for each one.

ADDITIONAL INFORMATION

REVIEWERS

@nytai

dpgaspar · 2020-04-23T10:48:25Z

superset/models/slice.py

+        "Slice.datasource_type == 'table')",
+        remote_side="SqlaTable.id",
+        lazy="joined",
+    )


Setting this relationship avoids one extra query per row, but it will not show the datasource for the deprecated Druid source (an outer join is still issued).

willbarrett · Apr 23, 2020

What will the experience for the user be if they are primarily using the deprecated Druid connector?

dpgaspar · Apr 24, 2020

Updated the PR description with an example.

dpgaspar · Apr 30, 2020

No impact now.
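The relationship in the diff is eager-loaded with `lazy="joined"`, and judging from the visible part of its join condition it also checks `Slice.datasource_type == 'table'`; SQLAlchemy's joined eager loading emits a LEFT OUTER JOIN by default. The sketch below reproduces that join in plain SQL with stdlib sqlite3. The `slices`/`tables` schema is a simplified assumption, not Superset's real DDL; the point is that the type check belongs in the ON clause, because `datasource_id` has no foreign key and a Druid chart's id can collide with a `tables.id`.

```python
import sqlite3

# Simplified, hypothetical schema: slices.datasource_id has no foreign key and
# only refers to tables.id when datasource_type = 'table'.
SCHEMA = """
CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT);
CREATE TABLE slices (
    id INTEGER PRIMARY KEY,
    slice_name TEXT,
    datasource_id INTEGER,
    datasource_type TEXT
);
INSERT INTO tables VALUES (1, 'birth_names'), (2, 'energy_usage');
INSERT INTO slices VALUES
    (1, 'Births', 1, 'table'),
    (2, 'Energy', 2, 'table'),
    (3, 'Druid chart', 1, 'druid');
"""

# One statement for the whole page. Because the type check is in the ON clause,
# the Druid chart (whose datasource_id happens to equal tables.id 1) is not
# matched, and the LEFT OUTER JOIN still returns it with table_name = NULL.
LIST_QUERY = """
SELECT s.id, s.slice_name, t.table_name
FROM slices AS s
LEFT OUTER JOIN tables AS t
  ON t.id = s.datasource_id AND s.datasource_type = 'table'
ORDER BY s.id
"""


def list_charts(conn: sqlite3.Connection):
    return conn.execute(LIST_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in list_charts(conn):
    print(row)
# (1, 'Births', 'birth_names')
# (2, 'Energy', 'energy_usage')
# (3, 'Druid chart', None)
```

Dropping `AND s.datasource_type = 'table'` would wrongly attach `birth_names` to the Druid chart, which is why the Druid datasource cannot be shown through this relationship alone.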

codecov-io · 2020-04-23T10:49:36Z

Codecov Report

Merging #9619 into master will decrease coverage by 0.02%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##           master    #9619      +/-   ##
==========================================
- Coverage   65.71%   65.68%   -0.03%     
==========================================
  Files         574      574              
  Lines       30135    30138       +3     
  Branches     3066     3066              
==========================================
- Hits        19802    19797       -5     
- Misses      10149    10157       +8     
  Partials      184      184
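The figures above can be checked by hand: hits, misses and partials sum to the Lines row (19802 + 10149 + 184 = 30135), so coverage here is hits / (hits + misses + partials). The sketch below recomputes both percentages with exact fractions. The displayed values are consistent with truncating (not rounding) to two decimals, which would also explain why the header reports a 0.02% decrease while the table shows -0.03%; that truncation rule is an inference from these numbers, not something the report states.

```python
import math
from fractions import Fraction

# Counts copied from the coverage diff above (master vs. #9619).
MASTER = {"hits": 19802, "misses": 10149, "partials": 184}
PR = {"hits": 19797, "misses": 10157, "partials": 184}


def coverage(counts: dict) -> Fraction:
    """Exact coverage ratio: hits / (hits + misses + partials)."""
    lines = counts["hits"] + counts["misses"] + counts["partials"]
    return Fraction(counts["hits"], lines)


def hundredths_of_percent(ratio: Fraction) -> int:
    """Percentage in hundredths of a percent, truncated toward zero."""
    return math.trunc(ratio * 10000)


master_pct = hundredths_of_percent(coverage(MASTER))  # 6571, i.e. 65.71%
pr_pct = hundredths_of_percent(coverage(PR))  # 6568, i.e. 65.68%
delta = hundredths_of_percent(coverage(PR) - coverage(MASTER))  # -2, i.e. -0.02%
print(f"{master_pct / 100:.2f}% -> {pr_pct / 100:.2f}% (exact delta {delta / 100:.2f}%)")
# 65.71% -> 65.68% (exact delta -0.02%)
```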

Flag	Coverage Δ
#javascript	`58.76% <ø> (ø)`
#python	`70.55% <71.42%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
superset/charts/api.py	`81.66% <ø> (ø)`
superset/models/slice.py	`84.69% <71.42%> (-0.86%)`	⬇️
superset/db_engine_specs/postgres.py	`80.00% <0.00%> (-15.00%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76764ac...2cc7494.

willbarrett · 2020-04-28T19:30:41Z

I think Airbnb might have opinions about this. @john-bodley would this mess with your users' workflow?

john-bodley · 2020-04-28T20:19:09Z

@dpgaspar regarding the comment,

Since druid support outside of SQLAlchemy is deprecated

this isn't quite true, i.e., although we encourage environments to use Druid SQL (rather than Druid NoSQL) we still currently support Druid NoSQL and thus I'm not certain whether we should merge this PR.

I sense this PR shows the potential performance wins of fully deprecating the Druid NoSQL connector, i.e., there are numerous other places in the code base where the logic is complex and/or requires additional joins because there is no foreign key between the slices table and the tables table (or the datasources table, which will be deprecated).

dpgaspar · 2020-04-30T14:47:54Z

@john-bodley I had the wrong impression regarding Druid NoSQL.

Nevertheless, I adapted this PR so that we get the best of both worlds: we keep the performance boost when fetching charts that don't use Druid NoSQL, and an extra query is issued for each Druid NoSQL chart on the requested page (as before).

@willbarrett
No user impact, just user satisfaction :)
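The compromise described above (prefetch table-backed datasources through the outer join, fall back to one query per Druid NoSQL chart) can be sketched with stdlib sqlite3. This is not the PR's actual code; the table and column names are simplified assumptions, and the function name `chart_datasource_names` is hypothetical. The statement count shows that only Druid charts pay the per-row cost.

```python
import sqlite3
from typing import List, Optional, Tuple

# Simplified, hypothetical schema: datasource_id points at tables.id when
# datasource_type = 'table', otherwise at datasources.id (Druid NoSQL).
SCHEMA = """
CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT);
CREATE TABLE datasources (id INTEGER PRIMARY KEY, datasource_name TEXT);
CREATE TABLE slices (
    id INTEGER PRIMARY KEY,
    slice_name TEXT,
    datasource_id INTEGER,
    datasource_type TEXT
);
INSERT INTO tables VALUES (1, 'birth_names');
INSERT INTO datasources VALUES (1, 'wikipedia');
INSERT INTO slices VALUES
    (1, 'Births', 1, 'table'),
    (2, 'Births by state', 1, 'table'),
    (3, 'Edits', 1, 'druid');
"""


def chart_datasource_names(conn: sqlite3.Connection) -> List[Tuple[int, Optional[str]]]:
    # Page query: table-backed datasources are prefetched by the outer join.
    rows = conn.execute(
        """
        SELECT s.id, s.datasource_id, s.datasource_type, t.table_name
        FROM slices AS s
        LEFT OUTER JOIN tables AS t
          ON t.id = s.datasource_id AND s.datasource_type = 'table'
        ORDER BY s.id
        """
    ).fetchall()
    result = []
    for chart_id, datasource_id, datasource_type, table_name in rows:
        if datasource_type == "table":
            name = table_name  # already loaded, no extra query
        else:
            # Druid NoSQL fallback: one extra query per such chart, as before.
            found = conn.execute(
                "SELECT datasource_name FROM datasources WHERE id = ?",
                (datasource_id,),
            ).fetchone()
            name = found[0] if found else None
        result.append((chart_id, name))
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
statements: List[str] = []
conn.set_trace_callback(statements.append)  # count statements sent to SQLite
names = chart_datasource_names(conn)
conn.set_trace_callback(None)
print(names, len(statements))
# [(1, 'birth_names'), (2, 'birth_names'), (3, 'wikipedia')] 2
```

With no Druid charts on the page the count stays at one; each Druid chart adds exactly one statement.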

john-bodley

LGTM.

[charts] Improve performance on GET list

e400976

superset-github-bot bot added the preset-io label Apr 22, 2020

pull-request-size bot added the size/M label Apr 22, 2020

dpgaspar marked this pull request as ready for review April 23, 2020 10:02

dpgaspar added 3 commits April 23, 2020 11:03

wakey wakey GitHub Actions

5750eb1

Merge remote-tracking branch 'upstream/master' into fix/api-charts-pe…

0214835

[charts] fix tests

2cc7494

dpgaspar commented Apr 23, 2020

dpgaspar added 2 commits April 23, 2020 11:50

[charts] black

80c8c1b

Merge remote-tracking branch 'upstream/master' into fix/api-charts-pe…

bead80c

Merge remote-tracking branch 'upstream/master' into fix/api-charts-pe…

d981384

perf(chars): still support Druid NoSQL

62422aa

pull-request-size bot added size/S and removed size/M labels Apr 30, 2020

dpgaspar requested a review from john-bodley April 30, 2020 14:53

john-bodley approved these changes Apr 30, 2020

dpgaspar changed the title ~~[charts] Improve performance on GET list~~ perf(charts): improve performance on GET list Apr 30, 2020

dpgaspar mentioned this pull request Apr 30, 2020

perf(dashboards): improve API performance for dashboards #9704

Merged

12 tasks

willbarrett approved these changes Apr 30, 2020

dpgaspar merged commit 48ef619 into apache:master Apr 30, 2020

dpgaspar deleted the fix/api-charts-performance branch April 30, 2020 16:15

dpgaspar mentioned this pull request May 5, 2020

perf(dataset): improve performance on get list #9739

Merged

12 tasks

dpgaspar mentioned this pull request May 19, 2020

fix(chart): chart datasource explore URL showing datasource name for druid #9839

Merged

6 tasks

mistercrunch added the 🏷️ bot and 🚢 0.37.0 labels Feb 28, 2024 (🏷️ bot: a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels; 🚢 0.37.0: first shipped in 0.37.0)

5 participants
