Skip to content

Conversation

@dpgaspar
Copy link
Member

@dpgaspar dpgaspar commented Apr 22, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

This API endpoint was issuing a couple of extra queries for each row. One for ab_user resolving created by field and another for fetching the correct datasource. Since druid support outside of SQLAlchemy is deprecated, making an outer join with SqlaTable.

The idea to optimize and avoid a query per row, is when using @property has a column and the method itself references a column, make sure this column is declared on list_columns so it's prefetched or SQLAlchemy will issue the extra query

Local times are:
Before:
(timing) ChartRestApi.get_list.time = 130ms - 180ms

After:
(timing) ChartRestApi.get_list.time = 25ms - 50ms

Druid charts get displayed normally but if any, an extra query is issued for each one:

Screenshot 2020-04-30 at 15 34 17

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@nytai

@dpgaspar dpgaspar marked this pull request as ready for review April 23, 2020 10:02
"Slice.datasource_type == 'table')",
remote_side="SqlaTable.id",
lazy="joined",
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this relation will avoid making one extra query per row, but will not support showing the datasource for deprecated druid source (yet will issue an outer join)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will the experience for the user be if they are primarily using the deprecated Druid connector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR description with an example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No impact now

@codecov-io
Copy link

codecov-io commented Apr 23, 2020

Codecov Report

Merging #9619 into master will decrease coverage by 0.02%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9619      +/-   ##
==========================================
- Coverage   65.71%   65.68%   -0.03%     
==========================================
  Files         574      574              
  Lines       30135    30138       +3     
  Branches     3066     3066              
==========================================
- Hits        19802    19797       -5     
- Misses      10149    10157       +8     
  Partials      184      184              
Flag Coverage Δ
#javascript 58.76% <ø> (ø)
#python 70.55% <71.42%> (-0.05%) ⬇️
Impacted Files Coverage Δ
superset/charts/api.py 81.66% <ø> (ø)
superset/models/slice.py 84.69% <71.42%> (-0.86%) ⬇️
superset/db_engine_specs/postgres.py 80.00% <0.00%> (-15.00%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76764ac...2cc7494. Read the comment docs.

@willbarrett
Copy link
Member

I think Airbnb might have opinions about this. @john-bodley would this mess with your users' workflow?

@john-bodley
Copy link
Member

@dpgaspar regarding the comment,

Since druid support outside of SQLAlchemy is deprecated

this isn't quite true, i.e., although we encourage environments to use Druid SQL (rather than Druid NoSQL) we still currently support Druid NoSQL and thus I'm not certain whether we should merge this PR.

I sense this PR shows the potential performance wins of actually fully deprecating the Druid NoSQL connector, i.e., there are numerous other places in the code base were the logic is complex and/or requires additional joins because there doesn't exist a foreign key between the slices and tables (or datasources which will be deprecated) tables.

@dpgaspar
Copy link
Member Author

@john-bodley had the wrong impression regarding Druid NoSQL.

Yet, adapted this PR so that we can have the best of both worlds. We still have the performance boost for fetching charts outside of Druid NoSQL and an extra query is issued for each Druid NoSQL existent on the query page (like before).

@willbarrett
No user impact, just user satisfaction :)

@pull-request-size pull-request-size bot added size/S and removed size/M labels Apr 30, 2020
@dpgaspar dpgaspar requested a review from john-bodley April 30, 2020 14:53
Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@dpgaspar dpgaspar changed the title [charts] Improve performance on GET list perf(charts): improve performance on GET list Apr 30, 2020
@dpgaspar dpgaspar merged commit 48ef619 into apache:master Apr 30, 2020
@dpgaspar dpgaspar deleted the fix/api-charts-performance branch April 30, 2020 16:15
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 First shipped in 0.37.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 0.37.0 First shipped in 0.37.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants