feat: optimize catalog permission sync#33000

Merged
mistercrunch merged 3 commits into master from optimize-catalog-permission-sync
Apr 11, 2025

@betodealmeida
Member

SUMMARY

For performance reasons, change the SyncPermissionsCommand to only sync permissions for non-default catalogs when at least one of these conditions is true:

  1. Cross-catalog queries are supported (for example, in BigQuery, but not RDS). When cross-catalog queries are supported, it's good to create the permissions for non-default catalogs so that the admin can manage access to them.
  2. Multi-catalog is enabled in the database. When multi-catalog is enabled the admin needs to have the permissions for the same reason.

This means that for a database like RDS or Postgres, when we sync the permissions we will use only the default catalog, unless multi-catalog is enabled.
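The rule above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the `EngineSpec` and `Database` classes, their attribute names, and the `catalogs_to_sync` helper are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EngineSpec:
    # Hypothetical stand-in for a DB engine spec; the attribute name is an assumption.
    supports_cross_catalog_queries: bool = False


@dataclass
class Database:
    # Hypothetical stand-in for the database model; all fields are assumptions.
    engine_spec: EngineSpec
    allow_multi_catalog: bool = False
    default_catalog: Optional[str] = None
    all_catalogs: List[str] = field(default_factory=list)


def catalogs_to_sync(db: Database) -> List[Optional[str]]:
    """Return the catalogs whose permissions should be synced."""
    # Sync every catalog only when the admin could actually use the extra
    # permissions: cross-catalog queries work, or multi-catalog is enabled.
    if db.engine_spec.supports_cross_catalog_queries or db.allow_multi_catalog:
        return list(db.all_catalogs)
    # Otherwise restrict the (potentially slow) sync to the default catalog.
    return [db.default_catalog]
```

With these assumptions, a Postgres-like database with multi-catalog disabled yields only its default catalog, while enabling multi-catalog (or using an engine with cross-catalog queries) yields every catalog.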

Fixes #32993.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@korbit-ai
korbit-ai bot commented Apr 4, 2025

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

@betodealmeida betodealmeida marked this pull request as ready for review April 4, 2025 17:14
@dosubot dosubot bot added data:connect Namespace | Anything related to db connections / integrations data:connect:postgres Related to Postgres labels Apr 4, 2025

@korbit-ai korbit-ai bot left a comment

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/db_engine_specs/doris.py
superset/commands/database/sync_permissions.py
superset/db_engine_specs/snowflake.py
superset/db_engine_specs/postgres.py
superset/db_engine_specs/databricks.py
superset/db_engine_specs/bigquery.py
superset/db_engine_specs/presto.py
superset/db_engine_specs/base.py

force=True,
ssh_tunnel=self.db_connection_ssh_tunnel,
)
# Adding permissions to all catalogs (and all their schemas) can take a long
Contributor

There's a similar check here:

    catalogs = (
        self._get_catalog_names()
        if self.db_connection.db_engine_spec.supports_catalog
        else [None]
    )

Can these two conditions be consolidated?

Contributor

I believe that block is actually calling this method, so I think we're good? Would defer to @betodealmeida to confirm

Member Author

Yeah, those are different.

Basically with my changes _get_catalog_names returns all relevant catalogs in a given database. This could be all catalogs, or just the default.

When a database doesn't support catalogs, then we need to use None as the catalog, hence the [None].
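To illustrate the distinction, here is a small sketch of the two cases. The `catalogs_for_permissions` function and its parameters are hypothetical names used only for this example; it simply mirrors the shape of the conditional quoted above.

```python
from typing import Callable, List, Optional


def catalogs_for_permissions(
    supports_catalog: bool,
    get_catalog_names: Callable[[], List[str]],
) -> List[Optional[str]]:
    """Pick the catalogs to iterate over when building permissions."""
    if supports_catalog:
        # The callable decides which catalogs are relevant: possibly all of
        # them, possibly just the default one.
        return list(get_catalog_names())
    # No catalog support: use a single placeholder so the loop over schemas
    # still runs once, with None as the catalog.
    return [None]
```

Under these assumptions, the outer check only answers whether catalogs exist at all, while the callable narrows down which of them to use.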

)
# Adding permissions to all catalogs (and all their schemas) can take a long
# time (minutes, while importing a chart, eg). If the database does not
# support cross-catalog queries (like RDS or Postgres), and the
Contributor

Based on @arafoperata's feedback, we probably want to say "like Postgres" as it's not tied to RDS

Contributor

@Vitor-Avila Vitor-Avila left a comment

LGTM! Thanks for working on this improvement

Contributor

@arafoperata arafoperata left a comment

@betodealmeida ,

I've tested your changes and the issue is still occurring. It seems there are two different places where all catalogs are iterated. One is in sync_permissions, but the other happens here.

This gets invoked on the CreateDatabaseCommand.

A similar change is required there. Is there any harm in moving the logic to return only the default catalog into the underlying get_all_catalog_names method (superset/models/core.py)?

@mistercrunch mistercrunch merged commit d88cba9 into master Apr 11, 2025
40 checks passed
@mistercrunch mistercrunch deleted the optimize-catalog-permission-sync branch April 11, 2025 00:38
alexandrusoare pushed a commit to alexandrusoare/superset that referenced this pull request Jun 19, 2025

Labels

data:connect:postgres Related to Postgres data:connect Namespace | Anything related to db connections / integrations size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Catalog permission syncs severely degrades dashboard import performance

4 participants