@justinpark
Member

@justinpark justinpark commented Mar 11, 2025

SUMMARY

Currently, the recent_activity query groups the log table by dashboard ID, slice ID, and action to extract a distinct list, which leads to performance issues.
(At Airbnb, more than 1 million log rows are generated each day, and this GROUP BY significantly impacts database performance even with indexing, as shown in the following log entry, where the query had been running for 1726 s.)

| 3793593 | superset         | 100.117.121.20:41904  | superset_production | Query       |  1726 s| Sending data                                                  | SELECT anon_1.dashboard_id AS anon_1_dashboard_id, anon_1.slice_id AS anon_1_slice_id, anon_1.action AS anon_1_action, anon_1.dttm AS anon_1_dttm, dashboards.slug AS dashboard_slug, dashboards.dashboard_title AS dashboards_dashboard_title, slices.slice_name AS slices_slice_name
FROM (SELECT logs.dashboard_id AS dashboard_id, logs.slice_id AS slice_id, logs.action AS action, max(logs.dttm) AS dttm
FROM logs
WHERE logs.action IN ('explore', 'dashboard') AND logs.user_id = 13295 AND logs.dttm > '2024-03-06 18:00:12.201653' AND (logs.dashboard_id IS NOT NULL OR logs.slice_id IS NOT NULL) GROUP BY logs.dashboard_id, logs.slice_id, logs.action) AS anon_1 LEFT OUTER JOIN dashboards ON dashboards.id = anon_1.dashboard_id LEFT OUTER JOIN slices ON slices.id = anon_1.slice_id
WHERE dashboards.dashboard_title != '' OR slices.slice_name != '' ORDER BY anon_1.dttm DESC
 LIMIT 0, 6 |

A full fix would be to create a materialized activity-statistics view through a daily or hourly batch job. However, the main purpose of recent activity is to display only a few of the most recently visited items. This PR therefore improves performance by fetching the latest log entries with duplicates included (distinct: false) and extracting a distinct list with an LRU cache on the frontend.
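As a rough illustration of the LRU idea, the sketch below keeps one entry per key and evicts the least recently used key once a capacity is exceeded. The class name SimpleLRU, its capacity argument, and the sample keys are hypothetical; this is not the lruCache.ts utility added in this PR, only a minimal stand-in built on the insertion order of a Map.

```typescript
// Hypothetical minimal LRU cache, for illustration only.
class SimpleLRU<T> {
  private cache = new Map<string, T>();

  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) {
      throw new Error('capacity must be at least 1');
    }
    this.capacity = capacity;
  }

  set(key: string, value: T): void {
    // Deleting first moves an existing key to the end (most recent slot).
    this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // A Map iterates in insertion order, so the first key is the least recent.
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }

  // Values ordered from least recently used to most recently used.
  values(): T[] {
    return [...this.cache.values()];
  }
}

// Accesses replayed in chronological order (oldest first).
const demo = new SimpleLRU<string>(2);
demo.set('/dashboard/1', 'first visit');
demo.set('/chart/2', 'first visit');
demo.set('/dashboard/1', 'second visit'); // duplicate key: refreshed, not added
demo.set('/chart/3', 'first visit'); // exceeds capacity: '/chart/2' is evicted
console.log(demo.values());
```

Because duplicates collapse on the key, the cache never grows beyond the number of distinct items the page needs to display, no matter how many raw log rows are replayed into it.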

TESTING INSTRUCTIONS

specs (no visual changes)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@justinpark justinpark requested review from michael-s-molina and villebro and removed request for michael-s-molina March 11, 2025 23:35
@dosubot dosubot bot added the change:frontend Requires changing the frontend label Mar 11, 2025

@korbit-ai korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

| Category | Issue | Fix Detected |
| --- | --- | --- |
| Functionality | Incorrect Chronological Order in Recent Activities | |
| Readability | Misleading Method Name | |
Files scanned
File Path Reviewed
superset-frontend/packages/superset-ui-core/src/utils/lruCache.ts
superset-frontend/src/features/home/types.ts
superset-frontend/src/features/home/ActivityTable.tsx
superset-frontend/src/pages/Home/index.tsx
superset-frontend/src/views/CRUD/utils.tsx

Comment on lines 71 to 73
public entries(): T[] {
  return [...this.cache.values()];
}

This comment was marked as resolved.

Comment on lines +228 to +230
recentsRes.json.result.reverse().forEach((record: RecentActivity) => {
  distinctRes.set(record.item_url, record);
});

Incorrect Chronological Order in Recent Activities (category: Functionality)

What is the issue?

The original array is being reversed before being processed, which modifies the chronological order of activities before applying the LRU cache.

Why this matters

This can lead to incorrect chronological ordering of recent activities in the UI, as older activities might appear as more recent than they actually are.

Suggested change ∙ Feature Preview

Process the records in their original order to maintain correct chronological sequence:

recentsRes.json.result.forEach((record: RecentActivity) => {
  distinctRes.set(record.item_url, record);
});

Member

@michael-s-molina michael-s-molina left a comment

Thank you for the PR @justinpark.

LogDAO.get_recent_activity is only called from the recent_activity endpoint at superset/views/log/api.py. Should we remove the whole distinct block? Given that this endpoint is not under v1, it's not a breaking change.

def get_recent_activity(
    actions: list[str],
    distinct: bool,
    page: int,
    page_size: int,
) -> list[dict[str, Any]]:
    ...
    if distinct:
        ...

cc @sadpandajoe @villebro

  ({ other }) => {
    res.other = other;
-   res.viewed = recentsRes.json.result;
+   res.viewed = distinctRes.values().reverse();
Member

Is it necessary to reverse twice or can this be optimized?

Member Author

The API returns the latest accesses in reverse chronological order (e.g. [2025-03-10, 2025-03-09, 2025-03-07, ...]). An LRU cache must receive accesses in chronological order, so the API list is reversed before it is replayed into the cache. The resulting cache list holds the most recent entries at the end, so it is reversed again to put the latest entries first.
Both reversals are therefore required for a correct result.
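The effect of the two reversals can be traced on a small input. The sketch below is illustrative only: the ActivityRecord shape, its accessed_at field, the replay helper, and the sample rows are assumptions, and a plain Map stands in for the LRU cache. It compares the reverse, replay, reverse sequence with replaying the API list in its original order.

```typescript
// Simplified, hypothetical record shape; only item_url appears in the PR diff.
interface ActivityRecord {
  item_url: string;
  accessed_at: string;
}

// Stand-in for the LRU cache: re-inserting a key moves it to the end.
function replay(records: ActivityRecord[]): ActivityRecord[] {
  const seen = new Map<string, ActivityRecord>();
  for (const record of records) {
    seen.delete(record.item_url);
    seen.set(record.item_url, record);
  }
  return [...seen.values()];
}

// Newest first, as described for the API response.
const apiResult: ActivityRecord[] = [
  { item_url: '/dashboard/1', accessed_at: '2025-03-10' },
  { item_url: '/chart/2', accessed_at: '2025-03-09' },
  { item_url: '/dashboard/1', accessed_at: '2025-03-07' },
];

// Copy before reversing so the API payload itself is not mutated.
const withReversals = replay([...apiResult].reverse()).reverse();
const withoutReversals = replay(apiResult);

console.log(withReversals.map(r => `${r.item_url} ${r.accessed_at}`));
console.log(withoutReversals.map(r => `${r.item_url} ${r.accessed_at}`));
```

With both reversals, /dashboard/1 comes first with its 2025-03-10 access. Replaying the list as-is lets the older 2025-03-07 row overwrite the newer one, so /dashboard/1 drops behind /chart/2 and carries a stale timestamp.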

@justinpark
Member Author

> Given that this endpoint is not under v1, it's not a breaking change.

The endpoint is /api/v1/log/recent_activity/ and the distinct param is part of its schema, so removing it would be a breaking change.

@michael-s-molina
Member

> Its endpoint is /api/v1/log/recent_activity/ and the distinct param is included in the schema so it will be a break change.

Oh, you're correct!

@justinpark
Member Author

> > Its endpoint is /api/v1/log/recent_activity/ and the distinct param is included in the schema so it will be a break change.
>
> Oh, you're correct!

Therefore, this PR makes the frontend fetch the data more efficiently and keeps the API specification as-is.

@justinpark justinpark merged commit 832e028 into apache:master Mar 13, 2025
48 checks passed
michael-s-molina pushed a commit that referenced this pull request Mar 13, 2025
michael-s-molina pushed a commit that referenced this pull request Mar 13, 2025
@michael-s-molina michael-s-molina added the v5.0 Label added by the release manager to track PRs to be included in the 5.0 branch label Mar 17, 2025
michael-s-molina pushed a commit that referenced this pull request Mar 17, 2025
@sadpandajoe sadpandajoe added the v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch label Mar 26, 2025
sadpandajoe pushed a commit that referenced this pull request Mar 26, 2025
@mistercrunch mistercrunch added 🍒 4.1.3 Cherry-picked to 4.1.3 🍒 5.0.0 Cherry-picked to 5.0.0 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Jul 29, 2025
cyber-jessie added a commit to CybercentreCanada/superset that referenced this pull request Jan 8, 2026
* fix(welcome): perf on distinct recent activities (apache#32608)

(cherry picked from commit 832e028)


Labels

  • 🏷️ bot: A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels
  • change:frontend: Requires changing the frontend
  • packages
  • size/M
  • v4.1: Label added by the release manager to track PRs to be included in the 4.1 branch
  • v5.0: Label added by the release manager to track PRs to be included in the 5.0 branch
  • 🍒 4.1.3: Cherry-picked to 4.1.3
  • 🍒 4.1.4
  • 🍒 5.0.0: Cherry-picked to 5.0.0

4 participants