fix(welcome): perf on distinct recent activities #32608
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Fix Detected |
|---|---|---|
| | Incorrect Chronological Order in Recent Activities | |
| | Misleading Method Name | ✅ |
Files scanned
| File Path | Reviewed |
|---|---|
| superset-frontend/packages/superset-ui-core/src/utils/lruCache.ts | ✅ |
| superset-frontend/src/features/home/types.ts | ✅ |
| superset-frontend/src/features/home/ActivityTable.tsx | ✅ |
| superset-frontend/src/pages/Home/index.tsx | ✅ |
| superset-frontend/src/views/CRUD/utils.tsx | ✅ |
public entries(): T[] {
  return [...this.cache.values()];
}
This comment was marked as resolved.
recentsRes.json.result.reverse().forEach((record: RecentActivity) => {
  distinctRes.set(record.item_url, record);
});
Incorrect Chronological Order in Recent Activities 
What is the issue?
The original array is being reversed before being processed, which modifies the chronological order of activities before applying the LRU cache.
Why this matters
This can lead to incorrect chronological ordering of recent activities in the UI, as older activities might appear as more recent than they actually are.
Suggested change ∙ Feature Preview
Process the records in their original order to maintain correct chronological sequence:
recentsRes.json.result.forEach((record: RecentActivity) => {
  distinctRes.set(record.item_url, record);
});
michael-s-molina left a comment
Thank you for the PR @justinpark.
LogDAO.get_recent_activity is only called from the recent_activity endpoint at superset/views/log/api.py. Should we remove the whole distinct block? Given that this endpoint is not under v1, it's not a breaking change.
def get_recent_activity(
    actions: list[str],
    distinct: bool,
    page: int,
    page_size: int,
) -> list[dict[str, Any]]:
    ...
    if distinct:
        ...
  ({ other }) => {
    res.other = other;
-   res.viewed = recentsRes.json.result;
+   res.viewed = distinctRes.values().reverse();
Is it necessary to reverse twice or can this be optimized?
The API returns the latest accesses in reverse chronological order (e.g., [2025-03-10, 2025-03-09, 2025-03-07, ...]). An LRU cache needs to receive access records in chronological order, so the API list must be reversed before it is inserted. Because the resulting LRU cache list places the most recent entries at the end, it is then reversed again to put the latest entries at the front.
This two-step reversal is an algorithmically correct approach.
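The two reversals described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `SimpleLRU`, `distinctRecent`, the `time` field, and the sample data are hypothetical names introduced here, and only `item_url` comes from the diff. It assumes the API response is ordered newest first.

```typescript
// Hypothetical record shape; only `item_url` appears in the PR diff.
interface RecentActivity {
  item_url: string;
  time: string; // assumed date string, newest first in the API response
}

// Minimal LRU cache relying on Map insertion order (a sketch, not Superset's lruCache.ts).
class SimpleLRU<T> {
  private cache = new Map<string, T>();

  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  set(key: string, value: T): void {
    // Deleting first moves a re-inserted key to the most-recent end.
    this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }

  values(): T[] {
    return [...this.cache.values()];
  }
}

function distinctRecent(result: RecentActivity[], limit: number): RecentActivity[] {
  const lru = new SimpleLRU<RecentActivity>(limit);
  // First reversal: feed records oldest to newest. Copy so the input is not mutated.
  [...result].reverse().forEach(record => lru.set(record.item_url, record));
  // Second reversal: the cache holds oldest to newest, so flip back to newest first.
  return lru.values().reverse();
}

const apiResult: RecentActivity[] = [
  { item_url: '/dashboard/1', time: '2025-03-10' },
  { item_url: '/chart/7', time: '2025-03-09' },
  { item_url: '/dashboard/1', time: '2025-03-07' },
  { item_url: '/chart/3', time: '2025-03-06' },
];

const viewed = distinctRecent(apiResult, 2);
console.log(viewed.map(r => `${r.item_url} @ ${r.time}`));
```

With this sample input and a limit of 2, the result keeps `/dashboard/1` at 2025-03-10 followed by `/chart/7` at 2025-03-09. Feeding the same list into the cache without the first reversal would instead retain the stale 2025-03-07 visit and evict `/chart/7`, which is why the extra pass is needed.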
Its endpoint is |
Oh, you're correct!
Therefore, this change lets the frontend fetch recent activity efficiently while keeping the API specification as-is.
(cherry picked from commit 832e028)
SUMMARY
Currently, the recent_activity query for the log table groups by dashboard ID and slice ID to extract a distinct list from the entire log table, which leads to performance issues. (At Airbnb, more than 1 million logs are generated each day, and grouping by dashboard and slice ID, even with indexing, significantly impacts database performance, as shown in the following log.)
To resolve this issue, it would be appropriate to create a materialized activity statistics view through a daily/hourly batch job. However, functionally, the main purpose of recent activity is to display only a few of the most recently visited items. Therefore, we improved performance by changing the approach: fetch the latest log entries, including duplicates (by distinct: false), and extract a distinct list using an LRU cache on the frontend.
TESTING INSTRUCTIONS
specs (no visual changes)

ADDITIONAL INFORMATION