Skip to content

Conversation

@Usiel
Copy link
Contributor

@Usiel Usiel commented Mar 11, 2025

SUMMARY

Adds a celery task to warm up the Slack channel cache and allows admins to schedule the same task (with example). This is helpful for cases where retrieval of the full channel list takes minutes.

Changes to the get_channels function

The Slack conversations API is not necessarily faster when using the types filter: For example, when we filter for private channels, the cursor will still iterate over all public channels and the number of requests remains the same, only the data transmitted is less.

With this in mind, I believe it is more efficient for us to cache only a single value for get_channels() and avoid having multiple cache values for each possible value of types. Doing the filtering in get_channels_with_search on demand doesn't add much cost.

Additionally, the limit param is also not relevant for the cache key and is inlined.

... and, of course, this change makes the celery task more sensible :)

Warning

There is a minor API change in this PR: Previously, we followed Slack's conversations.list convention, where empty an types param means that only public channels are returned. After this PR we will return all channel types when types is empty or null. To me that feels more logical, but if people feel we should stick to previous approach I'm OK with changing this.

TESTING INSTRUCTIONS

Pre:

  • Set SLACK_API_TOKEN
  • Enable ALERT_REPORTS

Test Celery Task

celery --app=superset.tasks.celery_app:app call slack.cache_channels

Expected: Task is launched and result is cached.

Test Beat Schedule Example

  1. Uncomment the beat schedule entry for slack.cache_channels
  2. [Optional]: Adapt schedule to trigger every minute (or set time)

Expected: Task is scheduled and result is cached.

Test REST API

Try for TYPES={"public_channel", "private_channel", "", "public_channel, private_channel"}

curl -X 'GET' \
  'http://localhost:8088/api/v1/report/slack_channels/?q=(types%3A!(<TYPES>))' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>'

Expected: Cached value is used, filtered and returned.

ADDITIONAL INFORMATION

cc @Vitor-Avila since this builds on your recent work in #32529

Usiel added 2 commits March 11, 2025 12:46
The Slack conversations API is not necessarily faster when using the `types` filter: For example,
when we filter for private channels, the cursor will still iterate over all public channels and the
number of requests remains the same, only the data transmitted is less.

With this in mind, I believe it is more efficient for us to cache only a single value for `get_channels()`
and avoid having multiple cache values for each possible value of `types`. Doing the filtering in
`get_channels_with_search` on demand doesn't add much cost.

Additionally, the `limit` param is also not relevant for the cache key and is inlined.
Adds a celery task to warm up the Slack channel cache and allows admins to schedule
the same task (with example). This is helpful for cases where retrieval of the full
channel list takes minutes.

### Testing

Pre:
- Set `SLACK_API_TOKEN`
- Enable `ALERT_REPORTS`

#### Test Celery Task

```
celery --app=superset.tasks.celery_app:app call slack.cache_channels
```

**Expected**: Task is launched and result is cached.

#### Test Beat Schedule Example

1. Uncomment the beat schedule entry for `slack.cache_channels`
2. [Optional]: Adapt schedule to trigger every minute (or set time)

**Expected**: Task is scheduled and result is cached.

#### Test REST API

Try for `TYPES={"public_channel", "private_channel", "", "public_channel, private_channel"}

```
curl -X 'GET' \
  'http://localhost:8088/api/v1/report/slack_channels/?q=(types%3A!(<TYPES>))' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>'
```

**Expected**: Cached value is filtered and returned.
@dosubot dosubot bot added the alert-reports Namespace | Anything related to the Alert & Reports feature label Mar 11, 2025
raise SupersetException(f"Failed to list channels: {ex}") from ex

if types and not len(types) == len(SlackChannelTypes):
conditions: list[Callable[[SlackChannelSchema], bool]] = []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here was to make this more easily pluggable in the future if we add more types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Fix Detected
Performance Missing Celery Task Retry Mechanism ▹ view
Functionality Function Call Parameter Mismatch ▹ view
Files scanned
File Path Reviewed
superset/tasks/slack.py
superset/utils/slack.py
superset/config.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Need a new review? Comment /korbit-review on this PR and I'll review your latest changes.

Korbit Guide: Usage and Customization

Interacting with Korbit

  • You can manually ask Korbit to review your PR using the /korbit-review command in a comment at the root of your PR.
  • You can ask Korbit to generate a new PR description using the /korbit-generate-pr-description command in any comment on your PR.
  • Too many Korbit comments? I can resolve all my comment threads if you use the /korbit-resolve command in any comment on your PR.
  • On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.
  • Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.

Customizing Korbit

  • Check out our docs on how you can make Korbit work best for you and your team.
  • Customize Korbit for your organization through the Korbit Console.

Feedback and Support

@codecov
Copy link

codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 47.82609% with 12 lines in your changes missing coverage. Please review.

Project coverage is 83.43%. Comparing base (76d897e) to head (b3a7db5).
Report is 1700 commits behind head on master.

Files with missing lines Patch % Lines
superset/tasks/slack.py 0.00% 12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #32585       +/-   ##
===========================================
+ Coverage   60.48%   83.43%   +22.94%     
===========================================
  Files        1931      549     -1382     
  Lines       76236    39378    -36858     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    32855    -13259     
+ Misses      28017     6523    -21494     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive ∅ <13.04%> (∅)
javascript ?
mysql ∅ <13.04%> (?)
postgres ∅ <13.04%> (?)
presto ∅ <13.04%> (∅)
python ∅ <47.82%> (∅)
sqlite ∅ <13.04%> (?)
unit ∅ <47.82%> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines +1050 to +1054
# Uncomment to enable Slack channel cache warm-up
# "slack.cache_channels": {
# "task": "slack.cache_channels",
# "schedule": crontab(minute="0", hour="*"),
# },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@Vitor-Avila
Copy link
Contributor

This is super cool! In regards to:

There is a minor API change in this PR: Previously, we followed Slack's conversations.list convention, where empty an types param means that only public channels are returned. After this PR we will return all channel types when types is empty or null. To me that feels more logical, but if people feel we should stick to previous approach I'm OK with changing this.

That makes sense to me (if you're not including a filter, you're expecting to get all data), but I'm not super familiar with all rules for API changes, so I'd like to get @eschutho's thoughts here. Would this be considered a breaking change? While it's an API change, this API endpoint is mostly used by the bot (I can't think of a reason someone would interact with this API directly).

Regardless, I believe that doesn't hold us from merging, but curious if there should be mention of this in the UPDATING.md file and also if we can include it in the next OSS release version.

while True:
response = client.conversations_list(
limit=limit, cursor=cursor, exclude_archived=True, **extra_params
limit=999, cursor=cursor, exclude_archived=True, **extra_params
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we hardcoding this now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inlined the limit because it never governed the amount of channels returned by the function. The function always returns all channels. The limit parameter here only governs the amount of channels returned by each Slack API response.

Going for the maximum (999) allowed by Slack is most efficient, imo. Lower values are more likely to result in rate-limiting (which is based on requests, not rows returned or similar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we see a need we could add paging to our /slack_channels endpoint (and get_channels_with_search(...)). I don't know about that though, our only usage is to get all channels to give the user a choice for the report notification setting.

@sadpandajoe sadpandajoe requested a review from eschutho March 11, 2025 17:28
Usiel added 2 commits March 12, 2025 11:27
The new configuration field defaults to 1 day. Currently, the only cache affected is the Slack channels cache.
@pull-request-size pull-request-size bot added size/L and removed size/M labels Mar 12, 2025
# Slack API token for the superset reports, either string or callable
SLACK_API_TOKEN: Callable[[], str] | str | None = None
SLACK_PROXY = None
SLACK_CACHE_TIMEOUT = int(timedelta(days=1).total_seconds())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Contributor

@Vitor-Avila Vitor-Avila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

def cache_channels() -> None:
try:
get_channels(
force=True, cache_timeout=current_app.config["SLACK_CACHE_TIMEOUT"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar can you confirm if this task would work properly for fetching the slack token (when it is a callable) for multi-tenant applications? Do we need to pass in some user context to this function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should just work

@rusackas
Copy link
Member

Looks good to merge?

@Vitor-Avila
Copy link
Contributor

@rusackas my only question here is relating to the API change. While this API endpoint is solely used by the Slack integration (I can't think why a user would create any workflow on top of a Superset API endpoint that just lists Slack channels) I wonder if we need to mention this in UPDATING.md and if it has to be considered a breaking change or not.

@michael-s-molina michael-s-molina merged commit 662f0fa into apache:master Mar 31, 2025
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

alert-reports Namespace | Anything related to the Alert & Reports feature size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants