chore(reports): add task for slack channels warm-up #32585
Conversation
The Slack conversations API is not necessarily faster when using the `types` filter: for example, when we filter for private channels, the cursor still iterates over all public channels and the number of requests remains the same; only the amount of data transmitted is smaller.

With this in mind, I believe it is more efficient for us to cache only a single value for `get_channels()` and avoid having one cache entry per possible value of `types`. Doing the filtering in `get_channels_with_search` on demand doesn't add much cost.

Additionally, the `limit` param is not relevant for the cache key and is inlined.
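As a rough illustration of this caching strategy (the helper names and cache structure below are simplified stand-ins, not Superset's actual cache utilities): the full channel list is cached under a single key, and the `types` filtering happens per request.

```python
from __future__ import annotations

# Illustrative sketch only: one cached value for the complete channel list,
# with `types` filtering applied on demand instead of one cache entry per filter.
_cache: dict[str, list[dict]] = {}


def fetch_all_channels() -> list[dict]:
    """Stand-in for the paginated Slack conversations.list calls."""
    return [
        {"name": "general", "is_private": False},
        {"name": "alerts", "is_private": True},
    ]


def get_channels(force: bool = False) -> list[dict]:
    """Cache the complete channel list under a single key, regardless of filters."""
    if force or "channels" not in _cache:
        _cache["channels"] = fetch_all_channels()
    return _cache["channels"]


def get_channels_with_search(types: set[str] | None = None) -> list[dict]:
    """Filter the single cached value per request; cheap compared to re-fetching."""
    channels = get_channels()
    if types:
        want_private = "private_channel" in types
        want_public = "public_channel" in types
        channels = [
            c
            for c in channels
            if (c["is_private"] and want_private) or (not c["is_private"] and want_public)
        ]
    return channels
```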
Adds a celery task to warm up the Slack channel cache and allows admins to schedule
the same task (with example). This is helpful for cases where retrieval of the full
channel list takes minutes.
### Testing
Pre:
- Set `SLACK_API_TOKEN`
- Enable `ALERT_REPORTS`
#### Test Celery Task
```
celery --app=superset.tasks.celery_app:app call slack.cache_channels
```
**Expected**: Task is launched and result is cached.
#### Test Beat Schedule Example
1. Uncomment the beat schedule entry for `slack.cache_channels`
2. [Optional]: Adapt the schedule to trigger every minute (or a set time), as in the sketch below
**Expected**: Task is scheduled and result is cached.
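For step 2, an every-minute variant might look like the sketch below, assuming a `CeleryConfig`-style class as in Superset's default `config.py` (the exact shape of your Celery configuration may differ):

```python
from celery.schedules import crontab


class CeleryConfig:  # simplified sketch for local testing
    beat_schedule = {
        # ...other scheduled tasks...
        "slack.cache_channels": {
            "task": "slack.cache_channels",
            # fire every minute so the warm-up can be verified quickly
            "schedule": crontab(minute="*", hour="*"),
        },
    }
```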
#### Test REST API
Try for `TYPES={"public_channel", "private_channel", "", "public_channel, private_channel"}`
```
curl -X 'GET' \
'http://localhost:8088/api/v1/report/slack_channels/?q=(types%3A!(<TYPES>))' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <TOKEN>'
```
**Expected**: Cached value is filtered and returned.
```python
    raise SupersetException(f"Failed to list channels: {ex}") from ex

    # ...

    if types and not len(types) == len(SlackChannelTypes):
        conditions: list[Callable[[SlackChannelSchema], bool]] = []
```
The intention here was to make this more easily pluggable in the future if we add more types.
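To illustrate the pluggable idea, a predicate per channel type could be appended to that list, roughly like the hypothetical sketch below (the real implementation in `superset/utils/slack.py` differs in detail):

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for the real schema type.
SlackChannelSchema = dict


def filter_channels(
    channels: list[SlackChannelSchema], types: set[str]
) -> list[SlackChannelSchema]:
    # One predicate per selected channel type; a channel is kept if any matches.
    conditions: list[Callable[[SlackChannelSchema], bool]] = []
    if "private_channel" in types:
        conditions.append(lambda channel: channel.get("is_private", False))
    if "public_channel" in types:
        conditions.append(lambda channel: not channel.get("is_private", False))
    # Supporting a new channel type later only means appending another predicate.
    return [channel for channel in channels if any(cond(channel) for cond in conditions)]
```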
Nice!
Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

| Issue | Fix Detected |
|---|---|
| Missing Celery Task Retry Mechanism | ✅ |
| Function Call Parameter Mismatch | |
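For reference, a generic Celery retry pattern for such a task could look like the sketch below; this is illustrative only, not what the PR implements, and the decorator and task name are assumptions:

```python
from celery import shared_task


# Illustrative only: retry up to 3 times with exponential backoff if the
# underlying Slack call raises. Not part of this PR.
@shared_task(
    name="slack.cache_channels",
    autoretry_for=(Exception,),
    retry_backoff=True,
    retry_kwargs={"max_retries": 3},
)
def cache_channels() -> None:
    ...
```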
Codecov Report

Attention: Patch coverage is …

```
@@             Coverage Diff             @@
##           master   #32585       +/-   ##
===========================================
+ Coverage   60.48%   83.43%   +22.94%
===========================================
  Files        1931      549     -1382
  Lines       76236    39378    -36858
  Branches     8568        0     -8568
===========================================
- Hits        46114    32855    -13259
+ Misses      28017     6523    -21494
+ Partials     2105        0     -2105
```

Flags with carried forward coverage won't be shown.
```python
# Uncomment to enable Slack channel cache warm-up
# "slack.cache_channels": {
#     "task": "slack.cache_channels",
#     "schedule": crontab(minute="0", hour="*"),
# },
```
Nice!
This is super cool! In regards to the API change noted in the description:

That makes sense to me (if you're not including a filter, you're expecting to get all data), but I'm not super familiar with all the rules for API changes, so I'd like to get @eschutho's thoughts here. Would this be considered a breaking change? While it's an API change, this endpoint is mostly used by the bot (I can't think of a reason someone would interact with this API directly). Regardless, I believe that doesn't hold us back from merging, but curious if there should be mention of this in the
```diff
         while True:
             response = client.conversations_list(
-                limit=limit, cursor=cursor, exclude_archived=True, **extra_params
+                limit=999, cursor=cursor, exclude_archived=True, **extra_params
```
why are we hardcoding this now?
I inlined the limit because it never governed the number of channels returned by the function; the function always returns all channels. The `limit` parameter here only controls how many channels each individual Slack API response contains.

Going for the maximum (999) allowed by Slack is most efficient, imo. Lower values are more likely to result in rate limiting (which is based on the number of requests, not on rows returned or similar).
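To make that concrete, here is a simplified sketch of the pagination loop, assuming the `slack_sdk` `WebClient` (the surrounding code in `superset/utils/slack.py` differs): the cursor determines how many requests are made, while `limit` only caps the page size of each response.

```python
from __future__ import annotations

from slack_sdk import WebClient


def fetch_all_channels(client: WebClient, extra_params: dict) -> list[dict]:
    """Always returns *all* channels; 999 is simply the largest page size Slack allows."""
    channels: list[dict] = []
    cursor = None
    while True:
        response = client.conversations_list(
            limit=999, cursor=cursor, exclude_archived=True, **extra_params
        )
        channels.extend(response.data["channels"])
        cursor = response.data.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            # No further pages: fewer, larger pages mean fewer requests and
            # therefore less exposure to Slack's request-based rate limits.
            break
    return channels
```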
If we see a need, we could add paging to our `/slack_channels` endpoint (and `get_channels_with_search(...)`). I don't know about that though; our only usage is to get all channels to give the user a choice for the report notification setting.
The new `SLACK_CACHE_TIMEOUT` configuration field defaults to 1 day. Currently, the only cache affected by it is the Slack channels cache.
```python
# Slack API token for the superset reports, either string or callable
SLACK_API_TOKEN: Callable[[], str] | str | None = None
SLACK_PROXY = None
SLACK_CACHE_TIMEOUT = int(timedelta(days=1).total_seconds())
```
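Deployments that need a shorter (or longer) lifetime for the cached channel list can presumably override the value like any other config entry, e.g. in `superset_config.py` (illustrative values):

```python
# superset_config.py — example override: expire the Slack channel cache after 6 hours
from datetime import timedelta

SLACK_CACHE_TIMEOUT = int(timedelta(hours=6).total_seconds())
```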
thanks!
Vitor-Avila left a comment:
LGTM!
```python
def cache_channels() -> None:
    try:
        get_channels(
            force=True, cache_timeout=current_app.config["SLACK_CACHE_TIMEOUT"]
```
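Filled out for context, the task is roughly the following shape; the decorator, logging, and error handling are assumptions for illustration and may differ from the actual `superset/tasks/slack.py`:

```python
import logging

from celery import shared_task
from flask import current_app

from superset.utils.slack import get_channels  # helper shown in the hunk above

logger = logging.getLogger(__name__)


@shared_task(name="slack.cache_channels")
def cache_channels() -> None:
    """Warm up the Slack channel cache by force-refreshing the single cached value."""
    try:
        get_channels(
            force=True, cache_timeout=current_app.config["SLACK_CACHE_TIMEOUT"]
        )
    except Exception:
        logger.exception("Failed to warm up the Slack channel cache")
        raise
```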
@dpgaspar can you confirm if this task would work properly for fetching the slack token (when it is a callable) for multi-tenant applications? Do we need to pass in some user context to this function?
it should just work
Looks good to merge?
@rusackas my only question here relates to the API change. While this API endpoint is solely used by the Slack integration (I can't think why a user would create any workflow on top of a Superset API endpoint that just lists Slack channels), I wonder if we need to mention this in
### SUMMARY
Adds a celery task to warm up the Slack channel cache and allows admins to schedule the same task (with example). This is helpful for cases where retrieval of the full channel list takes minutes.
#### Changes to the `get_channels` function

The Slack conversations API is not necessarily faster when using the `types` filter: For example, when we filter for private channels, the cursor will still iterate over all public channels and the number of requests remains the same, only the data transmitted is less.

With this in mind, I believe it is more efficient for us to cache only a single value for `get_channels()` and avoid having multiple cache values for each possible value of `types`. Doing the filtering in `get_channels_with_search` on demand doesn't add much cost.

Additionally, the `limit` param is also not relevant for the cache key and is inlined.

... and, of course, this change makes the celery task more sensible :)
**Warning**: There is a minor API change in this PR: Previously, we followed Slack's conversations.list convention, where an empty `types` param means that only public channels are returned. After this PR we will return all channel types when `types` is empty or null. To me that feels more logical, but if people feel we should stick to the previous approach I'm OK with changing this.

### TESTING INSTRUCTIONS
Pre:
- Set `SLACK_API_TOKEN`
- Enable `ALERT_REPORTS`

#### Test Celery Task

**Expected**: Task is launched and result is cached.

#### Test Beat Schedule Example

1. Uncomment the beat schedule entry for `slack.cache_channels`

**Expected**: Task is scheduled and result is cached.

#### Test REST API

Try for `TYPES={"public_channel", "private_channel", "", "public_channel, private_channel"}`

**Expected**: Cached value is used, filtered and returned.
### ADDITIONAL INFORMATION
cc @Vitor-Avila since this builds on your recent work in #32529