Saving entire body of messages #1621

Merged
arkid15r merged 17 commits into OWASP:main from Dishant1804:fetch_entire_message_body on Jun 21, 2025
Conversation

@Dishant1804
Contributor

Resolves #1613

  • The whole body of each message is saved
  • The whole body of each thread reply is saved

@Dishant1804 Dishant1804 requested a review from arkid15r as a code owner June 16, 2025 02:59
@coderabbitai
Contributor

coderabbitai bot commented Jun 16, 2025

Summary by CodeRabbit

  • New Features
    • Messages now store the complete raw Slack message data, allowing for richer information and improved search capabilities.
    • Added the ability to view the latest message in a conversation and the latest reply in a message thread.
  • Improvements
    • Message authors are now optional, supporting messages without a user or bot.
    • Enhanced admin interface with new filters and improved search for messages and members.
    • Increased batch size for message retrieval, improving sync performance.
  • Bug Fixes
    • Improved handling of missing profile information for members.
  • Tests
    • Updated and consolidated tests to reflect changes in message storage and author handling.

Summary by CodeRabbit

  • New Features

    • Messages now store the full raw Slack message data for improved traceability.
    • Messages can be created and stored even if no author is identified.
  • Bug Fixes

    • Improved handling of messages without an author or with missing user/bot information.
  • Tests

    • Updated and removed tests to reflect the new message creation behavior for messages without an author.

Walkthrough

This change updates the Slack message synchronization process to store the entire raw message body in the database, modifies the message model to allow messages without an author, and adjusts related tests accordingly. It also removes filtering for certain message subtypes and improves author resolution logic with retries and error handling.
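
A minimal sketch of the storage shape this walkthrough describes, written with plain dataclasses rather than the project's Django models. `StoredMessage`, `from_slack`, `slack_message_id`, and `author_id` are illustrative names and are not taken from the PR; only the ideas of a `raw_data` payload, an optional author, and no subtype filtering come from the summary above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class StoredMessage:
    """Hypothetical stand-in for the Message model after this change."""

    slack_message_id: str
    raw_data: dict[str, Any] = field(default_factory=dict)  # full Slack payload
    author_id: str | None = None  # optional: some messages have no user or bot

    @property
    def text(self) -> str:
        # With the dedicated text column removed, text is read from the payload.
        return self.raw_data.get("text", "")


def from_slack(payload: dict[str, Any]) -> StoredMessage:
    """Build a record from a raw Slack message without filtering by subtype."""
    return StoredMessage(
        slack_message_id=payload["ts"],
        raw_data=payload,
        author_id=payload.get("user") or payload.get("bot_id"),
    )


# A subtype message and an authorless message are both kept.
joined = from_slack({"ts": "1750000000.000100", "subtype": "channel_join", "text": "joined"})
orphan = from_slack({"ts": "1750000000.000200", "text": "no author here"})
```

Keeping the payload whole means fields that were previously dropped, such as `subtype`, stay available for later queries.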

Changes

| Files / Grouped Files | Change Summary |
|---|---|
| backend/apps/slack/management/commands/slack_sync_messages.py | Removed subtype/content filtering; improved author resolution with retries; always creates message with or without author; increased batch size to 999; refactored message and reply fetching to bulk save. |
| backend/apps/slack/models/message.py, backend/apps/slack/migrations/0016_*.py, 0017_remove_message_text.py | Added raw_data JSONField to Message; made author nullable; removed text field; updated methods to handle optional author and store raw data; added latest_reply property. |
| backend/apps/slack/models/conversation.py | Added latest_message property to Conversation model. |
| backend/apps/slack/models/member.py | Improved robustness of email extraction from Slack member data. |
| backend/apps/slack/admin.py | Added filters and updated display/search fields in MemberAdmin and MessageAdmin for new model fields. |
| backend/tests/slack/commands/management/slack_sync_messages_test.py | Removed tests for subtype/content filtering; consolidated and updated tests to reflect new message creation behavior without author; updated batch size default in tests. |
| backend/tests/slack/models/message_test.py | Removed assertions on removed text field; adapted tests to use raw_data for message content verification. |

Assessment against linked issues

Objective (Issue #) Addressed Explanation
Save entire message body during message sync (#1613)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Possibly related PRs

Suggested reviewers

  • kasya
  • arkid15r

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
backend/apps/slack/models/message.py (1)

47-49: Rename related_name for the inviter FK

Using related_name="inviter" will create member.inviter which collides conceptually with the Member object itself and reads ambiguously. Consider something like related_name="invited_messages" for clarity and to avoid accidental shadowing.

backend/apps/slack/management/commands/slack_sync_messages.py (2)

352-357: Remove unreachable duplicate author-check

author is already validated just above; these lines will never be hit and add noise.

-            if not author:
-                self.stdout.write(
-                    self.style.WARNING(f"Could not fetch user {slack_user_id}, skipping message")
-                )
-                return None

370-415: Tidy _get_or_create_member: naming & needless initial sleep

  1. Variable author actually holds a Member; rename to member for readability.
  2. The unconditional time.sleep(delay) before the first API call slows the happy path without need. Sleep only after a rate-limit response.
-            author = None
+            member = None
...
-                    time.sleep(delay)
...
-                    author = Member.update_data(...)
+                    member = Member.update_data(...)
...
-            return author
+            return member

This also satisfies the pylint “too many arguments” warning if you move delay to an internal constant or default.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 370-370: Too many arguments (6/5)

(R0913)


[refactor] 370-370: Too many positional arguments (6/5)

(R0917)

backend/tests/slack/models/message_test.py (1)

212-219: Assert the inviter was persisted

The test verifies the plumbing but never checks the field value:

-assert result is mock_message_instance
+assert result is mock_message_instance
+assert result.inviter is mock_inviter

Helps catch regressions where the FK assignment is accidentally dropped.
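
To illustrate why the extra assertion matters, here is a small self-contained sketch using `unittest.mock`. `create_message` and `factory` are hypothetical stand-ins for the code under test, not the project's actual functions.

```python
from unittest.mock import MagicMock


def create_message(factory, inviter):
    """Hypothetical function under test: builds a message and sets its inviter."""
    message = factory()
    message.inviter = inviter
    return message


mock_message_instance = MagicMock(name="message")
mock_inviter = MagicMock(name="inviter")
factory = MagicMock(return_value=mock_message_instance)

result = create_message(factory, mock_inviter)

# The identity check alone still passes if the assignment above is deleted,
# because MagicMock auto-creates any attribute that is read.
assert result is mock_message_instance
# This check is the one that fails when the assignment is dropped.
assert result.inviter is mock_inviter
```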

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddab0b7 and 1d06bc5.

📒 Files selected for processing (5)
  • backend/apps/slack/management/commands/slack_sync_messages.py (1 hunks)
  • backend/apps/slack/migrations/0016_message_blocks_message_client_message_id_and_more.py (1 hunks)
  • backend/apps/slack/models/message.py (5 hunks)
  • backend/tests/slack/commands/management/slack_sync_messages_test.py (9 hunks)
  • backend/tests/slack/models/message_test.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/apps/slack/management/commands/slack_sync_messages.py (3)
backend/apps/slack/models/message.py (2)
  • Message (13-149)
  • update_data (111-149)
backend/apps/slack/models/conversation.py (1)
  • update_data (68-90)
backend/apps/slack/models/member.py (1)
  • Member (10-78)
🪛 Pylint (3.3.7)
backend/apps/slack/management/commands/slack_sync_messages.py

[refactor] 370-370: Too many arguments (6/5)

(R0913)


[refactor] 370-370: Too many positional arguments (6/5)

(R0917)

backend/apps/slack/migrations/0016_message_blocks_message_client_message_id_and_more.py

[refactor] 7-7: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (1)
backend/tests/slack/commands/management/slack_sync_messages_test.py (1)

234-260: Looks good – updated subtype tests cover the new ignore-logic.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d06bc5 and b3a940e.

📒 Files selected for processing (2)
  • backend/apps/slack/migrations/0016_message_blocks_message_client_message_id_and_more.py (1 hunks)
  • backend/apps/slack/models/message.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/apps/slack/models/message.py
🧰 Additional context used
🪛 Pylint (3.3.7)
backend/apps/slack/migrations/0016_message_blocks_message_client_message_id_and_more.py

[refactor] 7-7: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/apps/slack/migrations/0016_message_blocks_message_client_message_id_and_more.py (2)

74-81: 👍 parent_user_id, subtype, and team now include max_length

This addresses the divergence warning noted in the previous review. No further action needed here.


28-34: Re-evaluate on_delete=models.CASCADE for the inviter relation

If a Member is deleted, all messages they invited will also be deleted.
Consider PROTECT or SET_NULL instead to preserve historical messages.
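
The difference between the two behaviours can be sketched with SQLite from the standard library. This is only an analogy: Django typically applies `on_delete` in the ORM rather than as a database constraint, and the `member` and `message` tables below are simplified, hypothetical schemas.

```python
from __future__ import annotations

import sqlite3


def surviving_messages(on_delete: str) -> list[tuple[int, int | None]]:
    """Delete a member and return the message rows that remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE message ("
        " id INTEGER PRIMARY KEY,"
        f" inviter_id INTEGER REFERENCES member(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO member (id) VALUES (1)")
    conn.execute("INSERT INTO message (id, inviter_id) VALUES (10, 1)")
    conn.execute("DELETE FROM member WHERE id = 1")
    rows = conn.execute("SELECT id, inviter_id FROM message").fetchall()
    conn.close()
    return rows


cascade_rows = surviving_messages("CASCADE")    # the message is deleted too
set_null_rows = surviving_messages("SET NULL")  # the message survives, inviter cleared
```

With `SET NULL` the historical message is preserved and only the reference is cleared, which is the outcome the comment above argues for.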

@Dishant1804 Dishant1804 force-pushed the fetch_entire_message_body branch from b3a940e to 3ce2b3c Compare June 16, 2025 17:46
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/apps/slack/migrations/0016_message_message_body.py (1)

12-16: Adding message_body JSONField
The new field with default=dict and a descriptive verbose_name will store full Slack payloads as intended.
Suggestion: if you need historical message data backfilled (instead of {}), consider adding a RunPython data migration or allowing null=True until real payloads are populated.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3a940e and 3ce2b3c.

📒 Files selected for processing (4)
  • backend/apps/slack/admin.py (1 hunks)
  • backend/apps/slack/management/commands/slack_sync_messages.py (1 hunks)
  • backend/apps/slack/migrations/0016_message_message_body.py (1 hunks)
  • backend/apps/slack/models/message.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • backend/apps/slack/admin.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/apps/slack/management/commands/slack_sync_messages.py
  • backend/apps/slack/models/message.py
🧰 Additional context used
🪛 Pylint (3.3.7)
backend/apps/slack/migrations/0016_message_message_body.py

[refactor] 6-6: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (3)
backend/apps/slack/migrations/0016_message_message_body.py (3)

1-2: Auto-generated migration header is correct
The timestamp and Django version comment follow conventions and need no changes.


3-3: Imports are appropriate
Using migrations and models from django.db aligns with standard Django migration patterns.


7-9: Dependency declaration is valid
Dependence on 0015_remove_message_is_thread_parent_message_has_replies_and_more correctly sequences this migration.

@Dishant1804 Dishant1804 requested a review from arkid15r June 16, 2025 21:57
@Dishant1804 Dishant1804 requested a review from arkid15r June 17, 2025 03:14
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
backend/apps/slack/management/commands/slack_sync_messages.py (2)

338-341: Remove unconditional sleep before first users_info call

time.sleep(delay) is executed unconditionally before the first attempt to call the Slack API, slowing down normal execution paths.
Move the sleep inside the rate-limited branch where it is actually needed.

-                        try:
-                            time.sleep(delay)
+                        try:
                             user_info = client.users_info(user=slack_user_id)
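
A minimal sketch of the "sleep only after a rate-limit response" pattern suggested above. It does not use the Slack SDK: `RateLimited`, `call_with_retry`, and `fake_users_info` are hypothetical names, and the sleep function is injected so the behaviour can be observed without waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical stand-in for a Slack 'ratelimited' error."""

    def __init__(self, retry_after: float) -> None:
        super().__init__("ratelimited")
        self.retry_after = retry_after


def call_with_retry(
    fetch: Callable[[], dict],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict | None:
    """Call fetch immediately; sleep only after a rate-limit response."""
    for _ in range(max_retries):
        try:
            return fetch()  # no delay on the happy path
        except RateLimited as error:
            sleep(error.retry_after)  # wait only when asked to back off
    return None


# Usage with a fake endpoint that is rate limited once, then succeeds.
attempts = {"count": 0}
slept: list[float] = []


def fake_users_info() -> dict:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RateLimited(retry_after=2.0)
    return {"user": {"id": "U123"}}


result = call_with_retry(fake_users_info, sleep=slept.append)
```

In this run the only recorded sleep is the one requested by the rate-limit error; a call that succeeds on the first attempt never sleeps.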

352-360: Use dict-style access consistently for SlackResponse

Everywhere else you use e.response["error"]; here you switch to .get().
SlackResponse mimics a dict but does not guarantee .get(). Stick to the bracket form for consistency and to avoid surprises if the SDK changes.

-                            if e.response.get("error") == "ratelimited":
+                            if e.response["error"] == "ratelimited":
backend/apps/slack/migrations/0016_message_is_bot_message_raw_data_alter_message_author.py (1)

20-22: Guard against storing unbounded raw payloads

raw_data will store the full Slack payload, which can occasionally exceed several hundred kilobytes (large file attachments, blocks, etc.).
Consider adding a DB-level size check or an application-level trim/compression step to avoid bloating the table.

This can be deferred, but worth tracking.
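
One way an application-level guard could look, as a sketch only. `MAX_RAW_BYTES` is an arbitrary illustrative threshold, not a value from the PR or from Slack, and the helper names are hypothetical.

```python
import json
import zlib

MAX_RAW_BYTES = 256 * 1024  # hypothetical limit chosen for illustration


def encoded_size(payload: dict) -> int:
    """Size of the payload as compact UTF-8 JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def compress_payload(payload: dict) -> bytes:
    """Compress the compact JSON form of the payload."""
    return zlib.compress(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def decompress_payload(blob: bytes) -> dict:
    """Inverse of compress_payload."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


small = {"ts": "1750000000.000100", "text": "hello"}
large = {"ts": "1750000000.000200", "text": "x" * (MAX_RAW_BYTES + 1)}

# A sync command could log or compress payloads that exceed the limit.
oversized = [p["ts"] for p in (small, large) if encoded_size(p) > MAX_RAW_BYTES]
```

Compression keeps the payload recoverable, whereas trimming would lose data, so the choice depends on whether the raw body must stay complete.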

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 514f30a and 158f650.

📒 Files selected for processing (3)
  • backend/apps/slack/management/commands/slack_sync_messages.py (1 hunks)
  • backend/apps/slack/migrations/0016_message_is_bot_message_raw_data_alter_message_author.py (1 hunks)
  • backend/apps/slack/models/message.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/apps/slack/models/message.py
🧰 Additional context used
🪛 Pylint (3.3.7)
backend/apps/slack/migrations/0016_message_is_bot_message_raw_data_alter_message_author.py

[refactor] 7-7: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run frontend e2e tests

@Dishant1804 Dishant1804 requested a review from arkid15r June 18, 2025 18:54
Collaborator

@arkid15r arkid15r left a comment

In general, if you are writing the same code (especially in the same file) more than once, you are probably implementing something wrong.

@arkid15r arkid15r enabled auto-merge June 21, 2025 02:55
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/apps/slack/management/commands/slack_sync_messages.py (2)

274-274: Consider parameter consolidation to address complexity.

The method has 7 parameters, which exceeds best practices and triggers pylint warnings. While all parameters appear necessary, consider consolidating them into a parameter object or dataclass to improve maintainability.

Example refactoring approach:

+from dataclasses import dataclass
+
+@dataclass
+class MessageCreationParams:
+    client: WebClient
+    conversation: Conversation
+    delay: float
+    max_retries: int
+
-def _create_message(
-    self,
-    client: WebClient,
-    message_data: dict,
-    conversation: Conversation,
-    delay: float,
-    max_retries: int,
-    *,
-    parent_message: Message | None = None,
-) -> Message | None:
+def _create_message(
+    self,
+    params: MessageCreationParams,
+    message_data: dict,
+    *,
+    parent_message: Message | None = None,
+) -> Message | None:

285-338: Simplify the nested retry logic for better maintainability.

While the logic correctly handles both users and bots, the nested retry structure makes it difficult to follow. Consider extracting the member creation logic to reduce complexity.

+def _create_member_with_retry(self, client, slack_user_id, message_data, workspace, delay, max_retries):
+    """Create member with retry logic."""
+    for retry_count in range(max_retries):
+        try:
+            if message_data.get("user"):
+                user_info = client.users_info(user=slack_user_id)
+                self._handle_slack_response(user_info, "users_info")
+                return Member.update_data(user_info["user"], workspace, save=True)
+            else:
+                bot_info = client.bots_info(bot=slack_user_id)
+                self._handle_slack_response(bot_info, "bots_info")
+                bot_data = {
+                    "id": slack_user_id,
+                    "is_bot": True,
+                    "name": bot_info["bot"].get("name"),
+                    "real_name": bot_info["bot"].get("name"),
+                }
+                return Member.update_data(bot_data, workspace, save=True)
+        except SlackApiError as e:
+            if e.response.get("error") == "ratelimited":
+                retry_after = int(e.response.headers.get("Retry-After", delay))
+                self.stdout.write(self.style.WARNING("Rate limited on member info"))
+                time.sleep(retry_after)
+            else:
+                self.stdout.write(self.style.ERROR(f"Failed to fetch member data for {slack_user_id}"))
+                break
+    return None

 def _create_message(self, ...):
     # ... existing code ...
     except Member.DoesNotExist:
-        retry_count = 0
-        while retry_count < max_retries:
-            # ... complex nested logic ...
+        author = self._create_member_with_retry(
+            client, slack_user_id, message_data, conversation.workspace, delay, max_retries
+        )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbf0780 and ba3b3fe.

📒 Files selected for processing (8)
  • backend/apps/slack/admin.py (2 hunks)
  • backend/apps/slack/management/commands/slack_sync_messages.py (7 hunks)
  • backend/apps/slack/migrations/0017_remove_message_text.py (1 hunks)
  • backend/apps/slack/models/conversation.py (2 hunks)
  • backend/apps/slack/models/member.py (1 hunks)
  • backend/apps/slack/models/message.py (3 hunks)
  • backend/tests/slack/commands/management/slack_sync_messages_test.py (5 hunks)
  • backend/tests/slack/models/message_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/apps/slack/models/message.py
  • backend/tests/slack/commands/management/slack_sync_messages_test.py
🧰 Additional context used
🪛 Pylint (3.3.7)
backend/apps/slack/management/commands/slack_sync_messages.py

[refactor] 274-274: Too many arguments (7/5)

(R0913)


[refactor] 274-274: Too many positional arguments (6/5)

(R0917)

backend/apps/slack/migrations/0017_remove_message_text.py

[refactor] 6-6: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (13)
backend/apps/slack/migrations/0017_remove_message_text.py (1)

6-16: LGTM! Migration structure is correct.

The migration properly removes the text field and correctly depends on the previous migration that introduced raw_data. The structure follows Django migration best practices.

backend/apps/slack/models/conversation.py (2)

4-4: Good use of TYPE_CHECKING to prevent circular imports.

The conditional import pattern properly avoids runtime circular dependencies while maintaining type hints.

Also applies to: 11-12


43-46: Efficient implementation of latest_message property.

The property correctly uses Django ORM ordering to get the most recent message. The return type annotation properly indicates it can return None when no messages exist.
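
A plain-Python analogue of such a property, assuming messages are ordered by a `created_at` timestamp (the actual ordering field is not shown in this review). `Conv` and `Msg` are hypothetical stand-ins for the Django models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Msg:
    created_at: datetime
    text: str


@dataclass
class Conv:
    messages: list[Msg] = field(default_factory=list)

    @property
    def latest_message(self) -> Msg | None:
        # default=None covers the empty-conversation case instead of raising.
        return max(self.messages, key=lambda m: m.created_at, default=None)


empty = Conv()
conv = Conv(
    [
        Msg(datetime(2025, 6, 16), "first"),
        Msg(datetime(2025, 6, 21), "second"),
    ]
)
```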

backend/apps/slack/models/member.py (1)

52-52: Excellent defensive programming improvement.

Using nested .get() calls prevents KeyError exceptions when the "profile" key is missing from member_data, making the code more robust when processing Slack member data.
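
A short sketch of the nested `.get()` pattern. The function names are hypothetical, and the second variant is an additional assumption that covers a profile key that is present but null, which the nested form alone does not handle.

```python
def extract_email(member_data: dict) -> str:
    # Nested .get() calls: a missing "profile" key yields {} instead of raising.
    return member_data.get("profile", {}).get("email", "")


def extract_email_tolerant(member_data: dict) -> str:
    # Variant that also tolerates {"profile": None} and {"email": None}.
    profile = member_data.get("profile") or {}
    return profile.get("email") or ""
```

For example, `extract_email({})` returns an empty string, while `member_data["profile"]["email"]` would raise `KeyError` on the same input.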

backend/tests/slack/models/message_test.py (1)

181-182: Test correctly updated to align with model changes.

The test properly uses raw_data={"text": "Short message"} instead of the removed text field, maintaining the same test logic while adapting to the new data structure.

backend/apps/slack/admin.py (2)

93-96: Useful admin filters added for Member model.

Adding filters for is_bot and workspace will improve the admin interface usability for managing Slack members.


136-154: Admin interface properly updated to align with model changes.

The changes correctly:

  • Update list_display to show created_at instead of the removed text field
  • Add author and conversation to provide better context in the admin list view
  • Change search from "text" to "raw_data__text" to maintain search functionality with the new data structure
  • Add useful filters for has_replies and conversation
backend/apps/slack/management/commands/slack_sync_messages.py (6)

23-23: Good performance optimization.

Increasing the default batch size from 200 to 999 will reduce the number of API calls needed while staying within Slack's API limits.
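
The effect of the larger batch size on the number of requests can be sketched with a fake paginated endpoint. `iter_history` and `fake_fetch` are hypothetical; the only Slack-specific assumption is that each page carries a `response_metadata.next_cursor` value that is empty on the last page.

```python
from __future__ import annotations

from typing import Callable, Iterator

BATCH_SIZE = 999  # default batch size after this PR


def iter_history(
    fetch_page: Callable[[str | None, int], dict],
    limit: int = BATCH_SIZE,
) -> Iterator[dict]:
    """Yield messages page by page, following the response cursor."""
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["messages"]
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break


calls: list[int] = []
TOTAL_MESSAGES = 2500  # size of the fake conversation


def fake_fetch(cursor: str | None, limit: int) -> dict:
    """Fake endpoint: the cursor is simply the next start offset."""
    start = int(cursor or 0)
    calls.append(start)
    end = min(start + limit, TOTAL_MESSAGES)
    return {
        "messages": [{"ts": str(i)} for i in range(start, end)],
        "response_metadata": {"next_cursor": str(end) if end < TOTAL_MESSAGES else ""},
    }


messages = list(iter_history(fake_fetch))
requests_with_999 = len(calls)
```

For this fake conversation the larger limit needs 3 requests, where a limit of 200 would need 13.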


115-124: Excellent architectural improvements.

The refactoring to use bulk saves and immediate reply fetching significantly improves performance and aligns well with the goal of saving complete message data. The void method approach with bulk saves is much more efficient than individual saves.

Also applies to: 130-138, 141-153, 173-191


202-272: Consistent and well-implemented reply fetching.

The bulk save approach and improved parameter handling align well with the main message fetching logic. Good consistency in the codebase.


339-345: Correct implementation for bulk save strategy.

The save=False parameter correctly supports the bulk save approach, and the method properly handles cases where author might be None.


274-345: Successfully addresses past review feedback.

The refactored _create_message method properly handles the bot_id vs user distinction that was flagged in previous reviews and aligns well with the PR objective of saving entire message bodies without unnecessary filtering.


312-320: Verify bot_data structure matches Member model expectations.

The custom bot_data dictionary creation looks functional, but please verify that the field mapping aligns with the Member model structure, particularly the mapping of bot name to both name and real_name fields.

#!/bin/bash
# Description: Check the Member model structure to verify bot_data field mapping
# Expected: Find the Member model definition and its field structure

ast-grep --pattern 'class Member($$$):
  $$$'
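
The mapping under discussion can be isolated into a small helper for inspection. This is a sketch: `build_bot_member_data` is a hypothetical name, and the dictionary keys simply mirror the `bot_data` structure quoted earlier in this review, not a confirmed Member schema.

```python
def build_bot_member_data(slack_bot_id: str, bot_info: dict) -> dict:
    """Map a bot info response onto the user-shaped dict quoted in the review."""
    bot = bot_info.get("bot", {})
    return {
        "id": slack_bot_id,
        "is_bot": True,
        "name": bot.get("name"),
        "real_name": bot.get("name"),  # same value reused, as in the reviewed code
    }


bot_data = build_bot_member_data("B123", {"bot": {"name": "deploy-bot"}})
```

Pulling the mapping into one function makes it easier to test against whatever fields the Member model actually expects.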

@Dishant1804 Dishant1804 requested a review from arkid15r June 21, 2025 05:16
@arkid15r arkid15r added this pull request to the merge queue Jun 21, 2025
Merged via the queue into OWASP:main with commit ff8f279 Jun 21, 2025
23 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Aug 21, 2025
5 tasks
@coderabbitai coderabbitai bot mentioned this pull request Oct 26, 2025
2 tasks
Development

Successfully merging this pull request may close these issues.

Save entire message body during for message sync

2 participants