Dishant1804
Collaborator

Resolves #1682

  • Created chunks and embeddings for OWASP chapters

@Dishant1804 Dishant1804 requested a review from arkid15r as a code owner July 1, 2025 21:00
@coderabbitai
Contributor

coderabbitai bot commented Jul 1, 2025

Summary by CodeRabbit

  • New Features

    • Added a command to generate and store chapter text chunks with vector embeddings for enhanced retrieval workflows.
    • Introduced a Makefile target to facilitate chapter chunk creation.
  • Improvements

    • Generalized chunk storage to support linking text chunks to any object, not just messages, enabling broader AI data association.
    • Updated admin interface to display and search by new generic chunk fields.
  • Bug Fixes

    • Improved handling of OpenAI API errors during chunk creation to ensure continued processing.
  • Tests

    • Updated and expanded tests to cover new generic chunk functionality and ensure correct behavior.
  • Chores

    • Centralized configuration constants for request timing and text delimiting.

Summary by CodeRabbit

  • New Features

    • Added a management command to create chapter chunks for AI workflows.
    • Introduced new configuration options for request timing intervals.
  • Improvements

    • Enhanced the admin interface for managing chunks, including updated list display and search fields.
    • Generalized chunk associations to support linking chunks with any model, not just messages.
  • Bug Fixes

    • Updated migration to ensure proper uniqueness and relationships for chunk data.

Walkthrough

This change generalizes the Chunk model from being Slack message-specific to supporting any content object using Django's content types framework. It introduces a management command for creating chapter-related chunks, updates admin and internal logic, centralizes timing constants, and provides a Makefile target for the new command. Database migrations support the new schema.
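
The idea behind a generic relation can be sketched without Django: a chunk stores which kind of object it belongs to plus that object's identifier, and the pair is resolved on demand. The sketch below is only an analogy under stated assumptions; the `OBJECTS` registry, the dataclasses, and the sample values are hypothetical and are not the project's models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    id: int
    name: str


@dataclass(frozen=True)
class Message:
    id: int
    text: str


@dataclass
class Chunk:
    content_type: str  # which kind of object this chunk belongs to
    object_id: int     # identifier of that object within its kind
    text: str


# Hypothetical in-memory "tables", keyed by content type and then by id.
OBJECTS = {
    "chapter": {1: Chapter(1, "OWASP London")},
    "message": {1: Message(1, "Hello from Slack")},
}


def content_object(chunk):
    """Resolve the (content_type, object_id) pair to the owning object."""
    return OBJECTS[chunk.content_type][chunk.object_id]
```

In Django the same lookup is handled by the content types framework, so the model needs no per-type foreign key.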

Changes

| File(s) | Change Summary |
|---|---|
| backend/apps/ai/models/chunk.py, backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py | Refactored Chunk model: removed message ForeignKey, added generic relation fields (content_type, object_id, etc.), updated unique constraints and string representation. Migration updates schema accordingly. |
| backend/apps/ai/management/commands/ai_create_chapter_chunks.py | Added new management command to create chapter chunks for RAG, with OpenAI embedding and batching logic. |
| backend/apps/ai/Makefile | Added ai-create-chapter-chunks Makefile target to run the new management command. |
| backend/apps/ai/common/constants.py | Added timing-related constants for request intervals. |
| backend/apps/ai/management/commands/ai_create_slack_message_chunks.py | Updated to use generic content object logic and import timing constants from shared module. |
| backend/apps/ai/admin.py | Updated admin list and search fields for ChunkAdmin to reflect generic content object model. |

Assessment against linked issues

Objective Addressed Explanation
Implement chapter context related functionality (e.g., support for answering questions about OWASP chapters, context storage) (#1682)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Possibly related PRs

Suggested labels

nestbot, backend-tests, makefile

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4b604f and 1429824.

📒 Files selected for processing (7)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/admin.py (1 hunks)
  • backend/apps/ai/common/constants.py (1 hunks)
  • backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)
  • backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (2 hunks)
  • backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py (1 hunks)
  • backend/apps/ai/models/chunk.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (1)
backend/apps/slack/models/message.py (1)
  • text (83-85)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)
backend/apps/ai/models/chunk.py (1)
  • Chunk (13-88)
🪛 Pylint (3.3.7)
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py

[error] 10-13: Unable to import 'apps.ai.common.constants'

(E0401)

backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py

[convention] 1-1: Missing module docstring

(C0114)


[convention] 1-1: Module name "0004_alter_chunk_unique_together_chunk_content_type_and_more" doesn't conform to snake_case naming style

(C0103)


[error] 3-3: Unable to import 'django.db.models.deletion'

(E0401)


[error] 4-4: Unable to import 'django.db'

(E0401)


[convention] 7-7: Missing class docstring

(C0115)


[refactor] 7-7: Too few public methods (0/2)

(R0903)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)


[error] 8-8: Unable to import 'django.core.management.base'

(E0401)


[error] 10-13: Unable to import 'apps.ai.common.constants'

(E0401)


[error] 14-14: Unable to import 'apps.ai.models.chunk'

(E0401)


[error] 15-15: Unable to import 'apps.owasp.models.chapter'

(E0401)


[convention] 18-18: Missing class docstring

(C0115)


[convention] 21-21: Missing function or method docstring

(C0116)


[convention] 35-35: Missing function or method docstring

(C0116)


[warning] 35-35: Unused argument 'args'

(W0613)


[refactor] 130-130: Too many branches (25/12)

(R0912)


[refactor] 130-130: Too many statements (57/50)

(R0915)


[warning] 42-42: Attribute 'openai_client' defined outside init

(W0201)


[warning] 108-108: Attribute 'last_request_time' defined outside init

(W0201)

backend/apps/ai/models/chunk.py

[error] 3-3: Unable to import 'django.contrib.contenttypes.fields'

(E0401)


[error] 4-4: Unable to import 'django.contrib.contenttypes.models'

(E0401)


[error] 5-5: Unable to import 'django.db'

(E0401)


[error] 6-6: Unable to import 'langchain.text_splitter'

(E0401)


[error] 7-7: Unable to import 'pgvector.django'

(E0401)


[convention] 16-16: Missing class docstring

(C0115)


[refactor] 16-16: Too few public methods (0/2)

(R0903)

🪛 Flake8 (7.2.0)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 62-62: whitespace before ':'

(E203)

🔇 Additional comments (8)
backend/apps/ai/Makefile (1)

5-7: LGTM! Consistent target implementation.

The new Makefile target follows the established pattern and naming conventions. The implementation is clean and consistent with the existing ai-create-slack-message-chunks target.

backend/apps/ai/common/constants.py (1)

1-4: LGTM! Good centralization of timing constants.

The timing constants are well-named and the values (1.2s minimum interval, 2s default offset) are reasonable for API rate limiting. This centralization improves maintainability and eliminates code duplication across management commands.
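
A minimal sketch of how a minimum-interval constant could throttle API calls. The constant name `MIN_REQUEST_INTERVAL_SECONDS` and the `Throttle` helper are assumptions for illustration; only the 1.2 s value comes from the comment above, and the actual command may be structured differently.

```python
import time

# Assumed name; the review only states a 1.2 s minimum interval.
MIN_REQUEST_INTERVAL_SECONDS = 1.2


class Throttle:
    """Sleep as needed so calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last_request_time = None  # no request has been made yet

    def wait(self):
        """Block until the minimum interval since the previous call has passed."""
        now = self.clock()
        if self.last_request_time is not None:
            remaining = self.interval - (now - self.last_request_time)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request_time = now
```

Calling `Throttle(MIN_REQUEST_INTERVAL_SECONDS).wait()` before each request would space requests out; the injectable `clock` and `sleep` arguments exist only to make the helper easy to test.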

backend/apps/ai/admin.py (1)

12-14: LGTM! Admin fields properly updated for generic relations.

The changes correctly reflect the model refactoring from message-specific to generic content objects. Using content_type in the display and object_id in search fields is appropriate for the new generic foreign key structure.

backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (2)

10-13: LGTM! Proper use of centralized constants.

The import of timing constants from the shared module eliminates code duplication and improves maintainability. The static analysis import error is likely a false positive since the constants file exists in the codebase.


82-84: LGTM! Correct adaptation to generic content model.

The changes properly update the Chunk.update_data call to use the new generic content object pattern:

  • text parameter moved to first position
  • message parameter replaced with content_object=message

This aligns with the model refactoring to support any content type via Django's content types framework.

backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py (1)

14-42: object_id default aligns with model; migration approved

Verified that object_id is defined with default=0 in backend/apps/ai/models/chunk.py (line 22), which matches the migration’s default=0. No changes required unless you intend to allow a NULL value for object_id—the current setup is consistent.

backend/apps/ai/models/chunk.py (2)

27-37: Good implementation of dynamic content representation!

The __str__ method effectively handles different content types by checking for common attributes (name, key) before falling back to string representation.
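
The attribute-fallback pattern described here could look roughly like the following. This is a sketch, not the model's actual `__str__` body; the helper name `describe_content` is hypothetical.

```python
def describe_content(obj):
    """Return a short label for an arbitrary object (hypothetical helper)."""
    # Prefer common identifying attributes, in order.
    for attr in ("name", "key"):
        value = getattr(obj, attr, None)
        if value:
            return str(value)
    # Fall back to the object's own string representation.
    return str(obj)
```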


54-88: Well-implemented generic content type support!

The update_data method has been successfully refactored to support any content object type while maintaining the same functionality. The updated docstring clearly reflects the change.

Collaborator

@arkid15r arkid15r left a comment

Great refactoring, here are a few items to address before merging:

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

19-210: Address comprehensive feedback from previous reviews.

The previous reviews have identified several important issues that should be addressed:

  1. Code structure improvements: Missing docstrings, unused parameters, and attribute initialization
  2. Code complexity: The extract_chapter_content method has high cyclomatic complexity and should be refactored into smaller helper methods
  3. Formatting issues: Whitespace before colon in slice operation
  4. List comprehension complexity: The complex list comprehension with walrus operator should be simplified

These issues collectively impact code maintainability and readability. Please address the previous review feedback to improve the overall code quality.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1429824 and eac3b50.

📒 Files selected for processing (3)
  • backend/apps/ai/Makefile (1 hunks)
  • backend/apps/ai/common/constants.py (1 hunks)
  • backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)
Learnt from: M-ayank2005
PR: OWASP/Nest#1282
File: frontend/src/pages/About.tsx:94-98
Timestamp: 2025-04-04T05:23:43.562Z
Learning: In the OWASP/Nest frontend project, DOMPurify is used to sanitize HTML content before rendering it with dangerouslySetInnerHTML, following security best practices to prevent XSS vulnerabilities.
🧬 Code Graph Analysis (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (4)
backend/apps/ai/models/chunk.py (4)
  • Chunk (13-88)
  • bulk_save (40-42)
  • split_text (45-52)
  • update_data (55-88)
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (3)
  • Command (18-93)
  • handle (21-43)
  • create_chunks (45-93)
backend/apps/owasp/graphql/nodes/chapter.py (1)
  • suggested_location (54-56)
backend/apps/owasp/graphql/nodes/common.py (1)
  • related_urls (18-20)
🪛 checkmake (0.2.2)
backend/apps/ai/Makefile

[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🪛 Flake8 (7.2.0)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 60-60: whitespace before ':'

(E203)

🪛 Pylint (3.3.7)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)


[error] 8-8: Unable to import 'django.core.management.base'

(E0401)


[error] 10-14: Unable to import 'apps.ai.common.constants'

(E0401)


[error] 15-15: Unable to import 'apps.ai.models.chunk'

(E0401)


[error] 16-16: Unable to import 'apps.owasp.models.chapter'

(E0401)


[convention] 19-19: Missing class docstring

(C0115)


[convention] 22-22: Missing function or method docstring

(C0116)


[convention] 34-34: Missing function or method docstring

(C0116)


[warning] 34-34: Unused argument 'args'

(W0613)


[refactor] 127-127: Too many branches (24/12)

(R0912)


[refactor] 127-127: Too many statements (55/50)

(R0915)


[warning] 41-41: Attribute 'openai_client' defined outside init

(W0201)


[warning] 105-105: Attribute 'last_request_time' defined outside init

(W0201)

🔇 Additional comments (2)
backend/apps/ai/common/constants.py (1)

1-5: LGTM! Well-defined constants for AI app configuration.

The constants are appropriately named and have reasonable values for API rate limiting and text processing. The centralization of these values will improve maintainability across the AI app.

backend/apps/ai/Makefile (1)

1-3: LGTM! Follows established pattern and proper ordering.

The new Makefile target is correctly implemented following the same pattern as the existing ai-create-slack-message-chunks target and is properly ordered alphabetically.

@Dishant1804 Dishant1804 requested a review from arkid15r July 2, 2025 05:12
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/tests/apps/ai/models/chunk_test.py (1)

127-164: Test correctly validates no-save behavior.

The test properly verifies that save() is not called when save=False, maintaining the expected behavior with generic foreign keys.
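
The shape of such a no-save assertion can be sketched with `unittest.mock`. The `update_data` function below is a simplified, hypothetical stand-in and not the real `Chunk.update_data`.

```python
from unittest.mock import MagicMock


def update_data(chunk, text, *, save=True):
    """Hypothetical stand-in: set the text and optionally persist."""
    chunk.text = text
    if save:
        chunk.save()
    return chunk


chunk = MagicMock()
update_data(chunk, "some text", save=False)
# The mock records every call, so an unwanted save() would be detected here.
chunk.save.assert_not_called()
```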

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85bb4c5 and 70dd534.

📒 Files selected for processing (1)
  • backend/tests/apps/ai/models/chunk_test.py (2 hunks)
🔇 Additional comments (4)
backend/tests/apps/ai/models/chunk_test.py (4)

3-3: LGTM!

The ContentType import and the addition of mock.id are appropriate changes that align with the model's transition to generic foreign keys.

Also applies to: 14-14


19-36: LGTM!

The test correctly validates the string representation using the new generic foreign key approach with appropriate mocking.


101-125: LGTM!

The test correctly validates the behavior when a chunk already exists, properly using the generic foreign key approach.


171-177: LGTM!

The test appropriately validates the generic foreign key field types and relationships.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (9)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (8)

19-21: Add a class docstring for better documentation.

While the help attribute provides basic information, adding a proper class docstring would improve code documentation and address the pylint warning.


22-38: Add docstring to document command-line arguments.


40-40: Add docstring and remove unused parameter.

The args parameter is not used in the method.


47-47: Consider initializing instance attributes in the __init__ method.

The openai_client attribute is defined outside __init__, which could lead to attribute errors if methods are called in unexpected order.
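
One way to address this warning, sketched on a plain class rather than a real Django command: declare every attribute in `__init__` and create the client lazily. The `ensure_client` method and its `factory` argument are hypothetical.

```python
class Command:
    """Stand-in for the management command (Django base class omitted)."""

    def __init__(self):
        # Declare every attribute up front so any method can rely on it.
        self.openai_client = None
        self.last_request_time = None

    def ensure_client(self, factory):
        """Create the client on first use; `factory` is a hypothetical callable."""
        if self.openai_client is None:
            self.openai_client = factory()
        return self.openai_client
```

With this layout, a method that reads `last_request_time` before the first request sees `None` instead of raising `AttributeError`.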


64-64: Fix whitespace formatting issue.

Remove the whitespace before the colon to comply with PEP 8.


95-106: Consider initializing instance attributes in the __init__ method.

The last_request_time attribute is accessed before being assigned, which could lead to attribute errors.


108-123: Simplify the complex list comprehension for better readability.

The list comprehension with walrus operator and filtering is hard to read and maintain.


128-211: Refactor this method to reduce complexity.

This method has high cyclomatic complexity (25 branches) and too many statements. Consider breaking it down into smaller, focused helper methods.

backend/apps/ai/models/chunk.py (1)

19-22: Revisit nullable content_type with unique_together constraint.

The current model allows content_type to be nullable, but it's part of the unique_together constraint. This could lead to unexpected behavior where multiple chunks with NULL content_type and the same text would be allowed.
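
The NULL behaviour behind this concern can be reproduced with the standard library's SQLite driver. The table below is a hypothetical simplification of the chunk schema; SQLite is used only because it ships with Python, and PostgreSQL treats NULLs as distinct in unique constraints by default as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunk ("
    " content_type_id INTEGER,"
    " object_id INTEGER NOT NULL,"
    " text TEXT NOT NULL,"
    " UNIQUE (content_type_id, object_id, text))"
)

# Two rows with NULL content_type_id and identical remaining columns: both
# are accepted, because NULL values are treated as distinct from each other.
conn.execute("INSERT INTO chunk VALUES (NULL, 0, 'same text')")
conn.execute("INSERT INTO chunk VALUES (NULL, 0, 'same text')")
null_rows = conn.execute("SELECT COUNT(*) FROM chunk").fetchone()[0]

# With a non-NULL content type the duplicate is rejected.
conn.execute("INSERT INTO chunk VALUES (1, 0, 'same text')")
try:
    conn.execute("INSERT INTO chunk VALUES (1, 0, 'same text')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```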

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70dd534 and 0951b19.

📒 Files selected for processing (3)
  • backend/apps/ai/common/constants.py (1 hunks)
  • backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)
  • backend/apps/ai/models/chunk.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)
Learnt from: M-ayank2005
PR: OWASP/Nest#1282
File: frontend/src/pages/About.tsx:94-98
Timestamp: 2025-04-04T05:23:43.562Z
Learning: In the OWASP/Nest frontend project, DOMPurify is used to sanitize HTML content before rendering it with dangerouslySetInnerHTML, following security best practices to prevent XSS vulnerabilities.
🧬 Code Graph Analysis (1)
backend/apps/ai/models/chunk.py (3)
backend/apps/common/models.py (2)
  • BulkSaveModel (8-30)
  • TimestampedModel (33-40)
backend/apps/common/utils.py (1)
  • truncate (164-176)
backend/apps/slack/models/message.py (2)
  • Meta (18-21)
  • text (83-85)
🪛 Pylint (3.3.7)
backend/apps/ai/models/chunk.py

[error] 3-3: Unable to import 'django.contrib.contenttypes.fields'

(E0401)


[error] 4-4: Unable to import 'django.contrib.contenttypes.models'

(E0401)


[error] 5-5: Unable to import 'django.db'

(E0401)


[error] 6-6: Unable to import 'langchain.text_splitter'

(E0401)


[error] 7-7: Unable to import 'pgvector.django'

(E0401)


[convention] 16-16: Missing class docstring

(C0115)


[refactor] 16-16: Too few public methods (0/2)

(R0903)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)


[error] 8-8: Unable to import 'django.core.management.base'

(E0401)


[error] 10-14: Unable to import 'apps.ai.common.constants'

(E0401)


[error] 15-15: Unable to import 'apps.ai.models.chunk'

(E0401)


[error] 16-16: Unable to import 'apps.owasp.models.chapter'

(E0401)


[convention] 19-19: Missing class docstring

(C0115)


[convention] 22-22: Missing function or method docstring

(C0116)


[convention] 40-40: Missing function or method docstring

(C0116)


[warning] 40-40: Unused argument 'args'

(W0613)


[refactor] 128-128: Too many branches (24/12)

(R0912)


[refactor] 128-128: Too many statements (53/50)

(R0915)


[warning] 47-47: Attribute 'openai_client' defined outside init

(W0201)


[warning] 106-106: Attribute 'last_request_time' defined outside init

(W0201)

🪛 Flake8 (7.2.0)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 64-64: whitespace before ':'

(E203)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (3)
backend/apps/ai/common/constants.py (1)

1-6: LGTM!

The constants are well-named and provide sensible defaults for API rate limiting and text delimiting.

backend/apps/ai/models/chunk.py (2)

27-37: Well-designed generic string representation.

The __str__ method elegantly handles different content types by checking for common attributes (name, key) before falling back to string conversion.


55-88: Clean refactoring to support generic relations.

The update_data method has been properly updated to work with any content object while maintaining the existing behavior and API.

@sonarqubecloud

sonarqubecloud bot commented Jul 3, 2025

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (9)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (9)

19-21: Add a class docstring for better documentation.

While the help attribute provides basic information, adding a proper class docstring would improve code documentation and address the pylint warning.


22-22: Add docstring to document command-line arguments.


40-40: Add docstring and remove unused parameter.

The args parameter is not used in the method.


47-47: Consider initializing instance attributes in the __init__ method.

The openai_client and last_request_time attributes are defined outside __init__, which could lead to attribute errors if methods are called in unexpected order.


64-64: Fix whitespace formatting issue.

Remove the whitespace before the colon to comply with PEP 8.


95-106: Consider initializing instance attributes in the __init__ method.

The openai_client and last_request_time attributes are defined outside __init__, which could lead to attribute errors if methods are called in unexpected order.


108-123: Simplify the complex list comprehension for better readability.

The list comprehension with walrus operator and filtering is hard to read and maintain.


128-211: Refactor this method to reduce complexity.

This method has high cyclomatic complexity (24 branches) and too many statements. Consider breaking it down into smaller, focused helper methods.


195-195: Fix incorrect variable usage in leaders information.

Line 195 incorrectly appends location_parts instead of leaders_info to the metadata.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0951b19 and 50599e0.

📒 Files selected for processing (1)
  • backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)
Learnt from: M-ayank2005
PR: OWASP/Nest#1282
File: frontend/src/pages/About.tsx:94-98
Timestamp: 2025-04-04T05:23:43.562Z
Learning: In the OWASP/Nest frontend project, DOMPurify is used to sanitize HTML content before rendering it with dangerouslySetInnerHTML, following security best practices to prevent XSS vulnerabilities.
🧬 Code Graph Analysis (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (4)
backend/apps/ai/models/chunk.py (4)
  • Chunk (13-88)
  • bulk_save (40-42)
  • split_text (45-52)
  • update_data (55-88)
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (3)
  • Command (18-93)
  • handle (21-43)
  • create_chunks (45-93)
backend/apps/owasp/graphql/nodes/chapter.py (1)
  • suggested_location (54-56)
backend/apps/owasp/graphql/nodes/common.py (1)
  • related_urls (18-20)
🪛 Flake8 (7.2.0)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 64-64: whitespace before ':'

(E203)

🪛 Pylint (3.3.7)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)


[error] 8-8: Unable to import 'django.core.management.base'

(E0401)


[error] 10-14: Unable to import 'apps.ai.common.constants'

(E0401)


[error] 15-15: Unable to import 'apps.ai.models.chunk'

(E0401)


[error] 16-16: Unable to import 'apps.owasp.models.chapter'

(E0401)


[convention] 19-19: Missing class docstring

(C0115)


[convention] 22-22: Missing function or method docstring

(C0116)


[convention] 40-40: Missing function or method docstring

(C0116)


[warning] 40-40: Unused argument 'args'

(W0613)


[refactor] 128-128: Too many branches (24/12)

(R0912)


[refactor] 128-128: Too many statements (53/50)

(R0915)


[warning] 47-47: Attribute 'openai_client' defined outside init

(W0201)


[warning] 106-106: Attribute 'last_request_time' defined outside init

(W0201)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (1)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

1-17: Well-organized imports and proper use of constants.

Good practice importing timing constants from a centralized location.

Comment on lines +34 to +37
"--batch-size",
type=int,
default=50,
help="Number of chapters to process in each batch",

🧹 Nitpick (assertive)

Add validation for batch-size argument.

Consider adding validation to ensure batch-size is positive.

-        parser.add_argument(
-            "--batch-size",
-            type=int,
-            default=50,
-            help="Number of chapters to process in each batch",
-        )
+        parser.add_argument(
+            "--batch-size",
+            type=int,
+            default=50,
+            help="Number of chapters to process in each batch",
+            choices=range(1, 1001),
+            metavar="{1..1000}",
+        )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/apps/ai/management/commands/ai_create_chapter_chunks.py around lines
34 to 37, the batch-size argument lacks validation to ensure it is positive. Add
a check after parsing the batch-size argument to verify it is greater than zero,
and raise an appropriate error or exit with a message if the value is zero or
negative. This will prevent invalid batch sizes from being used during
processing.
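
As an alternative to a fixed `choices` range, a custom argparse `type` callable can reject non-positive values at parse time. This is a sketch using a standalone `argparse.ArgumentParser`; the `positive_int` helper is hypothetical, and in a management command the same `add_argument` call would go inside `add_arguments`.

```python
import argparse


def positive_int(value):
    """argparse `type` callable that accepts only integers greater than zero."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} is not a positive integer")
    return number


parser = argparse.ArgumentParser()
parser.add_argument(
    "--batch-size",
    type=positive_int,
    default=50,
    help="Number of chapters to process in each batch",
)
```

This keeps the help output short and imposes no upper bound, at the cost of a small helper function.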

Comment on lines +29 to +31
"--all",
action="store_true",
help="Process all the chapters",

🧹 Nitpick (assertive)

Clarify that --all includes inactive chapters.

The help text should explicitly mention that this option processes both active and inactive chapters to avoid confusion.

         parser.add_argument(
             "--all",
             action="store_true",
-            help="Process all the chapters",
+            help="Process all chapters (including inactive)",
         )
📝 Committable suggestion

Suggested change
-            "--all",
-            action="store_true",
-            help="Process all the chapters",
+            "--all",
+            action="store_true",
+            help="Process all chapters (including inactive)",
🤖 Prompt for AI Agents
In backend/apps/ai/management/commands/ai_create_chapter_chunks.py around lines
29 to 31, update the help text for the "--all" argument to explicitly state that
it processes both active and inactive chapters. Modify the help string to
clarify this behavior so users understand that using "--all" includes inactive
chapters as well.

Comment on lines +63 to +73
for offset in range(0, total_chapters, batch_size):
batch_chapters = queryset[offset : offset + batch_size]

batch_chunks = []
for chapter in batch_chapters:
batch_chunks.extend(self.create_chunks(chapter))

if batch_chunks:
Chunk.bulk_save(batch_chunks)
self.stdout.write(f"Saved {len(batch_chunks)} chunks")

🧹 Nitpick (assertive)

Add progress tracking for better user feedback.

Consider adding progress indicators showing which batch is being processed.

         batch_size = options["batch_size"]
+        batch_count = 0
         for offset in range(0, total_chapters, batch_size):
+            batch_count += 1
+            self.stdout.write(f"Processing batch {batch_count}/{(total_chapters + batch_size - 1) // batch_size}...")
             batch_chapters = queryset[offset : offset + batch_size]
📝 Committable suggestion

Suggested change
-        for offset in range(0, total_chapters, batch_size):
-            batch_chapters = queryset[offset : offset + batch_size]
-            batch_chunks = []
-            for chapter in batch_chapters:
-                batch_chunks.extend(self.create_chunks(chapter))
-            if batch_chunks:
-                Chunk.bulk_save(batch_chunks)
-                self.stdout.write(f"Saved {len(batch_chunks)} chunks")
+        batch_size = options["batch_size"]
+        batch_count = 0
+        for offset in range(0, total_chapters, batch_size):
+            batch_count += 1
+            self.stdout.write(
+                f"Processing batch {batch_count}/{(total_chapters + batch_size - 1) // batch_size}..."
+            )
+            batch_chapters = queryset[offset : offset + batch_size]
+            batch_chunks = []
+            for chapter in batch_chapters:
+                batch_chunks.extend(self.create_chunks(chapter))
+            if batch_chunks:
+                Chunk.bulk_save(batch_chunks)
+                self.stdout.write(f"Saved {len(batch_chunks)} chunks")
🧰 Tools
🪛 Flake8 (7.2.0)

[error] 64-64: whitespace before ':'

(E203)

🤖 Prompt for AI Agents
In backend/apps/ai/management/commands/ai_create_chapter_chunks.py around lines
63 to 73, add progress tracking by including a print or log statement before
processing each batch to indicate the current batch number and total batches.
Calculate the total number of batches based on total_chapters and batch_size,
then output progress like "Processing batch X of Y" before processing each batch
to provide better user feedback.
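
The batch arithmetic can be isolated into a small generator, sketched here on a plain list. The `iter_batches` name is hypothetical; a Django queryset supports the same slicing, but `len()` on a queryset evaluates it, so the command's existing `total_chapters` count would be the better input there.

```python
import math


def iter_batches(items, batch_size):
    """Yield (batch_number, total_batches, batch) for a sliceable sequence."""
    total_batches = math.ceil(len(items) / batch_size)
    for number, offset in enumerate(range(0, len(items), batch_size), start=1):
        yield number, total_batches, items[offset : offset + batch_size]


for number, total, batch in iter_batches(list(range(7)), 3):
    print(f"Processing batch {number}/{total} ({len(batch)} items)...")
```

Computing the total once, outside the loop, also avoids repeating the ceiling division in every progress message.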

@arkid15r arkid15r enabled auto-merge July 3, 2025 18:12
@arkid15r arkid15r added this pull request to the merge queue Jul 3, 2025
Merged via the queue into OWASP:main with commit 40aadc1 Jul 3, 2025
23 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Aug 13, 2025
7 tasks
Development

Successfully merging this pull request may close these issues.

Implement chapter context related functionality

2 participants