Creating chunks for OWASP chapters #1693

Dishant1804 · 2025-07-01T21:00:58Z

Resolves #1682

Created chunks and embeddings for OWASP chapters.

coderabbitai · 2025-07-01T21:01:05Z

Summary by CodeRabbit

New Features
- Added a command to generate and store chapter text chunks with vector embeddings for enhanced retrieval workflows.
- Introduced a Makefile target to facilitate chapter chunk creation.
Improvements
- Generalized chunk storage to support linking text chunks to any object, not just messages, enabling broader AI data association.
- Updated admin interface to display and search by new generic chunk fields.
Bug Fixes
- Improved handling of OpenAI API errors during chunk creation to ensure continued processing.
Tests
- Updated and expanded tests to cover new generic chunk functionality and ensure correct behavior.
Chores
- Centralized configuration constants for request timing and text delimiting.

Summary by CodeRabbit

New Features
- Added a management command to create chapter chunks for AI workflows.
- Introduced new configuration options for request timing intervals.
Improvements
- Enhanced the admin interface for managing chunks, including updated list display and search fields.
- Generalized chunk associations to support linking chunks with any model, not just messages.
Bug Fixes
- Updated migration to ensure proper uniqueness and relationships for chunk data.

Walkthrough

This change generalizes the Chunk model from being Slack message-specific to supporting any content object using Django's content types framework. It introduces a management command for creating chapter-related chunks, updates admin and internal logic, centralizes timing constants, and provides a Makefile target for the new command. Database migrations support the new schema.

Changes

| File(s) | Change Summary |
| --- | --- |
| backend/apps/ai/models/chunk.py, backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py | Refactored `Chunk` model: removed `message` ForeignKey, added generic relation fields (`content_type`, `object_id`, etc.), updated unique constraints and string representation. Migration updates schema accordingly. |
| backend/apps/ai/management/commands/ai_create_chapter_chunks.py | Added new management command to create chapter chunks for RAG, with OpenAI embedding and batching logic. |
| backend/apps/ai/Makefile | Added `ai-create-chapter-chunks` Makefile target to run the new management command. |
| backend/apps/ai/common/constants.py | Added timing-related constants for request intervals. |
| backend/apps/ai/management/commands/ai_create_slack_message_chunks.py | Updated to use generic content object logic and import timing constants from shared module. |
| backend/apps/ai/admin.py | Updated admin list and search fields for `ChunkAdmin` to reflect generic content object model. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Implement chapter context related functionality (e.g., support for answering questions about OWASP chapters, context storage) (#1682) | ✅ | |

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Possibly related PRs

Chunk model and embeddings of message #1651: Introduced the original Slack message-specific Chunk model and related admin and command code, which this PR generalizes to support any content object.

Suggested labels

nestbot, backend-tests, makefile

coderabbitai

Actionable comments posted: 8

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4b604f and 1429824.

📒 Files selected for processing (7)

backend/apps/ai/Makefile (1 hunks)
backend/apps/ai/admin.py (1 hunks)
backend/apps/ai/common/constants.py (1 hunks)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)
backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (2 hunks)
backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py (1 hunks)
backend/apps/ai/models/chunk.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (1)

backend/apps/slack/models/message.py (1)

text (83-85)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

backend/apps/ai/models/chunk.py (1)

Chunk (13-88)

🪛 Pylint (3.3.7)

backend/apps/ai/management/commands/ai_create_slack_message_chunks.py

[error] 10-13: Unable to import 'apps.ai.common.constants'

(E0401)

backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py

[convention] 1-1: Missing module docstring

(C0114)

[convention] 1-1: Module name "0004_alter_chunk_unique_together_chunk_content_type_and_more" doesn't conform to snake_case naming style

(C0103)

[error] 3-3: Unable to import 'django.db.models.deletion'

(E0401)

[error] 4-4: Unable to import 'django.db'

(E0401)

[convention] 7-7: Missing class docstring

(C0115)

[refactor] 7-7: Too few public methods (0/2)

(R0903)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)

[error] 8-8: Unable to import 'django.core.management.base'

(E0401)

[error] 10-13: Unable to import 'apps.ai.common.constants'

(E0401)

[error] 14-14: Unable to import 'apps.ai.models.chunk'

(E0401)

[error] 15-15: Unable to import 'apps.owasp.models.chapter'

(E0401)

[convention] 18-18: Missing class docstring

(C0115)

[convention] 21-21: Missing function or method docstring

(C0116)

[convention] 35-35: Missing function or method docstring

(C0116)

[warning] 35-35: Unused argument 'args'

(W0613)

[refactor] 130-130: Too many branches (25/12)

(R0912)

[refactor] 130-130: Too many statements (57/50)

(R0915)

[warning] 42-42: Attribute 'openai_client' defined outside init

(W0201)

[warning] 108-108: Attribute 'last_request_time' defined outside init

(W0201)

backend/apps/ai/models/chunk.py

[error] 3-3: Unable to import 'django.contrib.contenttypes.fields'

(E0401)

[error] 4-4: Unable to import 'django.contrib.contenttypes.models'

(E0401)

[error] 5-5: Unable to import 'django.db'

(E0401)

[error] 6-6: Unable to import 'langchain.text_splitter'

(E0401)

[error] 7-7: Unable to import 'pgvector.django'

(E0401)

[convention] 16-16: Missing class docstring

(C0115)

[refactor] 16-16: Too few public methods (0/2)

(R0903)

🪛 Flake8 (7.2.0)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 62-62: whitespace before ':'

(E203)

🔇 Additional comments (8)

backend/apps/ai/Makefile (1)

5-7: LGTM! Consistent target implementation.

The new Makefile target follows the established pattern and naming conventions. The implementation is clean and consistent with the existing ai-create-slack-message-chunks target.

backend/apps/ai/common/constants.py (1)

1-4: LGTM! Good centralization of timing constants.

The timing constants are well-named and the values (1.2s minimum interval, 2s default offset) are reasonable for API rate limiting. This centralization improves maintainability and eliminates code duplication across management commands.
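
A minimal sketch of how a minimum request interval like this might be enforced between embedding requests. The constant name `MIN_REQUEST_INTERVAL_SECONDS` and the `Throttle` helper are assumptions made for illustration; only the 1.2 s value comes from the comment above, and the PR's actual implementation is not shown in this thread.

```python
import time

# Assumed constant name; the review only states the 1.2 s value.
MIN_REQUEST_INTERVAL_SECONDS = 1.2


class Throttle:
    """Ensure at least `interval` seconds pass between consecutive calls."""

    def __init__(self, interval=MIN_REQUEST_INTERVAL_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        # Initialised here so the attribute always exists before first use.
        self.last_request_time = None

    def wait(self):
        """Sleep for the unspent part of the interval and return that duration."""
        remaining = 0.0
        if self.last_request_time is not None:
            elapsed = self.clock() - self.last_request_time
            remaining = max(0.0, self.interval - elapsed)
        if remaining > 0:
            self.sleep(remaining)
        self.last_request_time = self.clock()
        return remaining
```

Calling `wait()` immediately before each API request would keep requests at least one interval apart. The clock and sleep functions are injectable so the behaviour can be checked without real delays.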

backend/apps/ai/admin.py (1)

12-14: LGTM! Admin fields properly updated for generic relations.

The changes correctly reflect the model refactoring from message-specific to generic content objects. Using content_type in the display and object_id in search fields is appropriate for the new generic foreign key structure.

backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (2)

10-13: LGTM! Proper use of centralized constants.

The import of timing constants from the shared module eliminates code duplication and improves maintainability. The static analysis import error is likely a false positive since the constants file exists in the codebase.

82-84: LGTM! Correct adaptation to generic content model.

The changes properly update the Chunk.update_data call to use the new generic content object pattern:

- text parameter moved to first position
- message parameter replaced with content_object=message

This aligns with the model refactoring to support any content type via Django's content types framework.

backend/apps/ai/migrations/0004_alter_chunk_unique_together_chunk_content_type_and_more.py (1)

14-42: object_id default aligns with model; migration approved

Verified that object_id is defined with default=0 in backend/apps/ai/models/chunk.py (line 22), which matches the migration’s default=0. No changes required unless you intend to allow a NULL value for object_id—the current setup is consistent.

backend/apps/ai/models/chunk.py (2)

27-37: Good implementation of dynamic content representation!

The __str__ method effectively handles different content types by checking for common attributes (name, key) before falling back to string representation.
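
The fallback order described here can be sketched in plain Python. This is an illustration only: the helper name `describe_content_object` is hypothetical, and the exact format the PR's `__str__` returns is not shown in this thread.

```python
from types import SimpleNamespace


def describe_content_object(obj):
    """Return a readable label: `name` first, then `key`, then `str(obj)`."""
    for attribute in ("name", "key"):
        value = getattr(obj, attribute, None)
        if value:
            return str(value)
    # Neither attribute is present (or both are empty): fall back to str().
    return str(obj)


# Stand-in objects; real callers would pass model instances.
named = SimpleNamespace(name="OWASP London", key="london")
keyed = SimpleNamespace(key="london")
print(describe_content_object(named))
print(describe_content_object(keyed))
```

Using `getattr` with a default keeps the helper independent of any particular model class, which is the point of a generic relation.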

54-88: Well-implemented generic content type support!

The update_data method has been successfully refactored to support any content object type while maintaining the same functionality. The updated docstring clearly reflects the change.

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

backend/apps/ai/models/chunk.py

arkid15r

Great refactoring, here are a few items to address before merging:

backend/apps/ai/Makefile

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

19-210: Address comprehensive feedback from previous reviews.

The previous reviews have identified several important issues that should be addressed:

- Code structure improvements: Missing docstrings, unused parameters, and attribute initialization
- Code complexity: The extract_chapter_content method has high cyclomatic complexity and should be refactored into smaller helper methods
- Formatting issues: Whitespace before colon in slice operation
- List comprehension complexity: The complex list comprehension with walrus operator should be simplified

These issues collectively impact code maintainability and readability. Please address the previous review feedback to improve the overall code quality.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1429824 and eac3b50.

📒 Files selected for processing (3)

backend/apps/ai/Makefile (1 hunks)
backend/apps/ai/common/constants.py (1 hunks)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

Learnt from: M-ayank2005
PR: OWASP/Nest#1282
File: frontend/src/pages/About.tsx:94-98
Timestamp: 2025-04-04T05:23:43.562Z
Learning: In the OWASP/Nest frontend project, DOMPurify is used to sanitize HTML content before rendering it with dangerouslySetInnerHTML, following security best practices to prevent XSS vulnerabilities.

🧬 Code Graph Analysis (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (4)

backend/apps/ai/models/chunk.py (4)

Chunk (13-88)

bulk_save (40-42)

split_text (45-52)

update_data (55-88)

backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (3)

Command (18-93)

handle (21-43)

create_chunks (45-93)

backend/apps/owasp/graphql/nodes/chapter.py (1)

suggested_location (54-56)

backend/apps/owasp/graphql/nodes/common.py (1)

related_urls (18-20)

🪛 checkmake (0.2.2)

backend/apps/ai/Makefile

[warning] 1-1: Missing required phony target "all"

(minphony)

[warning] 1-1: Missing required phony target "clean"

(minphony)

[warning] 1-1: Missing required phony target "test"

(minphony)

🪛 Flake8 (7.2.0)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 60-60: whitespace before ':'

(E203)

🪛 Pylint (3.3.7)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)

[error] 8-8: Unable to import 'django.core.management.base'

(E0401)

[error] 10-14: Unable to import 'apps.ai.common.constants'

(E0401)

[error] 15-15: Unable to import 'apps.ai.models.chunk'

(E0401)

[error] 16-16: Unable to import 'apps.owasp.models.chapter'

(E0401)

[convention] 19-19: Missing class docstring

(C0115)

[convention] 22-22: Missing function or method docstring

(C0116)

[convention] 34-34: Missing function or method docstring

(C0116)

[warning] 34-34: Unused argument 'args'

(W0613)

[refactor] 127-127: Too many branches (24/12)

(R0912)

[refactor] 127-127: Too many statements (55/50)

(R0915)

[warning] 41-41: Attribute 'openai_client' defined outside init

(W0201)

[warning] 105-105: Attribute 'last_request_time' defined outside init

(W0201)

🔇 Additional comments (2)

backend/apps/ai/common/constants.py (1)

1-5: LGTM! Well-defined constants for AI app configuration.

The constants are appropriately named and have reasonable values for API rate limiting and text processing. The centralization of these values will improve maintainability across the AI app.

backend/apps/ai/Makefile (1)

1-3: LGTM! Follows established pattern and proper ordering.

The new Makefile target is correctly implemented following the same pattern as the existing ai-create-slack-message-chunks target and is properly ordered alphabetically.

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

backend/tests/apps/ai/models/chunk_test.py (1)

127-164: Test correctly validates no-save behavior.

The test properly verifies that save() is not called when save=False, maintaining the expected behavior with generic foreign keys.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85bb4c5 and 70dd534.

📒 Files selected for processing (1)

backend/tests/apps/ai/models/chunk_test.py (2 hunks)

🔇 Additional comments (4)

backend/tests/apps/ai/models/chunk_test.py (4)

3-3: LGTM!

The ContentType import and the addition of mock.id are appropriate changes that align with the model's transition to generic foreign keys.

Also applies to: 14-14

19-36: LGTM!

The test correctly validates the string representation using the new generic foreign key approach with appropriate mocking.

101-125: LGTM!

The test correctly validates the behavior when a chunk already exists, properly using the generic foreign key approach.

171-177: LGTM!

The test appropriately validates the generic foreign key field types and relationships.

backend/tests/apps/ai/models/chunk_test.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (9)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (8)

19-21: Add a class docstring for better documentation.

While the help attribute provides basic information, adding a proper class docstring would improve code documentation and address the pylint warning.

22-38: Add docstring to document command-line arguments.

40-40: Add docstring and remove unused parameter.

The args parameter is not used in the method.

47-47: Consider initializing instance attributes in __init__ method.

The openai_client attribute is defined outside __init__, which could lead to attribute errors if methods are called in unexpected order.

64-64: Fix whitespace formatting issue.

Remove the whitespace before the colon to comply with PEP 8.

95-106: Consider initializing instance attributes in __init__ method.

The last_request_time attribute is accessed before being assigned, which could lead to attribute errors.

108-123: Simplify the complex list comprehension for better readability.

The list comprehension with walrus operator and filtering is hard to read and maintain.

128-211: Refactor this method to reduce complexity.

This method has high cyclomatic complexity (25 branches) and too many statements. Consider breaking it down into smaller, focused helper methods.

backend/apps/ai/models/chunk.py (1)

19-22: Revisit nullable content_type with unique_together constraint.

The current model allows content_type to be nullable, but it's part of the unique_together constraint. This could lead to unexpected behavior where multiple chunks with NULL content_type and the same text would be allowed.
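
The NULL behaviour this comment warns about can be reproduced with a small self-contained sketch. The table layout and the columns in the unique constraint (`content_type_id`, `object_id`, `text`) are assumptions for illustration, not the PR's actual schema. SQLite is used only so the example runs without a database server; PostgreSQL also treats NULLs as distinct in unique constraints by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE chunk (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER NULL,
        object_id INTEGER NOT NULL DEFAULT 0,
        text TEXT NOT NULL,
        UNIQUE (content_type_id, object_id, text)
    )
    """
)

insert = "INSERT INTO chunk (content_type_id, object_id, text) VALUES (?, ?, ?)"

# Two identical rows with a NULL content type: both inserts succeed,
# because NULL values never compare equal inside a unique constraint.
conn.execute(insert, (None, 0, "same text"))
conn.execute(insert, (None, 0, "same text"))
null_rows = conn.execute(
    "SELECT COUNT(*) FROM chunk WHERE content_type_id IS NULL"
).fetchone()[0]

# With a non-NULL content type the duplicate is rejected as expected.
conn.execute(insert, (7, 0, "same text"))
try:
    conn.execute(insert, (7, 0, "same text"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rows, duplicate_rejected)
```

Making the column non-nullable is one way to close this gap, since the constraint then applies to every row.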

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70dd534 and 0951b19.

📒 Files selected for processing (3)

backend/apps/ai/common/constants.py (1 hunks)
backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)
backend/apps/ai/models/chunk.py (3 hunks)

🧰 Additional context used

🧠 Learnings (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

Learnt from: M-ayank2005
PR: OWASP/Nest#1282
File: frontend/src/pages/About.tsx:94-98
Timestamp: 2025-04-04T05:23:43.562Z
Learning: In the OWASP/Nest frontend project, DOMPurify is used to sanitize HTML content before rendering it with dangerouslySetInnerHTML, following security best practices to prevent XSS vulnerabilities.

🧬 Code Graph Analysis (1)

backend/apps/ai/models/chunk.py (3)

backend/apps/common/models.py (2)

BulkSaveModel (8-30)

TimestampedModel (33-40)

backend/apps/common/utils.py (1)

truncate (164-176)

backend/apps/slack/models/message.py (2)

Meta (18-21)

text (83-85)

🪛 Pylint (3.3.7)

backend/apps/ai/models/chunk.py

[error] 3-3: Unable to import 'django.contrib.contenttypes.fields'

(E0401)

[error] 4-4: Unable to import 'django.contrib.contenttypes.models'

(E0401)

[error] 5-5: Unable to import 'django.db'

(E0401)

[error] 6-6: Unable to import 'langchain.text_splitter'

(E0401)

[error] 7-7: Unable to import 'pgvector.django'

(E0401)

[convention] 16-16: Missing class docstring

(C0115)

[refactor] 16-16: Too few public methods (0/2)

(R0903)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)

[error] 8-8: Unable to import 'django.core.management.base'

(E0401)

[error] 10-14: Unable to import 'apps.ai.common.constants'

(E0401)

[error] 15-15: Unable to import 'apps.ai.models.chunk'

(E0401)

[error] 16-16: Unable to import 'apps.owasp.models.chapter'

(E0401)

[convention] 19-19: Missing class docstring

(C0115)

[convention] 22-22: Missing function or method docstring

(C0116)

[convention] 40-40: Missing function or method docstring

(C0116)

[warning] 40-40: Unused argument 'args'

(W0613)

[refactor] 128-128: Too many branches (24/12)

(R0912)

[refactor] 128-128: Too many statements (53/50)

(R0915)

[warning] 47-47: Attribute 'openai_client' defined outside init

(W0201)

[warning] 106-106: Attribute 'last_request_time' defined outside init

(W0201)

🪛 Flake8 (7.2.0)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 64-64: whitespace before ':'

(E203)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Run frontend e2e tests
GitHub Check: Run frontend unit tests
GitHub Check: Run backend tests
GitHub Check: CodeQL (javascript-typescript)

🔇 Additional comments (3)

backend/apps/ai/common/constants.py (1)

1-6: LGTM!

The constants are well-named and provide sensible defaults for API rate limiting and text delimiting.

backend/apps/ai/models/chunk.py (2)

27-37: Well-designed generic string representation.

The __str__ method elegantly handles different content types by checking for common attributes (name, key) before falling back to string conversion.

55-88: Clean refactoring to support generic relations.

The update_data method has been properly updated to work with any content object while maintaining the existing behavior and API.

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

sonarqubecloud · 2025-07-03T17:59:31Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (9)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (9)

19-21: Add a class docstring for better documentation.

While the help attribute provides basic information, adding a proper class docstring would improve code documentation and address the pylint warning.

22-22: Add docstring to document command-line arguments.

40-40: Add docstring and remove unused parameter.

The args parameter is not used in the method.

47-47: Consider initializing instance attributes in __init__ method.

The openai_client and last_request_time attributes are defined outside __init__, which could lead to attribute errors if methods are called in unexpected order.

64-64: Fix whitespace formatting issue.

Remove the whitespace before the colon to comply with PEP 8.

95-106: Consider initializing instance attributes in __init__ method.

The openai_client and last_request_time attributes are defined outside __init__, which could lead to attribute errors if methods are called in unexpected order.

108-123: Simplify the complex list comprehension for better readability.

The list comprehension with walrus operator and filtering is hard to read and maintain.

128-211: Refactor this method to reduce complexity.

This method has high cyclomatic complexity (24 branches) and too many statements. Consider breaking it down into smaller, focused helper methods.

195-195: Fix incorrect variable usage in leaders information.

Line 195 incorrectly appends location_parts instead of leaders_info to the metadata.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0951b19 and 50599e0.

📒 Files selected for processing (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

Learnt from: M-ayank2005
PR: OWASP/Nest#1282
File: frontend/src/pages/About.tsx:94-98
Timestamp: 2025-04-04T05:23:43.562Z
Learning: In the OWASP/Nest frontend project, DOMPurify is used to sanitize HTML content before rendering it with dangerouslySetInnerHTML, following security best practices to prevent XSS vulnerabilities.

🧬 Code Graph Analysis (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (4)

backend/apps/ai/models/chunk.py (4)

Chunk (13-88)

bulk_save (40-42)

split_text (45-52)

update_data (55-88)

backend/apps/ai/management/commands/ai_create_slack_message_chunks.py (3)

Command (18-93)

handle (21-43)

create_chunks (45-93)

backend/apps/owasp/graphql/nodes/chapter.py (1)

suggested_location (54-56)

backend/apps/owasp/graphql/nodes/common.py (1)

related_urls (18-20)

🪛 Flake8 (7.2.0)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 64-64: whitespace before ':'

(E203)

🪛 Pylint (3.3.7)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

[error] 7-7: Unable to import 'openai'

(E0401)

[error] 8-8: Unable to import 'django.core.management.base'

(E0401)

[error] 10-14: Unable to import 'apps.ai.common.constants'

(E0401)

[error] 15-15: Unable to import 'apps.ai.models.chunk'

(E0401)

[error] 16-16: Unable to import 'apps.owasp.models.chapter'

(E0401)

[convention] 19-19: Missing class docstring

(C0115)

[convention] 22-22: Missing function or method docstring

(C0116)

[convention] 40-40: Missing function or method docstring

(C0116)

[warning] 40-40: Unused argument 'args'

(W0613)

[refactor] 128-128: Too many branches (24/12)

(R0912)

[refactor] 128-128: Too many statements (53/50)

(R0915)

[warning] 47-47: Attribute 'openai_client' defined outside init

(W0201)

[warning] 106-106: Attribute 'last_request_time' defined outside init

(W0201)

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: Run frontend e2e tests
GitHub Check: Run frontend unit tests
GitHub Check: Run backend tests
GitHub Check: CodeQL (javascript-typescript)

🔇 Additional comments (1)

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (1)

1-17: Well-organized imports and proper use of constants.

Good practice importing timing constants from a centralized location.

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

coderabbitai · 2025-07-03T18:04:07Z

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

+            "--batch-size",
+            type=int,
+            default=50,
+            help="Number of chapters to process in each batch",


🧹 Nitpick (assertive)

Add validation for batch-size argument.

Consider adding validation to ensure batch-size is positive.

         parser.add_argument(
             "--batch-size",
             type=int,
             default=50,
             help="Number of chapters to process in each batch",
         )
+        parser.add_argument(
+            "--batch-size",
+            type=int,
+            default=50,
+            help="Number of chapters to process in each batch",
+            choices=range(1, 1001),
+            metavar="{1..1000}",
+        )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In backend/apps/ai/management/commands/ai_create_chapter_chunks.py around lines 34 to 37, the batch-size argument lacks validation to ensure it is positive. Add a check after parsing the batch-size argument to verify it is greater than zero, and raise an appropriate error or exit with a message if the value is zero or negative. This will prevent invalid batch sizes from being used during processing.
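
One possible way to enforce a positive batch size is a custom `type` callable, sketched below with the standard-library `argparse` module (Django's command parser is built on `argparse`, so the same callable could be passed to `parser.add_argument` inside `add_arguments`). The helper name `positive_int` is hypothetical and not part of the PR.

```python
import argparse


def positive_int(value):
    """argparse `type` callable that accepts only integers greater than zero."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if number <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} must be a positive integer")
    return number


parser = argparse.ArgumentParser()
parser.add_argument(
    "--batch-size",
    type=positive_int,
    default=50,
    help="Number of chapters to process in each batch",
)

print(parser.parse_args(["--batch-size", "10"]).batch_size)
```

Compared with `choices=range(1, 1001)`, this rejects invalid values at parse time without imposing an upper bound, and the error message states the rule directly.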

coderabbitai · 2025-07-03T18:04:07Z

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

+            "--all",
+            action="store_true",
+            help="Process all the chapters",


🧹 Nitpick (assertive)

Clarify that --all includes inactive chapters.

The help text should explicitly mention that this option processes both active and inactive chapters to avoid confusion.

         parser.add_argument(
             "--all",
             action="store_true",
-            help="Process all the chapters",
+            help="Process all chapters (including inactive)",
         )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"--all",
action="store_true",
help="Process all the chapters",

parser.add_argument(
    "--all",
    action="store_true",
    help="Process all chapters (including inactive)",
)

🤖 Prompt for AI Agents

In backend/apps/ai/management/commands/ai_create_chapter_chunks.py around lines 29 to 31, update the help text for the "--all" argument to explicitly state that it processes both active and inactive chapters. Modify the help string to clarify this behavior so users understand that using "--all" includes inactive chapters as well.

coderabbitai · 2025-07-03T18:04:07Z

backend/apps/ai/management/commands/ai_create_chapter_chunks.py

+        for offset in range(0, total_chapters, batch_size):
+            batch_chapters = queryset[offset : offset + batch_size]
+
+            batch_chunks = []
+            for chapter in batch_chapters:
+                batch_chunks.extend(self.create_chunks(chapter))
+
+            if batch_chunks:
+                Chunk.bulk_save(batch_chunks)
+                self.stdout.write(f"Saved {len(batch_chunks)} chunks")
+


🧹 Nitpick (assertive)

Add progress tracking for better user feedback.

Consider adding progress indicators showing which batch is being processed.

         batch_size = options["batch_size"]
+        batch_count = 0
         for offset in range(0, total_chapters, batch_size):
+            batch_count += 1
+            self.stdout.write(f"Processing batch {batch_count}/{(total_chapters + batch_size - 1) // batch_size}...")
             batch_chapters = queryset[offset : offset + batch_size]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for offset in range(0, total_chapters, batch_size):
    batch_chapters = queryset[offset : offset + batch_size]
    batch_chunks = []
    for chapter in batch_chapters:
        batch_chunks.extend(self.create_chunks(chapter))
    if batch_chunks:
        Chunk.bulk_save(batch_chunks)
        self.stdout.write(f"Saved {len(batch_chunks)} chunks")

 batch_size = options["batch_size"]
+batch_count = 0
 for offset in range(0, total_chapters, batch_size):
+    batch_count += 1
+    self.stdout.write(
+        f"Processing batch {batch_count}/{(total_chapters + batch_size - 1) // batch_size}..."
+    )
     batch_chapters = queryset[offset : offset + batch_size]
     batch_chunks = []
     for chapter in batch_chapters:
         batch_chunks.extend(self.create_chunks(chapter))
     if batch_chunks:
         Chunk.bulk_save(batch_chunks)
         self.stdout.write(f"Saved {len(batch_chunks)} chunks")

🧰 Tools

🪛 Flake8 (7.2.0)

[error] 64-64: whitespace before ':' (E203)

🤖 Prompt for AI Agents

In backend/apps/ai/management/commands/ai_create_chapter_chunks.py around lines 63 to 73, add progress tracking by including a print or log statement before processing each batch to indicate the current batch number and total batches. Calculate the total number of batches based on total_chapters and batch_size, then output progress like "Processing batch X of Y" before processing each batch to provide better user feedback.
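The progress tracking requested above comes down to a ceiling division for the total batch count plus a running counter. Below is a minimal standalone sketch, assuming a plain list in place of the Django queryset. The `iter_batches` helper is hypothetical and is not part of the PR.

```python
from collections.abc import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[tuple[int, int, Sequence]]:
    """Yield (batch_number, total_batches, batch) for each slice of items.

    Hypothetical helper for illustration; the PR slices a queryset instead.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division without floats: 10 items in batches of 4 gives 3 batches.
    total_batches = (len(items) + batch_size - 1) // batch_size
    for batch_number, offset in enumerate(range(0, len(items), batch_size), start=1):
        yield batch_number, total_batches, items[offset : offset + batch_size]


if __name__ == "__main__":
    chapters = [f"chapter-{i}" for i in range(10)]  # stand-in data, not real chapters
    for number, total, batch in iter_batches(chapters, batch_size=4):
        print(f"Processing batch {number}/{total} ({len(batch)} chapters)...")
```

Computing the total once, outside the loop, avoids repeating the division in every progress message, which the inline suggestion above does on each iteration.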

Dishant1804 added 2 commits July 2, 2025 02:26

chapter chunks created

44aa23c

Merge remote-tracking branch 'upstream/main' into chapter_chunks

1429824

Dishant1804 requested a review from arkid15r as a code owner July 1, 2025 21:00

coderabbitai bot reviewed Jul 1, 2025

arkid15r requested changes Jul 1, 2025

Dishant1804 added 2 commits July 2, 2025 10:29

suggestions implemented

bf7f676

Merge branch 'main' into chapter_chunks

eac3b50

coderabbitai bot reviewed Jul 2, 2025

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (outdated, resolved)

code rabbit suggestions implemented

85bb4c5

Dishant1804 requested a review from arkid15r July 2, 2025 05:12

Merge branch 'main' into chapter_chunks

996d9a7

github-actions bot added backend makefile labels Jul 3, 2025

tests updated

70dd534

coderabbitai bot reviewed Jul 3, 2025

backend/tests/apps/ai/models/chunk_test.py (resolved)

backend/tests/apps/ai/models/chunk_test.py (resolved)

backend/tests/apps/ai/models/chunk_test.py (resolved)

Dishant1804 and others added 2 commits July 3, 2025 22:48

Merge branch 'main' into chapter_chunks

8834fd6

Update code

0951b19

github-actions bot added the backend-tests label Jul 3, 2025

coderabbitai bot reviewed Jul 3, 2025

backend/apps/ai/management/commands/ai_create_chapter_chunks.py (outdated, resolved)

Update code

50599e0

coderabbitai bot reviewed Jul 3, 2025

arkid15r enabled auto-merge July 3, 2025 18:12

arkid15r approved these changes Jul 3, 2025

arkid15r added this pull request to the merge queue Jul 3, 2025

Merged via the queue into OWASP:main with commit 40aadc1 Jul 3, 2025
23 checks passed

coderabbitai bot mentioned this pull request Jul 26, 2025

refactor:remove PLR0912 ignore and fix 4 violations #1834

Closed

2 tasks

coderabbitai bot mentioned this pull request Aug 5, 2025

refactor:remove PLR0912 ignore and fix 4 violations update #1979

Open

2 tasks

coderabbitai bot mentioned this pull request Aug 13, 2025

NestBot AI Assistant Contexts #1891

Merged

7 tasks

sonarqubecloud bot commented Jul 3, 2025

Quality Gate passed
