Nestbot MVP #2113
Merged
Changes from 7 commits

Commits (9):

| Commit | Message | Author |
|---|---|---|
| 34c5451 | Sync www-repopsitories (#2164) | Dishant1804 |
| db6a6f2 | Consolidate code commits | arkid15r |
| 17db04d | Update cspell/custom-dict.txt | arkid15r |
| d6b4a85 | Update docker-compose/local.yaml | arkid15r |
| 5cba678 | Merge branch 'feature/nestbot-ai-assistant' into MVP | Dishant1804 |
| a565d10 | local yaml worder volume fix | Dishant1804 |
| e2ebd91 | instance check | Dishant1804 |
| 7207ddf | Merge branch 'feature/nestbot-ai-assistant' into MVP | Dishant1804 |
| 0354f7b | poetry file updated | Dishant1804 |
@@ -0,0 +1,177 @@

    """Content extractor for Repository."""

    import json
    import logging
    import time

    from apps.ai.common.constants import DELIMITER, GITHUB_REQUEST_INTERVAL_SECONDS
    from apps.common.utils import is_valid_json
    from apps.github.utils import get_repository_file_content

    logger = logging.getLogger(__name__)


    def extract_repository_content(repository) -> tuple[str, str]:
        """Extract structured content from repository data.

        Args:
            repository: Repository instance

        Returns:
            tuple[str, str]: (json_content, metadata_content)

        """
        repository_data = {}

        if repository.name:
            repository_data["name"] = repository.name
        if repository.key:
            repository_data["key"] = repository.key
        if repository.description:
            repository_data["description"] = repository.description
        if repository.homepage:
            repository_data["homepage"] = repository.homepage
        if repository.license:
            repository_data["license"] = repository.license
        if repository.topics:
            repository_data["topics"] = repository.topics

        status = {}
        if repository.is_archived:
            status["archived"] = True
        if repository.is_empty:
            status["empty"] = True
        if repository.is_owasp_repository:
            status["owasp_repository"] = True
        if repository.is_owasp_site_repository:
            status["owasp_site_repository"] = True
        if status:
            repository_data["status"] = status

        funding = {}
        if repository.is_funding_policy_compliant:
            funding["policy_compliant"] = True
        if repository.has_funding_yml:
            funding["has_funding_yml"] = True
        if funding:
            repository_data["funding"] = funding

        if repository.pages_status:
            repository_data["pages_status"] = repository.pages_status

        features = []
        if repository.has_downloads:
            features.append("downloads")
        if repository.has_issues:
            features.append("issues")
        if repository.has_pages:
            features.append("pages")
        if repository.has_projects:
            features.append("projects")
        if repository.has_wiki:
            features.append("wiki")
        if features:
            repository_data["features"] = features

        stats = {}
        if repository.commits_count:
            stats["commits"] = repository.commits_count
        if repository.contributors_count:
            stats["contributors"] = repository.contributors_count
        if repository.forks_count:
            stats["forks"] = repository.forks_count
        if repository.open_issues_count:
            stats["open_issues"] = repository.open_issues_count
        if repository.stars_count:
            stats["stars"] = repository.stars_count
        if repository.subscribers_count:
            stats["subscribers"] = repository.subscribers_count
        if repository.watchers_count:
            stats["watchers"] = repository.watchers_count
        if stats:
            repository_data["statistics"] = stats

        dates = {}
        if repository.created_at:
            dates["created"] = repository.created_at.strftime("%Y-%m-%d")
        if repository.updated_at:
            dates["last_updated"] = repository.updated_at.strftime("%Y-%m-%d")
        if repository.pushed_at:
            dates["last_pushed"] = repository.pushed_at.strftime("%Y-%m-%d")
        if dates:
            repository_data["dates"] = dates

        ownership = {}
        if repository.organization:
            ownership["organization"] = repository.organization.login
        if repository.owner:
            ownership["owner"] = repository.owner.login
        if ownership:
            repository_data["ownership"] = ownership

        markdown_files = [
            "README.md",
            "index.md",
            "info.md",
            "leaders.md",
        ]

        if repository.organization:
            owner = repository.organization.login
        else:
            owner = repository.owner.login if repository.owner else ""
        branch = repository.default_branch or "main"

        tab_files = []
        if owner and repository.key:
            contents_url = (
                f"https://api.github.com/repos/{owner}/{repository.key}/contents/?ref={branch}"
            )
            response = get_repository_file_content(contents_url)
            if response and is_valid_json(response):
                items = json.loads(response)
                for item in items:
                    if isinstance(item, dict):
                        name = item.get("name", "")
                        if name.startswith("tab_") and name.endswith(".md"):
                            tab_files.append(name)

        all_markdown_files = markdown_files + tab_files

        markdown_content = {}
        for file_path in all_markdown_files:
            try:
                if owner and repository.key:
                    raw_url = (
                        f"https://raw.githubusercontent.com/{owner}/{repository.key}/"
                        f"{branch}/{file_path}"
                    )
                    content = get_repository_file_content(raw_url)

                    if content and content.strip():
                        markdown_content[file_path] = content
                    time.sleep(GITHUB_REQUEST_INTERVAL_SECONDS)

            except (ValueError, TypeError, OSError):
                logger.debug("Failed to fetch markdown file")
                continue

        if markdown_content:
            repository_data["markdown_content"] = markdown_content

        json_content = json.dumps(repository_data, indent=2)

        metadata_parts = []
        if repository.name:
            metadata_parts.append(f"Repository Name: {repository.name}")
        if repository.key:
            metadata_parts.append(f"Repository Key: {repository.key}")
        if repository.organization:
            metadata_parts.append(f"Organization: {repository.organization.login}")
        if repository.owner:
            metadata_parts.append(f"Owner: {repository.owner.login}")

        return (
            json_content,
            DELIMITER.join(filter(None, metadata_parts)),
        )

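
The tab-file discovery step in `extract_repository_content` parses a GitHub contents listing and keeps only names that start with `tab_` and end with `.md`. The sketch below isolates that filter so it can be exercised without network access. It assumes the response body is the JSON array that the contents endpoint returns for a directory. `find_tab_files` and the sample payload are hypothetical names introduced for illustration and are not part of this PR.

```python
import json


def find_tab_files(response_text):
    """Return file names matching tab_*.md from a contents-listing JSON string.

    Hypothetical helper: mirrors the filtering logic only, not the HTTP call.
    """
    try:
        items = json.loads(response_text)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: nothing to discover.
        return []

    if not isinstance(items, list):
        # A single JSON object (e.g. an error payload) is not a directory listing.
        return []

    tab_files = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = item.get("name", "")
        if isinstance(name, str) and name.startswith("tab_") and name.endswith(".md"):
            tab_files.append(name)
    return tab_files


# Made-up sample payload shaped like a directory listing.
sample = json.dumps(
    [
        {"name": "README.md", "type": "file"},
        {"name": "tab_example.md", "type": "file"},
        {"name": "tab_notes.txt", "type": "file"},
        "unexpected",
    ]
)

print(find_tab_files(sample))
```

The explicit `isinstance(items, list)` check is an addition in this sketch: if the endpoint returned a JSON object instead of an array, iterating over it would yield its keys (strings), which the per-item `isinstance(item, dict)` guard in the PR code also skips.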
41 changes: 41 additions & 0 deletions
backend/apps/ai/management/commands/ai_update_repository_chunks.py
@@ -0,0 +1,41 @@

    """A command to create chunks of OWASP repository data for RAG."""

    from django.db.models import QuerySet

    from apps.ai.common.base.chunk_command import BaseChunkCommand
    from apps.ai.common.extractors.repository import extract_repository_content
    from apps.github.models.repository import Repository


    class Command(BaseChunkCommand):
        key_field_name = "key"
        model_class = Repository

        def __init__(self, *args, **kwargs):
            """Initialize command for repository."""
            super().__init__(*args, **kwargs)
            self.entity_name_plural = "repositories"

        def extract_content(self, entity: Repository) -> tuple[str, str]:
            """Extract content from the repository."""
            return extract_repository_content(entity)

        def get_base_queryset(self) -> QuerySet:
            """Return the base queryset with filtering for OWASP site repositories."""
            return (
                super()
                .get_base_queryset()
                .filter(
                    is_owasp_site_repository=True,
                    is_archived=False,
                    is_empty=False,
                )
            )

        def get_default_queryset(self) -> QuerySet:
            """Override to avoid is_active filter since Repository doesn't have that field."""
            return self.get_base_queryset()

        def source_name(self) -> str:
            """Return the source name for context creation."""
            return "owasp_repository"

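
The `get_default_queryset` docstring says the override exists because `Repository` has no `is_active` field. In Django, filtering a queryset on a field the model lacks raises `FieldError`, so a default that filters on `is_active` would fail for this model. The sketch below illustrates that reasoning with plain Python objects instead of the ORM. It assumes the base command's default queryset filters on `is_active`, which the docstring implies but the diff does not show. `BaseCommandSketch` and `RepositoryCommandSketch` are hypothetical stand-ins, not the project's classes.

```python
from types import SimpleNamespace


class BaseCommandSketch:
    """Hypothetical stand-in for the base command (assumed behaviour only)."""

    def __init__(self, rows):
        self.rows = rows

    def get_base_queryset(self):
        return list(self.rows)

    def get_default_queryset(self):
        # Assumed default: keep only active entities.
        return [row for row in self.get_base_queryset() if row.is_active]


class RepositoryCommandSketch(BaseCommandSketch):
    def get_base_queryset(self):
        # Same three conditions as the filter() call in the command.
        return [
            row
            for row in super().get_base_queryset()
            if row.is_owasp_site_repository
            and not row.is_archived
            and not row.is_empty
        ]

    def get_default_queryset(self):
        # Rows here have no is_active attribute, so bypass the default filter.
        return self.get_base_queryset()


# Made-up rows; note that none of them defines is_active.
repos = [
    SimpleNamespace(
        key="www-project-a",
        is_owasp_site_repository=True,
        is_archived=False,
        is_empty=False,
    ),
    SimpleNamespace(
        key="www-project-b",
        is_owasp_site_repository=True,
        is_archived=True,
        is_empty=False,
    ),
    SimpleNamespace(
        key="tool-c",
        is_owasp_site_repository=False,
        is_archived=False,
        is_empty=False,
    ),
]

selected = RepositoryCommandSketch(repos).get_default_queryset()
print([repo.key for repo in selected])
```

Calling `BaseCommandSketch(repos).get_default_queryset()` on the same rows raises `AttributeError`, which plays the role of Django's `FieldError` in this simplified setting.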
41 changes: 41 additions & 0 deletions
backend/apps/ai/management/commands/ai_update_repository_context.py
@@ -0,0 +1,41 @@

    """A command to update context for OWASP repository data."""

    from django.db.models import QuerySet

    from apps.ai.common.base.context_command import BaseContextCommand
    from apps.ai.common.extractors.repository import extract_repository_content
    from apps.github.models.repository import Repository


    class Command(BaseContextCommand):
        key_field_name = "key"
        model_class = Repository

        def __init__(self, *args, **kwargs):
            """Initialize command for repository."""
            super().__init__(*args, **kwargs)
            self.entity_name_plural = "repositories"

        def extract_content(self, entity: Repository) -> tuple[str, str]:
            """Extract content from the repository."""
            return extract_repository_content(entity)

        def get_base_queryset(self) -> QuerySet:
            """Return the base queryset with filtering for OWASP site repositories."""
            return (
                super()
                .get_base_queryset()
                .filter(
                    is_owasp_site_repository=True,
                    is_archived=False,
                    is_empty=False,
                )
            )

        def get_default_queryset(self) -> QuerySet:
            """Override to avoid is_active filter since Repository doesn't have that field."""
            return self.get_base_queryset()

        def source_name(self) -> str:
            """Return the source name for context creation."""
            return "owasp_repository"

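
`ai_update_repository_chunks.py` and `ai_update_repository_context.py` differ only in their base class and module docstring. One possible way to share the common overrides, which this PR does not do, is a mixin placed ahead of the base class in the inheritance list. The sketch below shows the idea with plain Python stand-ins. `RepositoryCommandMixin`, `FakeBase`, and the two `*Sketch` classes are hypothetical, and the interface of the real base commands is assumed rather than taken from the diff.

```python
class RepositoryCommandMixin:
    """Hypothetical mixin holding the overrides both commands have in common."""

    key_field_name = "key"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.entity_name_plural = "repositories"

    def get_base_queryset(self):
        # super() resolves to whichever base class follows the mixin in the MRO.
        return [
            row
            for row in super().get_base_queryset()
            if row["is_owasp_site_repository"]
            and not row["is_archived"]
            and not row["is_empty"]
        ]

    def get_default_queryset(self):
        return self.get_base_queryset()

    def source_name(self):
        return "owasp_repository"


class FakeBase:
    """Stand-in for the real base commands (assumed interface)."""

    def __init__(self, rows):
        self.rows = rows
        self.entity_name_plural = "entities"

    def get_base_queryset(self):
        return list(self.rows)


class FakeChunkBase(FakeBase):
    """Plays the role of the chunk base command."""


class FakeContextBase(FakeBase):
    """Plays the role of the context base command."""


class ChunkCommandSketch(RepositoryCommandMixin, FakeChunkBase):
    pass


class ContextCommandSketch(RepositoryCommandMixin, FakeContextBase):
    pass


# Made-up rows for illustration.
rows = [
    {"key": "www-a", "is_owasp_site_repository": True, "is_archived": False, "is_empty": False},
    {"key": "www-b", "is_owasp_site_repository": True, "is_archived": False, "is_empty": True},
]

chunk_cmd = ChunkCommandSketch(rows)
context_cmd = ContextCommandSketch(rows)
print([row["key"] for row in chunk_cmd.get_default_queryset()])
print(context_cmd.source_name(), context_cmd.entity_name_plural)
```

Listing the mixin first matters: Python's method resolution order then lets the mixin's `super()` calls reach the concrete base class, so each command keeps its own base behaviour while sharing the filters.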