Merged
Commits
57 commits
bff36bd
rag tool for agent
Dishant1804 Jul 22, 2025
f254af8
code rabbit suggestions implemented
Dishant1804 Jul 22, 2025
a1bba29
Merge branch 'main' into RAG
Dishant1804 Jul 22, 2025
ad3f3b4
Merge branch 'main' into RAG
arkid15r Jul 22, 2025
c1334a6
Merge branch 'main' into RAG
Dishant1804 Jul 23, 2025
c9d4a27
Merge branch 'main' into RAG
Dishant1804 Jul 24, 2025
ff45de1
suggestions implemented
Dishant1804 Jul 25, 2025
4b38f5a
Merge remote-tracking branch 'upstream/main' into RAG
Dishant1804 Jul 25, 2025
b2c5b59
code rabbit suggestion
Dishant1804 Jul 25, 2025
9b94aed
Merge branch 'main' into RAG
Dishant1804 Jul 25, 2025
3038f32
Merge remote-tracking branch 'upstream/main' into RAG
Dishant1804 Jul 28, 2025
e120962
added context model
Dishant1804 Jul 28, 2025
f24453a
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Jul 29, 2025
e876a0c
retrieving data from context model
Dishant1804 Jul 29, 2025
981277a
removed try except
Dishant1804 Jul 29, 2025
8b46f08
Suggestions implemented
Dishant1804 Jul 30, 2025
16fabcf
code rabbit suggestion
Dishant1804 Jul 30, 2025
532be09
Merge branch 'main' into context-model
Dishant1804 Jul 30, 2025
77203b8
removed deafult
Dishant1804 Jul 30, 2025
9e03b53
updated tests
Dishant1804 Jul 30, 2025
ed44239
Merge branch 'main' into context-model
Dishant1804 Aug 4, 2025
41f8126
de coupled context and chunks
Dishant1804 Aug 5, 2025
c5aba9c
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 5, 2025
697a406
update method for context
Dishant1804 Aug 7, 2025
46cd884
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 7, 2025
a3255ff
major revamp and test cases
Dishant1804 Aug 9, 2025
64c079a
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 9, 2025
7affa22
code rabbit suggestions
Dishant1804 Aug 10, 2025
55132d7
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 10, 2025
3d7bd48
major revamp
Dishant1804 Aug 10, 2025
7d0731b
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 10, 2025
ff3e61a
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 12, 2025
c709b9e
suggestions implemented
Dishant1804 Aug 13, 2025
1c7fe1c
refactoring
Dishant1804 Aug 13, 2025
948c529
more tests
Dishant1804 Aug 13, 2025
1455083
Merge branch 'main' into context-model
Dishant1804 Aug 13, 2025
1e8d65e
more refactoring
Dishant1804 Aug 13, 2025
3f15d7a
Merge branch 'main' into context-model
Dishant1804 Aug 13, 2025
742a15e
Merge branch 'main' into context-model
Dishant1804 Aug 14, 2025
bd8f280
suggestions implemented
Dishant1804 Aug 14, 2025
8610dde
Merge branch 'main' into context-model
Dishant1804 Aug 14, 2025
a9da28b
chunk model update
Dishant1804 Aug 14, 2025
a0ed311
update logic and suggestions
Dishant1804 Aug 16, 2025
9646366
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 16, 2025
2d86dcb
code rabbit suggestions
Dishant1804 Aug 16, 2025
011e843
before tests and question
Dishant1804 Aug 17, 2025
466bca3
sugesstions and decoupling with tests
Dishant1804 Aug 18, 2025
9c2556c
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 18, 2025
c9f260d
Merge branch 'main' into context-model
Dishant1804 Aug 18, 2025
197c0ff
sugesstions implemented
Dishant1804 Aug 18, 2025
4dc3800
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 18, 2025
346d324
Update code
arkid15r Aug 20, 2025
baae5eb
updated code
Dishant1804 Aug 21, 2025
f6bb1bd
spelling fixes
Dishant1804 Aug 21, 2025
6c353d1
Merge remote-tracking branch 'upstream/main' into context-model
Dishant1804 Aug 21, 2025
506ad46
test changes
Dishant1804 Aug 21, 2025
871d266
Update tests
arkid15r Aug 21, 2025
4 changes: 4 additions & 0 deletions backend/apps/ai/Makefile
@@ -17,3 +17,7 @@ ai-create-project-chunks:
ai-create-slack-message-chunks:
@echo "Creating Slack message chunks"
@CMD="python manage.py ai_create_slack_message_chunks" $(MAKE) exec-backend-command

ai-run-rag-tool:
@echo "Running RAG tool"
@CMD="python manage.py ai_run_rag_tool" $(MAKE) exec-backend-command
21 changes: 19 additions & 2 deletions backend/apps/ai/admin.py
Collaborator:
Can you order the changes you add according to existing ordering convention?

Collaborator (Author):
Can you elaborate on this one? I am unable to understand it.

Collaborator:
Your ContextAdmin class goes before the ChunkAdmin, and the same for register(). Compare them to the imports order, for example.

Collaborator:
This is still not addressed for some reason 🤷‍♂️

Collaborator (Author):
I have made the changes now

@@ -3,6 +3,21 @@
from django.contrib import admin

from apps.ai.models.chunk import Chunk
from apps.ai.models.context import Context


class ContextAdmin(admin.ModelAdmin):
"""Admin for Context model."""

list_display = (
"id",
"generated_text",
"content_type",
"object_id",
"source",
)
search_fields = ("generated_text", "source")
list_filter = ("content_type", "source")


class ChunkAdmin(admin.ModelAdmin):
@@ -11,9 +26,11 @@ class ChunkAdmin(admin.ModelAdmin):
list_display = (
"id",
"text",
"content_type",
"context",
)
search_fields = ("text", "object_id")
search_fields = ("text",)
list_filter = ("context__content_type",)


admin.site.register(Context, ContextAdmin)
admin.site.register(Chunk, ChunkAdmin)
Empty file.
Empty file.
Empty file.
120 changes: 120 additions & 0 deletions backend/apps/ai/agent/tools/rag/generator.py
@@ -0,0 +1,120 @@
"""Generator for the RAG system."""

import logging
import os
from typing import Any

import openai

logger = logging.getLogger(__name__)


class Generator:
"""Generates answers to user queries based on retrieved context."""

MAX_TOKENS = 2000
SYSTEM_PROMPT = """
You are a helpful and professional AI assistant for the OWASP Foundation.
Your task is to answer user queries based ONLY on the provided context.
Follow these rules strictly:
1. Base your entire answer on the information given in the "CONTEXT" section. Do not use any
external knowledge unless it is about OWASP.
2. Do not mention or refer to the word "context", "based on context", "provided information",
"Information given to me" or similar phrases in your responses.
3. Answer only questions related to OWASP and within the scope of OWASP.
4. Be concise and directly answer the user's query.
5. Provide the necessary link if the context contains a URL.
6. If there is any query based on location, look for latitude and longitude in the
context and provide the nearest OWASP chapter based on that.
7. You can ask for more information if the query is very personalized or user-centric.
8. After trying all of the above, if the context does not contain the information or you think
it is out of scope for OWASP, you MUST state: "Please ask a question related to OWASP."
"""
TEMPERATURE = 0.4

def __init__(self, chat_model: str = "gpt-4o"):
"""Initialize the Generator.

Args:
chat_model (str): The name of the OpenAI chat model to use for generation.

Raises:
ValueError: If the OpenAI API key is not set.

"""
if not (openai_api_key := os.getenv("DJANGO_OPEN_AI_SECRET_KEY")):
error_msg = "DJANGO_OPEN_AI_SECRET_KEY environment variable not set"
raise ValueError(error_msg)

self.chat_model = chat_model
self.openai_client = openai.OpenAI(api_key=openai_api_key)
logger.info("Generator initialized with chat model: %s", self.chat_model)

def prepare_context(self, context_chunks: list[dict[str, Any]]) -> str:
"""Format the list of retrieved context chunks into a single string for the LLM.

Args:
context_chunks: A list of chunk dictionaries from the retriever.

Returns:
A formatted string containing the context.

"""
if not context_chunks:
return "No context provided"

formatted_context = []
for i, chunk in enumerate(context_chunks):
source_name = chunk.get("source_name", f"Unknown Source {i + 1}")
text = chunk.get("text", "")

context_block = f"Source Name: {source_name}\nContent: {text}"
formatted_context.append(context_block)

return "\n\n---\n\n".join(formatted_context)

def generate_answer(self, query: str, context_chunks: list[dict[str, Any]]) -> str:
"""Generate an answer to the user's query using provided context chunks.

Args:
query: The user's query text.
context_chunks: A list of context chunks retrieved by the retriever.

Returns:
The generated answer as a string.

"""
formatted_context = self.prepare_context(context_chunks)

user_prompt = f"""
- You are an assistant for question-answering tasks related to OWASP.
- Use the following pieces of retrieved context to answer the question.
- If the question is related to OWASP, you may try to answer from your own knowledge; if you
don't know the answer, just say that you don't know.
- Keep the answer concise, but if you think a longer response would serve the user better,
provide more information.
- Ask for the current location if the query is related to location.
- Ask for the information you need if the query is very personalized or user-centric.
- Do not mention or refer to the word "context", "based on context", "provided information",
"Information given to me" or similar phrases in your responses.
Question: {query}
Context: {formatted_context}
Answer:
"""

try:
response = self.openai_client.chat.completions.create(
model=self.chat_model,
messages=[
{"role": "system", "content": self.SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
],
temperature=self.TEMPERATURE,
max_tokens=self.MAX_TOKENS,
)
answer = response.choices[0].message.content.strip()
except openai.OpenAIError:
logger.exception("OpenAI API error")
answer = "I'm sorry, I'm currently unable to process your request."

return answer
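The context-formatting step above can be exercised standalone, with no OpenAI client or API key. The sketch below mirrors the logic of `prepare_context` from the diff; the sample chunk dicts (with `source_name` and `text` keys) are illustrative stand-ins for retriever output:

```python
from typing import Any


def prepare_context(context_chunks: list[dict[str, Any]]) -> str:
    """Mirror of Generator.prepare_context: join chunk blocks with a separator."""
    if not context_chunks:
        return "No context provided"
    blocks = []
    for i, chunk in enumerate(context_chunks):
        # Fall back to a numbered placeholder when a chunk lacks a source name.
        source_name = chunk.get("source_name", f"Unknown Source {i + 1}")
        text = chunk.get("text", "")
        blocks.append(f"Source Name: {source_name}\nContent: {text}")
    return "\n\n---\n\n".join(blocks)


chunks = [
    {"source_name": "OWASP Top 10", "text": "Injection flaws occur when..."},
    {"text": "Chapter meetings are held monthly."},
]
print(prepare_context(chunks))
```

The second chunk has no `source_name`, so it is labeled `Unknown Source 2`; the two blocks are separated by a `---` divider, matching what the LLM receives in the `Context:` section of the prompt.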
72 changes: 72 additions & 0 deletions backend/apps/ai/agent/tools/rag/rag_tool.py
@@ -0,0 +1,72 @@
"""A tool for orchestrating the components of RAG process."""

import logging

from apps.ai.common.constants import DEFAULT_CHUNKS_RETRIEVAL_LIMIT, DEFAULT_SIMILARITY_THRESHOLD

from .generator import Generator
from .retriever import Retriever

logger = logging.getLogger(__name__)


class RagTool:
"""Main RAG tool that orchestrates the retrieval and generation process."""

def __init__(
self,
embedding_model: str = "text-embedding-3-small",
chat_model: str = "gpt-4o",
):
"""Initialize the RAG tool.

Args:
embedding_model (str, optional): The model to use for embeddings.
chat_model (str, optional): The model to use for chat generation.

Raises:
ValueError: If the OpenAI API key is not set.

"""
try:
self.retriever = Retriever(embedding_model=embedding_model)
self.generator = Generator(chat_model=chat_model)
except Exception:
logger.exception("Failed to initialize RAG tool")
raise

def query(
self,
question: str,
limit: int = DEFAULT_CHUNKS_RETRIEVAL_LIMIT,
similarity_threshold: float = DEFAULT_SIMILARITY_THRESHOLD,
content_types: list[str] | None = None,
) -> str:
"""Process a user query using the complete RAG pipeline.

Args:
question (str): The user's question.
limit (int): Maximum number of context chunks to retrieve.
similarity_threshold (float): Minimum similarity score for retrieval.
content_types (list[str] | None): Content types to filter by.

Returns:
str: The generated answer.

"""
logger.info("Retrieving context for query")
retrieved_chunks = self.retriever.retrieve(
query=question,
limit=limit,
similarity_threshold=similarity_threshold,
content_types=content_types,
)

generation_result = self.generator.generate_answer(
query=question, context_chunks=retrieved_chunks
)

logger.info("Successfully processed RAG query")

return generation_result
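The retrieve-then-generate flow in `query` can be sketched with stand-in components. The stub classes and canned data below are illustrative, not part of the PR; they only reproduce the orchestration shape (retriever output feeds the generator):

```python
class StubRetriever:
    """Stand-in for the PR's Retriever: returns canned chunks."""

    def retrieve(self, query, limit=5, similarity_threshold=0.4, content_types=None):
        chunks = [{"source_name": "OWASP Chapter Index", "text": "Chapters meet monthly."}]
        return chunks[:limit]


class StubGenerator:
    """Stand-in for the PR's Generator: echoes the first chunk's text."""

    def generate_answer(self, query, context_chunks):
        if not context_chunks:
            return "please ask question related to OWASP."
        return context_chunks[0]["text"]


class RagPipeline:
    """Same two-step shape as RagTool.query: retrieve, then generate."""

    def __init__(self, retriever, generator):
        self.retriever = retriever
        self.generator = generator

    def query(self, question, limit=5, similarity_threshold=0.4, content_types=None):
        chunks = self.retriever.retrieve(
            question,
            limit=limit,
            similarity_threshold=similarity_threshold,
            content_types=content_types,
        )
        return self.generator.generate_answer(question, chunks)


pipeline = RagPipeline(StubRetriever(), StubGenerator())
print(pipeline.query("When do OWASP chapters meet?"))
```

With the stubs wired in, the pipeline returns the canned chunk text; swapping in the real `Retriever` and `Generator` gives the behavior of `RagTool` without changing the orchestration code.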