chore: update doc and small polish (#1209)
Wendong-Fan authored Nov 24, 2024
1 parent 2e7dcdc commit 19530bf
Showing 6 changed files with 863 additions and 27 deletions.
50 changes: 50 additions & 0 deletions CONTRIBUTING.md
@@ -93,6 +93,56 @@ This part outlines the guidelines and best practices for conducting code reviews

Code reviews are an essential part of maintaining the quality and integrity of our open source project. By following these guidelines, we can ensure that CAMEL remains robust, secure, and easy to maintain, while also fostering a collaborative and welcoming community.

### Guideline for Writing Docstrings

This guideline will help you write clear, concise, and structured docstrings for contributing to `CAMEL`.

#### 1. Use the Triple-Quoted String with `r"""` (Raw String)
Begin the docstring with `r"""` to indicate a raw docstring. This prevents any issues with special characters and ensures consistent formatting, especially in documentation tools like Sphinx.

#### 2. Provide a Brief Class or Method Description
- Start with a concise summary of the purpose and functionality.
- Keep each line under `79` characters.
- The summary should start on the first line without a linebreak.

Example:
```python
r"""Class for managing conversations of CAMEL Chat Agents.
"""
```

#### 3. Document Parameters in the Args Section
- Use an `Args:` section for documenting constructor or function parameters.
- Maintain the `79`-character limit for each line, and indent continuation lines by 4 spaces.
- Follow this structure:
- Parameter Name: Match the function signature.
- Type: Include the type (e.g., `int`, `str`, custom types like `BaseModelBackend`).
- Description: Provide a brief explanation of the parameter's role.
  - Default Value: Use ``(default: :obj:`<default_value>`)`` to indicate default values.

Example:
```markdown
Args:
system_message (BaseMessage): The system message for initializing
the agent's conversation context.
model (BaseModelBackend, optional): The model backend to use for
response generation. Defaults to :obj:`OpenAIModel` with
`GPT_4O_MINI`. (default: :obj:`OpenAIModel` with `GPT_4O_MINI`)
```
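Putting rules 1–3 together, a complete docstring might look like the following sketch. The class and its parameters are illustrative only, not actual CAMEL code:

```python
class ChatSession:
    r"""Manages a single conversation of a CAMEL chat agent.

    Args:
        session_name (str): Human-readable name used to identify the
            session in logs.
        message_window_size (int, optional): Maximum number of messages
            kept in the conversation context. (default: :obj:`10`)
    """

    def __init__(self, session_name: str, message_window_size: int = 10):
        self.session_name = session_name
        self.message_window_size = message_window_size
```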

### Naming Principle 🛡️
#### Avoid Abbreviations in Naming
- Abbreviations can lead to ambiguity, especially since variable names and code in CAMEL are directly used by agents.
- Use clear, descriptive names that convey meaning without requiring additional explanation. This improves both human readability and the agent's ability to interpret the code.

Examples:

- Bad: msg_win_sz
- Good: message_window_size

By adhering to this principle, we ensure that CAMEL remains accessible and unambiguous for both developers and AI agents.
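As a minimal sketch of the difference (hypothetical variables):

```python
# Bad: the abbreviation must be decoded by the reader (and by an agent)
msg_win_sz = 10

# Good: the name carries its meaning directly
message_window_size = 10
```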


### Board Item Create Workflow 🛠️
At CAMEL, we manage our project through a structured workflow that ensures efficiency and clarity in our development process. Our workflow includes stages for issue creation and pull requests (PRs), sprint planning, and reviews.

10 changes: 7 additions & 3 deletions README.md
@@ -164,9 +164,10 @@ Detailed guidance can be found [here](https://github.com/camel-ai/camel/blob/mast

By default, the agent uses the `ModelType.DEFAULT` model from the `ModelPlatformType.DEFAULT`. You can configure the default model platform and model type using environment variables. If these are not set, the agent will fall back to the default settings:

- `ModelPlatformType.DEFAULT = "openai"`

- `ModelType.DEFAULT = "gpt-4o-mini"`
```bash
ModelPlatformType.DEFAULT = "openai"
ModelType.DEFAULT = "gpt-4o-mini"
```
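As an illustrative sketch of overriding these defaults via environment variables (the variable names below are an assumption; verify them against the linked guidance):

```bash
# Assumed variable names; check the CAMEL documentation for the exact ones
export DEFAULT_MODEL_PLATFORM_TYPE=openai
export DEFAULT_MODEL_TYPE=gpt-4o-mini
```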

### Setting Default Model Platform and Model Type (Optional)

@@ -319,6 +320,7 @@ Practical guides and tutorials for implementing specific functionalities in CAME
| **[Video Analysis](https://docs.camel-ai.org/cookbooks/video_analysis.html)** | Techniques for agents in video data analysis. |
| **[Track CAMEL Agents with AgentOps](https://docs.camel-ai.org/cookbooks/agents_tracking.html)** | Tools for tracking and managing agents in operations. |
| **[Create A Hackathon Judge Committee with Workforce](https://docs.camel-ai.org/cookbooks/workforce_judge_committee.html)** | Building a team of agents for collaborative judging. |
| **[3 Ways to Ingest Data from Websites with Firecrawl](https://docs.camel-ai.org/cookbooks/ingest_data_from_websites_with_Firecrawl.html)** | Explore three methods for extracting and processing data from websites using Firecrawl. |
## Utilize Various LLMs as Backends
@@ -346,6 +348,8 @@ For more details, please see our [`Models Documentation`](https://docs.camel-ai.
We implemented amazing research ideas from other works for you to build, compare and customize your agents. If you use any of these modules, please kindly cite the original works:
- `TaskCreationAgent`, `TaskPrioritizationAgent` and `BabyAGI` from *Nakajima et al.*: [Task-Driven Autonomous Agent](https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/). [[Example](https://github.com/camel-ai/camel/blob/master/examples/ai_society/babyagi_playing.py)]
- `PersonaHub` from *Tao Ge et al.*: [Scaling Synthetic Data Creation with 1,000,000,000 Personas](https://arxiv.org/pdf/2406.20094). [[Example](https://github.com/camel-ai/camel/blob/master/examples/personas/personas_generation.py)]
## Other Research Works Based on Camel
- [Agent Trust](http://agent-trust.camel-ai.org/): Can Large Language Model Agents Simulate Human Trust Behavior?
41 changes: 29 additions & 12 deletions camel/personas/persona.py
@@ -11,6 +11,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
import json
import uuid
from typing import ClassVar, Optional, Union

@@ -25,10 +26,17 @@ class Persona(BaseModel):
Attributes:
name (Optional[str]): Name of the persona.
description (Optional[str]): Description of the persona.
t2p_prompt (Union[TextPrompt, str]): Text to Persona Prompt.
p2p_prompt (Union[TextPrompt, str]): Persona to Persona Prompt.
text_to_persona_prompt (Union[TextPrompt, str]): The prompt to convert
text into a persona.
persona_to_persona_prompt (Union[TextPrompt, str]): Persona-to-Persona
interaction prompt.
id (uuid.UUID): The unique identifier for the persona, automatically
generated.
_id (uuid.UUID): Internal unique identifier for the persona,
generated lazily using `uuid.uuid4`.
model_config (ClassVar[ConfigDict]): Configuration for the Pydantic
model. Allows arbitrary types and includes custom JSON schema
settings.
"""

name: Optional[str] = None
@@ -37,13 +45,14 @@

# Field with default_factory to avoid circular import issues
# Union type allows either TextPrompt or str
t2p_prompt: Union[TextPrompt, str] = Field(
text_to_persona_prompt: Union[TextPrompt, str] = Field(
default_factory=lambda: PersonaHubPrompt.TEXT_TO_PERSONA,
description="Text to Persona Prompt",
)

# Similar to t2p_prompt, using default_factory for lazy evaluation
p2p_prompt: Union[TextPrompt, str] = Field(
# Similar to text_to_persona_prompt, using default_factory for lazy
# evaluation
persona_to_persona_prompt: Union[TextPrompt, str] = Field(
default_factory=lambda: PersonaHubPrompt.PERSONA_TO_PERSONA,
description="Persona to Persona Prompt",
)
@@ -56,10 +65,10 @@
# Custom JSON schema configuration
json_schema_extra={
"properties": {
# Ensure t2p_prompt and p2p_prompt are treated as strings in
# JSON schema
"t2p_prompt": {"type": "string"},
"p2p_prompt": {"type": "string"},
# Ensure text_to_persona_prompt and persona_to_persona_prompt
# are treated as strings in JSON schema
"text_to_persona_prompt": {"type": "string"},
"persona_to_persona_prompt": {"type": "string"},
}
},
)
@@ -75,12 +84,20 @@ def model_json_schema(cls):
return schema

def dict(self, *args, **kwargs):
# Output: {'name': 'Alice', 'description': None, 't2p_prompt': '...', 'p2p_prompt': '...', 'id': 'f47ac10b-58cc-4372-a567-0e02b2c3d479'} # noqa: E501
# Output: {'name': 'Alice', 'description': None, 'text_to_persona_prompt': '...', 'persona_to_persona_prompt': '...', 'id': 'f47ac10b-58cc-4372-a567-0e02b2c3d479'} # noqa: E501
d = super().model_dump(*args, **kwargs)
d['id'] = str(self.id)
return d

def json(self, *args, **kwargs):
# Output: '{"name": "Alice", "description": null, "t2p_prompt": "...", "p2p_prompt": "...", "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"}' # noqa: E501
# Output: '{"name": "Alice", "description": null, "text_to_persona_prompt": "...", "persona_to_persona_prompt": "...", "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"}' # noqa: E501
d = self.dict(*args, **kwargs)
return super().json(d, *args, **kwargs)
return json.dumps(
d,
indent=4, # Pretty-print with 4 spaces indentation
sort_keys=True, # Sort keys alphabetically
separators=(
",",
": ",
), # Fine-tune separators for better readability
)
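The `json.dumps` options introduced in the new `json` method can be seen in isolation on a plain dictionary (no CAMEL imports required):

```python
import json

persona_dict = {
    "name": "Alice",
    "description": None,
    "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
}

serialized = json.dumps(
    persona_dict,
    indent=4,                # pretty-print with 4-space indentation
    sort_keys=True,          # emit keys in alphabetical order
    separators=(",", ": "),  # item and key-value separators
)
print(serialized)
```

Because `sort_keys=True`, the keys appear as `description`, `id`, `name` regardless of insertion order.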
34 changes: 22 additions & 12 deletions camel/personas/persona_hub.py
@@ -31,12 +31,16 @@
class PersonaResponse(BaseModel):
persona_name: str = Field(description="The name of the persona")
persona_description: str = Field(
description="The description of the persona"
description="The description of the persona."
)


class PersonaHub:
r"""PersonaHub proposes a novel persona-driven data synthesis methodology
r"""The PersonaHub adapted from `"Scaling Synthetic Data Creation with 1,
000,000,000 Personas"
<https://github.com/tencent-ailab/persona-hub>`_.
PersonaHub proposes a novel persona-driven data synthesis methodology
that leverages various perspectives within a large language model (LLM) to
create diverse synthetic data. By showcasing PersonaHub's use cases in
synthesizing high-quality mathematical and logical reasoning problems,
@@ -45,7 +49,7 @@ class PersonaHub:
synthesis is versatile, scalable, flexible, and easy to use, potentially
driving a paradigm shift in synthetic data creation and applications in
practice, which may have a profound impact on LLM research and development.
Please refer to the paper for more details: https://arxiv.org/pdf/2406.20094
Please refer to the paper for more details: https://arxiv.org/pdf/2406.20094.
Args:
model (BaseModelBackend, optional): The model to use for persona
@@ -76,7 +80,7 @@ def __delitem__(self, persona_id: uuid.UUID):
if persona_id in self.personas:
del self.personas[persona_id]
else:
raise KeyError("Persona ID not found")
raise KeyError("Persona ID not found.")

def __getitem__(self, persona_id: uuid.UUID) -> Persona:
r"""Get a persona by ID.
@@ -87,7 +91,7 @@ def __getitem__(self, persona_id: uuid.UUID) -> Persona:
if persona_id in self.personas:
return self.personas[persona_id]
else:
raise KeyError("Persona ID not found")
raise KeyError("Persona ID not found.")

def text_to_persona(
self,
@@ -107,8 +111,12 @@ def text_to_persona(
"""
persona = Persona()

t2p_prompt: Union[TextPrompt, str] = persona.t2p_prompt
t2p_prompt_instruction = t2p_prompt.format(action=action, text=text)
text_to_persona_prompt: Union[TextPrompt, str] = (
persona.text_to_persona_prompt
)
text_to_persona_prompt_instruction = text_to_persona_prompt.format(
action=action, text=text
)

# Set agent to generate a persona
t2p_agent = ChatAgent(
@@ -119,7 +127,7 @@
# Get output from agent
try:
response = t2p_agent.step(
t2p_prompt_instruction,
text_to_persona_prompt_instruction,
response_format=PersonaResponse, # type: ignore[arg-type]
)
parsed_content = ast.literal_eval(response.msg.content)
@@ -141,7 +149,9 @@ def persona_to_persona(self, persona: Persona) -> Dict[uuid.UUID, Persona]:
Returns:
Dict[uuid.UUID, Persona]: A dictionary of related personas.
"""
p2p_prompt: Union[TextPrompt, str] = persona.p2p_prompt
persona_to_persona_prompt: Union[TextPrompt, str] = (
persona.persona_to_persona_prompt
)
answer_template = """
You MUST answer the question according to the format of the ANSWER TEMPLATE, and you can only modify the content within <BLANK>.
===== ANSWER TEMPLATE =====
Expand All @@ -151,8 +161,8 @@ def persona_to_persona(self, persona: Persona) -> Dict[uuid.UUID, Persona]:
n. persona_name: <BLANK>
persona_description: <BLANK>
""" # noqa: E501
p2p_prompt_instruction = (
p2p_prompt.format(
persona_to_persona_prompt_instruction = (
persona_to_persona_prompt.format(
persona_name=persona.name,
persona_description=persona.description,
)
@@ -167,7 +177,7 @@ def persona_to_persona(self, persona: Persona) -> Dict[uuid.UUID, Persona]:
# Get output from agent
try:
response = p2p_agent.step(
p2p_prompt_instruction # type: ignore[arg-type]
persona_to_persona_prompt_instruction # type: ignore[arg-type]
)
# Structured output (TODO: Use a more robust parser)
pattern = r"(\d+)\.\s*persona_name:\s*(.*?)\s*persona_description:\s*(.*?)\s*(?=\d+\.|$)" # noqa: E501
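The parsing regex in `persona_to_persona` can be exercised on a sample response that follows the answer template. The sample text below is made up, and the original call site may pass regex flags not shown in this hunk:

```python
import re

sample = """1. persona_name: Data Engineer
persona_description: Builds pipelines for synthetic data.
2. persona_name: Math Teacher
persona_description: Writes reasoning problems for students.
"""

pattern = (
    r"(\d+)\.\s*persona_name:\s*(.*?)\s*"
    r"persona_description:\s*(.*?)\s*(?=\d+\.|$)"
)
matches = re.findall(pattern, sample)
# Each match is a (number, name, description) tuple
for number, name, description in matches:
    print(number, name, description)
```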