
feat: add data collector for dataset generation #1193

Draft: wants to merge 6 commits into base: master

liuxukun2000 (Collaborator)

Description

Add a data collector for dataset generation.

This is only a prototype!

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

  • Subtask 1
  • Subtask 2
  • Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

@Wendong-Fan added the Data label (Related to camel data processing) on Nov 19, 2024
@Wendong-Fan added this to the Sprint 17 milestone on Nov 19, 2024
@Wendong-Fan (Member) left a comment:

Based on the comments below, I added one more commit: 09b9d89

Feel free to leave your comments.

camel/data_collector/base.py (outdated, resolved)
```python
ori_update_memory = agent.update_memory

def update_memory(
    message: BaseMessage, role: OpenAIBackendRole
```
Member:

The role could be obtained from the `BaseMessage`.

Collaborator (Author):

Hi Wendong, the new `update_memory` must keep the same parameters as the original one. Additionally, taking the role from `BaseMessage` would set messages of type `function_call` to `role=assistant`, which may make them hard to distinguish.
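The wrapping pattern in the excerpt above (saving `ori_update_memory`, then installing a replacement `update_memory`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `StubAgent` and `record_calls` are hypothetical, and roles are plain strings instead of `OpenAIBackendRole`. Keeping the original `(message, role)` signature means the role the caller passes is recorded unchanged, so tool or function messages are not relabeled.

```python
from typing import Any, Callable, List, Tuple


class StubAgent:
    """Hypothetical stand-in for a CAMEL agent (not the real ChatAgent)."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Any, str]] = []

    def update_memory(self, message: Any, role: str) -> None:
        self.memory.append((message, role))


def record_calls(
    agent: StubAgent, history: List[Tuple[str, Any]]
) -> Callable[[], None]:
    """Wrap agent.update_memory so every call is also logged to `history`.

    Returns a function that restores the original method.
    """
    ori_update_memory = agent.update_memory  # bound method, kept for forwarding

    def update_memory(message: Any, role: str) -> None:
        # Same parameters as the original, so existing callers keep working
        # and the explicitly passed role (e.g. "tool") is preserved.
        history.append((role, message))
        ori_update_memory(message, role)

    # The instance attribute shadows the class method for this agent only.
    agent.update_memory = update_memory  # type: ignore[method-assign]

    def restore() -> None:
        agent.update_memory = ori_update_memory  # type: ignore[method-assign]

    return restore


agent = StubAgent()
history: List[Tuple[str, Any]] = []
restore = record_calls(agent, history)
agent.update_memory("hello", "user")
agent.update_memory("result", "tool")
restore()
agent.update_memory("not recorded", "user")
print(history)  # [('user', 'hello'), ('tool', 'result')]
```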

Member:

Hey Xukun, maybe we can check whether `tool_calls` exists in the `BaseMessage`; if it does, we can set the role type to `tool`.
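The suggested check can be sketched as below. `Msg` is a hypothetical stand-in for `BaseMessage`; whether the real class exposes a `tool_calls` field, and under what name, is an assumption here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for BaseMessage; field names are assumptions."""
    role_name: str
    content: str
    role: str = "assistant"
    tool_calls: Optional[List[Dict[str, Any]]] = None


def infer_role(msg: Msg) -> str:
    """Label the message 'tool' when it carries tool calls, as suggested in review."""
    if msg.tool_calls:
        return "tool"
    return msg.role


plain = Msg(role_name="assistant", content="Hi there")
call = Msg(
    role_name="assistant",
    content="",
    tool_calls=[{"name": "add", "arguments": {"a": 1, "b": 2}}],
)
print(infer_role(plain), infer_role(call))  # assistant tool
```

One caveat worth weighing: in the OpenAI chat format, a message that requests tool calls is sent with role `assistant`, and only the tool's result uses role `tool`, so a collector targeting that format may need to keep the two apart rather than relabel both.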

Comment on lines 40 to 41
```python
role: OpenAIBackendRole,
name: Optional[str] = None,
```
Member:

These two pieces of information could be obtained from `BaseMessage`.
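A sketch of taking both the role and the name from the message itself, instead of passing `role` and `name` as separate parameters. `Msg` and `RoleType` here are hypothetical stand-ins modeled loosely on `BaseMessage`; the real field names are an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class RoleType(Enum):
    """Hypothetical stand-in role enum."""
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class Msg:
    """Hypothetical stand-in for BaseMessage carrying its own role and name."""
    role_name: str
    role_type: RoleType
    content: str


def to_record(message: Msg) -> Dict[str, str]:
    # Role and name come from the message itself, so the caller no longer
    # passes them as separate `role` / `name` arguments.
    return {
        "name": message.role_name,
        "role": message.role_type.value,
        "content": message.content,
    }


records: List[Dict[str, str]] = [
    to_record(Msg("student", RoleType.USER, "What is 2 + 2?")),
    to_record(Msg("tutor", RoleType.ASSISTANT, "4")),
]
print(records[1])  # {'name': 'tutor', 'role': 'assistant', 'content': '4'}
```

Deriving both fields from the message keeps the hook's signature small; the trade-off the author raises still applies, since the message must itself carry enough information to tell function-call messages apart.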

Comment on lines 60 to 61
```python
if len(message.msgs) != 1:
    raise ValueError("Only supports one message in response")
```
Member:

I think we could also support multiple messages.
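Supporting multiple messages could look roughly like this: iterate over `msgs` and record each one, rejecting only an empty response. `Response` and `collect` are hypothetical names for illustration, not the PR's API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Response:
    """Hypothetical stand-in for an agent response holding several messages."""
    msgs: List[str] = field(default_factory=list)


def collect(response: Response, history: List[Tuple[int, str]]) -> None:
    """Record every message instead of rejecting responses with len(msgs) != 1."""
    if not response.msgs:
        raise ValueError("Response contains no messages")
    for index, msg in enumerate(response.msgs):
        # Keep the candidate index so alternative messages stay distinguishable.
        history.append((index, msg))


history: List[Tuple[int, str]] = []
collect(Response(msgs=["Answer A", "Answer B"]), history)
print(history)  # [(0, 'Answer A'), (1, 'Answer B')]
```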

```python
] = defaultdict(list)
self._recording = False
self.agents: List[Tuple[str, BaseAgent]] = []
self._id = 0
```
Member:

Use a dynamic ID instead: each history set should get its own UUID.
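A minimal sketch of the "one history set, one UUID" idea, replacing an integer counter like `self._id` with `uuid.uuid4()`. `HistoryStore`, `start`, and `record` are hypothetical names used only for illustration.

```python
import uuid
from collections import defaultdict
from typing import DefaultDict, List


class HistoryStore:
    """Hypothetical sketch: one UUID per recorded history set instead of an int counter."""

    def __init__(self) -> None:
        self.histories: DefaultDict[str, List[str]] = defaultdict(list)
        self.current_id: str = ""

    def start(self) -> str:
        # A fresh random UUID for each new history set; unlike a per-instance
        # counter, it practically never collides when histories are merged.
        self.current_id = str(uuid.uuid4())
        return self.current_id

    def record(self, message: str) -> None:
        if not self.current_id:
            raise RuntimeError("call start() before record()")
        self.histories[self.current_id].append(message)


store = HistoryStore()
first = store.start()
store.record("hello")
second = store.start()
store.record("bye")
print(first != second, len(store.histories))  # True 2
```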

camel/data_collector/base.py (outdated, resolved)
Comment on lines +40 to +44
```python
if len(self.agents) > 1:
    raise ValueError("AlpacaDataCollector only supports one agent")
if isinstance(agent, list):
    if len(agent) != 1:
        raise ValueError("AlpacaDataCollector only supports one agent")
```
Member:

Based on the current design, I think we can further support a list of agents and multiple messages. Could we add this support in this PR?
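One possible shape for that extension, sketched under assumptions: the collector accepts either a single agent or a list, keeps a history per agent, and pairs each user turn with the next assistant turn to emit records in the Alpaca `instruction` / `input` / `output` layout. `MultiAgentAlpacaCollector`, `inject`, `record`, and `to_alpaca` are hypothetical names, agents are represented only by name, and mapping the user turn to `instruction` with an empty `input` is a simplifying assumption.

```python
from typing import Dict, List, Optional, Tuple, Union


class MultiAgentAlpacaCollector:
    """Hypothetical sketch: accept one or many agents and emit Alpaca-style records.

    Agents are identified by name only; a real collector would hook each
    agent's update_memory instead of receiving record() calls directly.
    """

    def __init__(self) -> None:
        self.histories: Dict[str, List[Tuple[str, str]]] = {}

    def inject(self, agents: Union[str, List[str]]) -> None:
        # Normalise a single agent to a list instead of raising ValueError.
        names = [agents] if isinstance(agents, str) else list(agents)
        for name in names:
            self.histories.setdefault(name, [])

    def record(self, agent: str, role: str, content: str) -> None:
        if agent not in self.histories:
            raise KeyError(f"unknown agent: {agent}")
        self.histories[agent].append((role, content))

    def to_alpaca(self) -> List[Dict[str, str]]:
        """Pair each user turn with the next assistant turn, per agent."""
        records: List[Dict[str, str]] = []
        for turns in self.histories.values():
            pending: Optional[str] = None
            for role, content in turns:
                if role == "user":
                    pending = content
                elif role == "assistant" and pending is not None:
                    records.append(
                        {"instruction": pending, "input": "", "output": content}
                    )
                    pending = None
        return records


collector = MultiAgentAlpacaCollector()
collector.inject(["tutor", "critic"])
collector.record("tutor", "user", "Name a prime number.")
collector.record("tutor", "assistant", "7")
collector.record("critic", "user", "Is 7 prime?")
collector.record("critic", "assistant", "Yes.")
print(len(collector.to_alpaca()))  # 2
```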

@Wendong-Fan linked an issue (2 tasks) on Nov 19, 2024 that may be closed by this pull request
Labels: Data (Related to camel data processing)
Projects: Status: No status
Development: Successfully merging this pull request may close these issues:
[Feature Request] Convert BaseMessage to alpaca format
2 participants