add benchmark gorilla, nexus #1136

HHHHHejia · 2024-10-30T10:00:55Z

I've added the APIBank, APIBench, and Nexus benchmarks. For the main methods, see the benchmark tests and the utils folder (benchmark_base.py).

Some problems with the APIBank, APIBench (Gorilla), and Nexus benchmarks still need to be solved. They are listed below.

For Nexus:
Running python nexus_test.py currently produces errors:
1. OpenAI limits the size of the functions passed to the function-calling API (function name length, function description length, number of functions, etc.). Camel needs logic to check these limits: if OpenAI will not accept the functions for function calling, it should use structured output instead.
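One way to add this judgment logic is to validate the function schemas before calling the API and fall back to structured output when any limit is exceeded. This is a hypothetical sketch: choose_call_mode and schema_problems are not existing Camel functions, the schemas follow OpenAI's {"name", "description", "parameters"} layout, and the numeric limits (128 functions, 64-character names, 1024-character descriptions) are assumptions that should be checked against OpenAI's current documentation.

```python
import re
from typing import Any, Dict, List

# Assumed limits; verify against the current OpenAI API reference before relying on them.
MAX_TOOLS = 128
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def schema_problems(schemas: List[Dict[str, Any]]) -> List[str]:
    """Return reasons why these function schemas may be rejected (empty list if none)."""
    problems: List[str] = []
    if len(schemas) > MAX_TOOLS:
        problems.append(f"too many functions: {len(schemas)} > {MAX_TOOLS}")
    for schema in schemas:
        name = schema.get("name", "")
        if not NAME_PATTERN.match(name) or len(name) > MAX_NAME_LEN:
            problems.append(f"invalid function name: {name!r}")
        description = schema.get("description", "")
        if len(description) > MAX_DESCRIPTION_LEN:
            problems.append(f"description of {name!r} is too long")
    return problems


def choose_call_mode(schemas: List[Dict[str, Any]]) -> str:
    """Use native function calling when the schemas fit, else fall back to structured output."""
    return "structured_output" if schema_problems(schemas) else "function_call"
```

A pre-check like this avoids a wasted request, but it can drift if OpenAI changes its limits; an alternative is to attempt function calling first and switch to structured output only when the API rejects the request.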

2. Critical: there is a while True bug in camel.chatagent.step. When an incoming API call is not executed correctly, the while True loop never terminates. This while True logic should be eliminated, since we cannot assume that the functions passed by the user will always execute correctly.
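A bounded loop that reports tool failures back to the model, instead of retrying indefinitely, removes the non-termination. The sketch below is hypothetical and independent of Camel's actual step implementation: run_tool_loop, the call_model callback, the reply dict shape ({"tool_call": {...}} or {"content": ...}), and max_rounds are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

ModelFn = Callable[[List[Dict[str, Any]]], Dict[str, Any]]


def run_tool_loop(
    call_model: ModelFn,
    tools: Dict[str, Callable[..., Any]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Alternate model calls and tool calls for at most `max_rounds` rounds.

    Returns (final_answer, history); final_answer is None if the budget runs out.
    """
    history: List[Dict[str, Any]] = []
    for _ in range(max_rounds):
        reply = call_model(history)
        tool_call = reply.get("tool_call")
        if tool_call is None:
            # The model answered directly, so the loop ends normally.
            return reply.get("content"), history
        name = tool_call.get("name")
        try:
            result = tools[name](**tool_call.get("args", {}))
        except Exception as exc:  # unknown tool, bad arguments, or a crash inside the tool
            # Report the failure to the model instead of retrying forever.
            result = f"error: {type(exc).__name__}: {exc}"
        history.append({"tool": name, "result": result})
    # Budget exhausted: stop instead of looping forever.
    return None, history
```

Returning None with the partial history lets the benchmark score the sample as a failure rather than hanging on it.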

For APIBench:
There are three datasets: 'torchhub', 'tensorhub', and 'huggingface'. 'torchhub' works well, but:
3. 'tensorhub' and 'huggingface' cannot be evaluated correctly by the AST matching program. This is a problem in the original repo, and I have already opened an issue: [https://github.com/ShishirPatil/gorilla/issues/729]

It could be a tree_sitter version problem, but if you don't use tree_sitter==0.20.4, you get a different bug.
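While the tree_sitter issue is open, a rough stand-in for sanity checks is to compare call names using Python's standard ast module (Python 3.9+ for ast.unparse). This is a swapped-in technique, not Gorilla's AST matcher: it only checks that every call in the reference snippet also appears in the generated code, ignores argument values, and requires both snippets to be valid Python. The function names call_names and calls_match are hypothetical.

```python
import ast
from typing import Set


def call_names(code: str) -> Set[str]:
    """Collect dotted call targets such as 'torch.hub.load' from a Python snippet."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return set()
    return {ast.unparse(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}


def calls_match(generated: str, reference: str) -> bool:
    """True if every call in the reference snippet also appears in the generated code."""
    expected = call_names(reference)
    return bool(expected) and expected <= call_names(generated)
```

Because arguments are ignored, this check is strictly weaker than the original evaluation and should only be used to spot gross mismatches.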

For APIBank:
There are three datasets: 'level1', 'level2', and 'level3'. However:
4. No one knows how to evaluate 'level3'. See these issues in the original repo:
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/167]
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/102]
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/114]

5. APIBank uses multiple rounds of "User-Assistant-System" messages as history records, but Camel's ChatAgent does not yet support adding multiple rounds of system messages. Temporary solution: use record_message and make_assistant_message instead of system messages.
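The temporary workaround amounts to flattening the history into user and assistant turns before recording it. The sketch below assumes each API-Bank turn is a dict with a 'role' of 'User', 'Assistant', or 'System' and a 'text' field (the real data format may differ), and fold_history is a hypothetical helper. Each resulting pair would then be turned into a Camel message (make_assistant_message for assistant turns) and stored with record_message; that Camel call is left out so the sketch runs without Camel installed.

```python
from typing import Dict, List, Tuple


def fold_history(history: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Flatten API-Bank-style turns into (speaker, text) pairs with no system turns.

    Assumes each turn looks like {"role": "User" | "Assistant" | "System", "text": ...}.
    """
    folded: List[Tuple[str, str]] = []
    for turn in history:
        role, text = turn["role"], turn["text"]
        if role == "User":
            folded.append(("user", text))
        elif role == "Assistant":
            folded.append(("assistant", text))
        elif role == "System":
            # ChatAgent keeps a single system message, so mid-dialogue system turns
            # (e.g. API results) are recorded as tagged assistant messages instead.
            folded.append(("assistant", f"[System] {text}"))
        else:
            raise ValueError(f"unexpected role: {role!r}")
    return folded
```

The "[System]" tag keeps the original speaker visible to the model, at the cost of the model seeing those turns as its own output.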

6. There is a version conflict between the openai package used by Camel and the Https and Google Translate libraries used in the original repo (see [https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank#demo]): Camel, Https, and the Google Translate library don't work together.
For now, two approaches work:
- Use the original repo without Camel; Google Translate and Https work well.
- Use Camel and remove Google Translate; this works, but without the Google Translate tool.
See:
[https://github.com/microsoft/TaskWeaver/issues/172]
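For the second approach, the Google Translate tool can be made optional rather than deleted outright, so the same benchmark code runs in either environment. The helper below is hypothetical: the tool names are illustrative, the module name googletrans is an assumption about which library the original repo uses, and importlib.util.find_spec only tells whether a module is installed, not whether its version conflicts with openai, so the tool still has to be disabled explicitly.

```python
import importlib.util
from typing import Dict, List


def available_tools(tool_modules: Dict[str, str], disabled: frozenset = frozenset()) -> List[str]:
    """Return tools that are not disabled and whose top-level module can be found."""
    selected: List[str] = []
    for tool, module in tool_modules.items():
        if tool in disabled:
            continue
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found = False
        if found:
            selected.append(tool)
    return selected
```

For example, available_tools({"Calculator": "math", "Translate": "googletrans"}, disabled=frozenset({"Translate"})) registers only the calculator when running under Camel.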

7. Some datasets need to be hosted on GitHub or Hugging Face. The original authors did not do this, but we do not want to include this data in Camel's GitHub repository.
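Once the data is hosted externally, the benchmark can download files on first use and cache them locally instead of shipping them in the repository. In this sketch, ensure_dataset_file and the download callback are hypothetical; in practice the callback could wrap huggingface_hub's hf_hub_download with repo_type="dataset" (for a dataset repo that does not exist yet) and copy the result into place, or fetch a file from a GitHub release.

```python
from pathlib import Path
from typing import Callable

# Hypothetical callback type: given a file name and a destination, write the file there.
Downloader = Callable[[str, Path], None]


def ensure_dataset_file(data_dir: str, filename: str, download: Downloader) -> Path:
    """Return the local path of `filename`, calling `download` only on a cache miss."""
    path = Path(data_dir) / filename
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(filename, path)  # expected to write the file to `path`
    return path
```

Injecting the downloader keeps the hosting choice (GitHub or Hugging Face) out of the benchmark code and makes the caching logic testable offline.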

HHHHHejia and others added 3 commits on October 30, 2024 02:56:
- add benchmark gorilla, nexus (713a2d2)
- add apibank (9398ffb)
- Merge branch 'master' into benchmark_hejia (f8ba0d7)
