Check out our detailed Berkeley Function Calling Leaderboard changelog (Last updated: ) for the latest dataset and model updates to the leaderboard!

- [10/04/2024] Introducing the Agent Arena by Gorilla X LMSYS Chatbot Arena! Compare different agents on tasks like search, finance, RAG, and beyond, and explore which models and tools work best for specific tasks through our novel ranking system and community-driven prompt hub. [Blog] [Arena] [Leaderboard] [Dataset] [Tweet]
- [09/21/2024] Announcing BFCL V3 - evaluating multi-turn and multi-step function calling capabilities! The new state-based evaluation system tests models on handling complex workflows, sequential function calls, and service states. [Blog] [Leaderboard] [Code] [Tweet]
- [08/20/2024] Released BFCL V2 • Live! The Berkeley Function Calling Leaderboard now features enterprise-contributed data and real-world scenarios. [Blog] [Live Leaderboard] [V2 Categories Leaderboard] [Tweet]
- [04/12/2024] Excited to release GoEx - a runtime for LLM-generated actions such as code, API calls, and more. It features "post-facto validation" for assessing LLM actions after execution, plus "undo" and "damage confinement" abstractions to manage unintended actions and risks. This paves the way for fully autonomous LLM agents, enhancing interaction between apps and services with humans out of the loop. [Blog] [Code] [Paper] [Tweet]
- [04/01/2024] Introduced cost and latency metrics into the Berkeley Function Calling Leaderboard!
- [03/15/2024] RAFT: Adapting Language Model to Domain Specific RAG is live! [MSFT-Meta blog] [Berkeley Blog]
- [02/26/2024] The Berkeley Function Calling Leaderboard is live!
- [02/25/2024] OpenFunctions v2 sets a new SoTA for open-source LLMs!
- [11/16/2023] Excited to release Gorilla OpenFunctions!
- [06/29/2023] Released gorilla-cli, LLMs for your CLI!
- [06/06/2023] Released commercially usable, Apache 2.0 licensed Gorilla models!
- [05/30/2023] Provided the CLI interface to chat with Gorilla!
- [05/28/2023] Released Torch Hub and TensorFlow Hub models!
- [05/27/2023] Released the first Gorilla model!
- [05/27/2023] We released the APIZoo contribution guide for community API contributions!
- [05/25/2023] We released the APIBench dataset and the evaluation code for Gorilla!
Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke.
With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. This repository contains inference code for running Gorilla finetuned models, evaluation code for reproducing results from our paper, and APIBench - the largest collection of APIs, curated and easy to train on!
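To make that concrete, here is a minimal sketch of asking a Gorilla model for an API call through an OpenAI-compatible endpoint. The endpoint URL and model name are illustrative placeholders rather than guaranteed defaults; use the values from the inference instructions for the checkpoint you are serving. The snippet assumes the legacy OpenAI SDK (openai<1.0), matching the other examples in this README.

```python
# Minimal sketch: ask a Gorilla model for an API call from a natural language query.
# The endpoint URL and model name below are illustrative placeholders, not guaranteed
# defaults. Requires the legacy OpenAI SDK (openai<1.0).
import openai

openai.api_key = "EMPTY"                      # self-hosted servers ignore the key
openai.api_base = "http://localhost:8000/v1"  # replace with your Gorilla server URL

completion = openai.ChatCompletion.create(
    model="gorilla-7b-hf-v1",                 # replace with the checkpoint you serve
    messages=[{
        "role": "user",
        "content": "I want to translate English text to French."
    }],
)

# Gorilla responds with the API invocation it would make, as plain text
# in the assistant message.
print(completion.choices[0].message.content)
```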
Since our initial release, we've served ~500k requests and witnessed incredible adoption by developers worldwide. The project has expanded to include tools, evaluations, a leaderboard, end-to-end finetuning recipes, infrastructure components, and the Gorilla API Store:
| Project | Type | Description |
|---|---|---|
| Gorilla Paper | Model, Fine-tuning, Dataset, Evaluation, Infra | Large Language Model Connected with Massive APIs<br>• Novel finetuning approach for API invocation<br>• Evaluation on 1,600+ APIs (APIBench)<br>• Retrieval-augmented training for test-time adaptation |
| Gorilla OpenFunctions-V2 | Model | Drop-in alternative for function calling, supporting multiple complex data types and parallel execution<br>• Multiple & parallel function execution with OpenAI-compatible endpoints<br>• Native support for Python, Java, JavaScript, and REST APIs with expanded data types<br>• Function relevance detection to reduce hallucinations<br>• Enhanced RESTful API formatting capabilities<br>• State-of-the-art performance among open-source models |
| Berkeley Function Calling Leaderboard (BFCL) | Evaluation, Leaderboard, Function Calling Infra, Dataset | Comprehensive evaluation of function-calling capabilities<br>• V1: Expert-curated dataset for evaluating single-turn function calling<br>• V2: Enterprise-contributed data for real-world scenarios<br>• V3: Multi-turn & multi-step function calling evaluation<br>• Cost and latency metrics for all models<br>• Interactive API explorer for testing<br>• Community-driven benchmarking platform |
| Agent Arena | Evaluation, Leaderboard | Compare LLM agents across models, tools, and frameworks<br>• Head-to-head agent comparisons with ELO rating system<br>• Framework compatibility testing (LangChain, AutoGPT)<br>• Community-driven evaluation platform<br>• Real-world task performance metrics |
| Gorilla Execution Engine (GoEx) | Infra | Runtime for executing LLM-generated actions with safety guarantees (see the sketch after this table)<br>• Post-facto validation for verifying LLM actions after execution<br>• Undo capabilities and damage confinement for risk mitigation<br>• OAuth2 and API key authentication for multiple services<br>• Support for RESTful APIs, databases, and filesystem operations<br>• Docker-based sandboxed execution environment |
| Retrieval-Augmented Fine-tuning (RAFT) | Fine-tuning, Model | Fine-tuning LLMs for robust domain-specific retrieval<br>• Novel fine-tuning recipe for domain-specific RAG<br>• Chain-of-thought answers with direct document quotes<br>• Training with oracle and distractor documents<br>• Improved performance on PubMed, HotpotQA, and Gorilla benchmarks<br>• Efficient adaptation of smaller models for domain QA |
| Gorilla CLI | Model, Local CLI Infra | LLMs for your command-line interface<br>• User-friendly CLI tool supporting ~1500 APIs (Kubernetes, AWS, GCP, etc.)<br>• Natural language command generation with multi-LLM fusion<br>• Privacy-focused with explicit execution approval<br>• Command history and interactive selection interface |
| Gorilla API Zoo | Dataset | A community-maintained repository of up-to-date API documentation<br>• Centralized, searchable index of APIs across domains<br>• Structured documentation format with arguments, versioning, and examples<br>• Community-driven updates to keep pace with API changes<br>• Rich data source for model training and fine-tuning<br>• Enables retrieval-augmented training and inference<br>• Reduces hallucination through up-to-date documentation |
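To make the GoEx "post-facto validation" and "undo" ideas above more tangible, here is a generic Python sketch. This is not the GoEx API - every name in it is hypothetical - it simply illustrates pairing an LLM-proposed action with an explicit reversal so the action can be validated after execution and rolled back if it turns out to be wrong.

```python
# Generic illustration (NOT the GoEx API): pair each LLM-proposed action with an
# explicit "undo" so it can be validated after execution and reverted if needed.
# All names here (ReversibleAction, run_with_undo, ...) are hypothetical.
import pathlib
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReversibleAction:
    description: str
    execute: Callable[[], object]   # performs the side effect
    undo: Callable[[], None]        # best-effort reversal of that side effect

def run_with_undo(action: ReversibleAction, validate: Callable[[object], bool]) -> bool:
    """Execute the action, validate it post-facto, and roll back on failure."""
    result = action.execute()
    if validate(result):
        return True
    action.undo()                   # damage confinement: revert the unintended action
    return False

# Example: an LLM proposes writing a file; deleting it is the undo.
path = pathlib.Path("llm_output.txt")
action = ReversibleAction(
    description="write LLM-generated notes to llm_output.txt",
    execute=lambda: path.write_text("notes"),
    undo=lambda: path.unlink(missing_ok=True),
)
run_with_undo(action, validate=lambda chars_written: chars_written > 0)
```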
Try Gorilla in your browser:
- Gorilla Colab Demo: Try the base Gorilla model
- Gorilla Gradio Demo: Interactive web interface
- OpenFunctions Colab Demo: Try the latest OpenFunctions model
- OpenFunctions Website Demo: Experiment with function calling
- Berkeley Function Calling Leaderboard: Compare function calling capabilities
- Gorilla CLI - Fastest way to get started
```bash
pip install gorilla-cli
gorilla generate 100 random characters into a file called test.txt
```
Learn more about Gorilla CLI →
- Run Gorilla Locally
```bash
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/inference
```
Detailed local setup instructions →
- Use OpenFunctions
```python
# Note: this example uses the legacy OpenAI Python SDK (openai<1.0); with openai>=1.0
# the equivalent calls go through openai.OpenAI(base_url=..., api_key=...).
import openai

openai.api_key = "EMPTY"  # the hosted endpoint uses a placeholder key
openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"

# Define your functions (OpenAI function-calling schema)
functions = [{
    "name": "get_current_weather",
    "description": "Get weather in a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
    }
}]

# Make the API call
completion = openai.ChatCompletion.create(
    model="gorilla-openfunctions-v2",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    functions=functions,
)
```
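Depending on the checkpoint and how the server formats its output, the generated call may come back either as a structured `function_call` or as plain text in the message content. A defensive way to read both, assuming the legacy SDK response objects used above, looks like this:

```python
# Read the generated call defensively: some servers return a structured
# `function_call`, others return the call as plain text in `content`.
message = completion.choices[0].message
if getattr(message, "function_call", None):
    print(message.function_call)   # structured call: name + JSON arguments
else:
    print(message.content)         # e.g. get_current_weather(location="San Francisco")
```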
OpenFunctions documentation →
- Evaluation & Benchmarking
  - Berkeley Function Calling Leaderboard: Compare function calling capabilities
  - Agent Arena: Evaluate agent workflows
  - Gorilla Paper Evaluation Scripts: Run your own evaluations
- Development Tools
- I would like to use Gorilla commercially. Is there going to be an Apache 2.0 licensed version?
Yes! We now have models that you can use commercially without any obligations.
- Can we use Gorilla with other tools like LangChain?
Absolutely! Gorilla is an end-to-end model, specifically tailored to serve correct API calls (tools) without requiring any additional coding. It's designed to work as part of a wider ecosystem and can be flexibly integrated within agentic frameworks and other tools.
LangChain is a versatile developer framework; its agents can swap in any LLM, Gorilla included, making it a highly adaptable way to combine the two.
These tools shine when they collaborate, complementing each other's strengths, and we enthusiastically welcome contributions that refine and extend these integrations. A minimal integration sketch is shown after this answer.
Check out our blog on How to Use Gorilla: A Step-by-Step Walkthrough to see all the different ways you can integrate Gorilla in your projects.
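As a hedged illustration (not an officially supported recipe), the sketch below points LangChain's standard OpenAI-compatible chat wrapper at a self-hosted Gorilla/OpenFunctions endpoint. The base URL and model name are placeholders, and the `langchain-openai` package is assumed to be installed.

```python
# Hedged sketch: using a Gorilla/OpenFunctions model from LangChain via its
# OpenAI-compatible chat wrapper. The endpoint URL and model name are placeholders;
# requires `pip install langchain-openai`.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",   # your Gorilla/OpenFunctions server
    api_key="EMPTY",                       # self-hosted endpoints ignore the key
    model="gorilla-openfunctions-v2",      # the checkpoint you are serving
    temperature=0.0,
)

# The model replies with the API call it would make for the request.
response = llm.invoke("What's the weather in San Francisco in celsius?")
print(response.content)
```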
In the immediate future, we plan to release the following:
- Multimodal function-calling leaderboard
- Agentic function-calling leaderboard
- New batch of user-contributed live function calling evals
- BFCL metrics to evaluate contamination
- OpenFunctions-v3 model to support more languages and multi-turn capability
- Agent Arena to compare LLM agents across models, tools, and frameworks [10/04/2024]
- Multi-turn and multi-step function calling evaluation [09/21/2024]
- User-contributed Live Function Calling Leaderboard [08/20/2024]
- BFCL systems metrics including cost and latency [04/01/2024]
- Gorilla Execution Engine (GoEx) - Runtime for executing LLM-generated actions with safety guarantees [04/12/2024]
- Berkeley Function Calling leaderboard (BFCL) for evaluating tool-calling/function-calling models [02/26/2024]
- OpenFunctions-v2 with more languages (Java, JS, Python) and relevance detection [02/26/2024]
- API Zoo Index for easy access to all APIs [02/16/2024]
- OpenFunctions-v1, Apache 2.0, with parallel and multiple function calling [11/16/2023]
- OpenFunctions-v0, Apache 2.0 function calling model [11/16/2023]
- Release a commercially usable, Apache 2.0 licensed Gorilla model [06/05/2023]
- Release weights for all APIs from APIBench [05/28/2023]
- Run Gorilla LLM locally [05/28/2023]
- Release weights for HF model APIs [05/27/2023]
- Hosted Gorilla LLM chat for HF model APIs [05/27/2023]
- Opening up the APIZoo for contributions from community
- Dataset and Eval Code
Gorilla is Apache 2.0 licensed, making it suitable for both academic and commercial use.
- Join our Discord Community
- Follow us on X
```bibtex
@article{patil2023gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  year={2023},
  journal={arXiv preprint arXiv:2305.15334},
}
```