- 2004 University Ave, Berkeley, California
VAB_for_hallucination Public
Forked from THUDM/VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
Python, Apache License 2.0, Updated Feb 5, 2025
AgentBench Public
Forked from THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Python, Apache License 2.0, Updated Jan 30, 2025
AC_for_hallucination Public
Forked from SamuelSchmidgall/AgentClinic
Agent benchmark for medical diagnosis
Python, MIT License, Updated Dec 31, 2024
LLMAgentOODGym Public
OOD benchmark study for LLM agents based on BrowserGym and AgentLab from ServiceNow.
MIT License, Updated Dec 20, 2024
BrowserGym_OOD Public
Forked from ServiceNow/BrowserGym
LLM agents OOD benchmark based on BrowserGym.
Python, Other, Updated Nov 8, 2024
AgentGym_OOD Public
Forked from WooooDyy/AgentGym
OOD for LLM agents based on AgentGym
Python, MIT License, Updated Nov 2, 2024
WorkArena_OOD Public
Forked from ServiceNow/WorkArena
OOD for LLM agents study based on WorkArena
Python, Other, Updated Oct 31, 2024
evals Public
Forked from openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Python, Other, Updated Sep 30, 2024
FuxiCTR Public
Forked from reczoo/FuxiCTR
A configurable, tunable, and reproducible library for CTR prediction https://fuxictr.github.io
Python, Apache License 2.0, Updated Apr 5, 2024
google_ml_kit_flutter Public
Forked from flutter-ml/google_ml_kit_flutter
A Flutter plugin that implements Google's standalone ML Kit
Dart, MIT License, Updated Mar 19, 2024