This repo is meant to run a series of experiments to collect trace data from AI agents. And further use the data to analyze the performance of the agents, following the idea from LMCache/LMCache#1826.
Important note: If the agent is powered by an OPENAI model, the traces will be automatically tracked from the OpenAI API Dashboard. Collecting traces should be easy.
A simple vibe-coded agent trace prefix analyzer.
Status Legend: ✅ Done, 🔄 Ongoing, ⏳ Wait to be taken, ❌ Failed
| Project Name | Status | Notes |
|---|---|---|
| Terminus 1&2 | ✅ | From terminal bench and harbor project, source code open; Basically the agent context management is to append new round conversation on all the previous chat history. Because of this nature, prefix and non-prefix cache hit rates should be stricly the same. Each step, it plans and executes multiple functions in sequential order. Easy to collect traces. |
| Mini-swe-agent | ✅ | Running this agent with the terminal bench and harbor, the only difference between Terminus 1&2 is that Mini-swe-agent plans and executes one bash command(or multiple ones with '&&' but rarely happens) at one time, while Terminus 1&2 plans multiple commands and executes them in sequential order. So the context reuse pattern is very similar to Terminus 1&2. |
| Claude Code Agent | 🔄 | References: https://pierce.dev/notes/under-the-hood-of-claude-code, https://pierce.dev/notes/a-deep-dive-on-agent-sandboxes, https://medium.com/@outsightai/peeking-under-the-hood-of-claude-code-70f5a94a9a62; Planning to release a huggingface blog on 11/26/2025 |
| MetaGPT | 🔄 | The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming |
| OpenHands | 🔄 | OpenHands: Code Less, Make More. (formerly OpenDevin), a platform for software development agents powered by AI |
| GPT Pilot | 🔄 | GPT Pilot is the core technology for the Pythagora VS Code extension that aims to provide the first real AI developer companion |
| Aider | ⏳ | aider is AI pair programming in your terminal |
| Devika | ⏳ | Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective |
| RepoAgent | ⏳ | An LLM-powered repository agent designed to assist developers and teams in generating documentation and understanding repositories quickly |
| DSPy | ⏳ | The framework for programming—not prompting—foundation models |
| ThinkGPT | ⏳ | Agent techniques to augment your LLM and push it beyond its limits |
| PyCodeAGI | ⏳ | A small AGI experiment to generate a Python app given what app the user wants to build |
| SuperAGI | ⏳ | SuperAGI - A dev-first open source autonomous AI agent framework |
More agents to be added from the Awesome Agents repository, specifically focusing on the Software Development category. If the list is drained, use the backup list. Contact Kobe for any questions.
References:
- https://github.com/MLSysOps/Agent-Benchmarking-LMCache (open_deep_research and langmanus)
- https://www.anthropic.com/engineering/building-effective-agents
- https://www.arxiv.org/pdf/2510.04618 (Agentic Context Engineering)
- https://arxiv.org/pdf/2511.02230 (CONTINUUM)
- https://arxiv.org/pdf/2410.02506 (AgentPrune)
- https://arxiv.org/pdf/2510.12872 (KVCOMM)