
🦑 Welcome to TruLens!

TruLens provides a set of tools for developing and monitoring neural nets, including large language models. It includes both tools for evaluating LLMs and LLM-based applications (TruLens-Eval) and tools for deep learning explainability (TruLens-Explain). TruLens-Eval and TruLens-Explain are housed in separate packages and can be used independently.

The best way to support TruLens is to give us a ⭐ and join our Slack community!

TruLens-Eval

TruLens-Eval contains instrumentation and evaluation tools for large language model (LLM) based applications. It supports the iterative development and monitoring of a wide range of LLM applications by wrapping your application to log key metadata across the entire chain (or outside of a chain, if your project does not use one) on your local machine. Importantly, it also gives you the tools you need to evaluate the quality of your LLM-based applications.

TruLens-Eval has two key value propositions:

  1. Evaluation:
    • TruLens supports the evaluation of inputs, outputs and internals of your LLM application using any model (including LLMs).
    • A number of feedback functions for evaluation are implemented out-of-the-box, such as groundedness, relevance and toxicity. The framework is also easily extensible for custom evaluation requirements; a minimal sketch of both follows this list.
  2. Tracking:
    • TruLens contains instrumentation for any LLM application including question answering, retrieval-augmented generation, agent-based applications and more. This instrumentation allows for the tracking of a wide variety of usage metrics and metadata. Read more in the instrumentation overview.
    • TruLens' instrumentation can be applied to any LLM application without being tied down to a given framework. Additionally, deep integrations with LangChain and Llama-Index allow the capture of internal metadata and text.
    • Anything that is tracked by the instrumentation can be evaluated!
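
As a sketch of the evaluation side, here is one out-of-the-box feedback function and one custom one. This is a minimal sketch, assuming trulens-eval is installed and a HUGGINGFACE_API_KEY is set for the Huggingface provider; the MyProvider class and its brevity metric are illustrative assumptions, not part of TruLens.

# Sketch: one built-in and one custom feedback function in trulens-eval.
# Assumes a HUGGINGFACE_API_KEY is set for the Huggingface provider.
from trulens_eval import Feedback, Huggingface, Provider, Select

hugs = Huggingface()

# Built-in: score whether the response language matches the prompt's.
f_lang_match = Feedback(hugs.language_match).on_input_output()

# Custom (hypothetical): any provider method returning a score in [0, 1].
class MyProvider(Provider):
    def brevity(self, text: str) -> float:
        # Toy metric: shorter responses score higher.
        return 1.0 / (1.0 + len(text) / 100.0)

f_brevity = Feedback(MyProvider().brevity).on(text=Select.RecordOutput)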

The process for building your evaluated and tracked LLM application with TruLens is shown below 👇

[Architecture Diagram]

Installation and setup

Install trulens-eval from PyPI.

pip install trulens-eval
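
Once installed, you can start the local dashboard to browse logged records and evaluation results. A minimal sketch, assuming the default local database:

from trulens_eval import Tru

tru = Tru()          # creates/opens a local database by default
tru.run_dashboard()  # serves the TruLens dashboard in your browser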

Quick Usage

TruLens supports evaluation and tracking for any LLM app framework. Choose a framework below to get started:

LangChain

langchain_quickstart.ipynb (Open In Colab)
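
A minimal sketch of the pattern the quickstart follows, assuming chain is a LangChain chain you have already built and f_lang_match is a feedback function defined as in the example above:

from trulens_eval import Tru, TruChain

tru = Tru()
tru_recorder = TruChain(chain, app_id="Chain1", feedbacks=[f_lang_match])

with tru_recorder as recording:
    chain("What is TruLens?")  # calls made in this context are recorded

tru.run_dashboard()  # inspect the records and feedback results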

Llama-Index

llama_index_quickstart.ipynb (Open In Colab)
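
The same pattern works for LlamaIndex; this sketch assumes query_engine is an existing LlamaIndex query engine and f_lang_match is defined as above:

from trulens_eval import Tru, TruLlama

tru = Tru()
tru_recorder = TruLlama(query_engine, app_id="LlamaIndex_App1",
                        feedbacks=[f_lang_match])

with tru_recorder as recording:
    query_engine.query("What is TruLens?")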

Custom Text to Text Apps

text2text_quickstart.ipynb (Open In Colab)
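
For apps that don't use a framework, TruBasicApp wraps any text-to-text function. A minimal sketch; my_text_to_text is a hypothetical stand-in for your own app:

from trulens_eval import Tru, TruBasicApp

def my_text_to_text(prompt: str) -> str:
    # Replace with a real LLM call.
    return "A placeholder response; call your LLM here."

tru = Tru()
recorder = TruBasicApp(my_text_to_text, app_id="text2text_v1")

with recorder as recording:
    recorder.app("What is TruLens?")  # call through the wrapper to record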

TruLens-Explain

TruLens-Explain is a cross-framework library for deep learning explainability. It provides a uniform abstraction layer over TensorFlow, PyTorch, and Keras, and allows input and internal explanations.

Installation and Setup

These installation instructions assume that you have conda installed and added to your path.

  1. Create a virtual environment (or modify an existing one).
conda create -n "<my_name>" python=3  # Skip if using existing environment.
conda activate <my_name>
  2. Install dependencies.
conda install tensorflow-gpu=1  # Or whatever backend you're using.
conda install keras             # Or whatever backend you're using.
conda install matplotlib        # For visualizations.
  3. [Pip installation] Install the trulens pip package from PyPI.
pip install trulens

Quick Usage

To quickly play around with the TruLens library, check out the following Colab notebooks:

  • PyTorch (Open In Colab)
  • TensorFlow 2 / Keras (Open In Colab)
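
As a minimal sketch of the Explain API, assuming model is a trained Keras (or PyTorch/TensorFlow) model and x_batch is a batch of inputs:

from trulens.nn.models import get_model_wrapper
from trulens.nn.attribution import IntegratedGradients

wrapper = get_model_wrapper(model)         # uniform wrapper across backends
infl = IntegratedGradients(wrapper)        # integrated-gradients attributions
attributions = infl.attributions(x_batch)  # same shape as the inputs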

For more information, see TruLens-Explain Documentation.
