LLM Chat-Agent Evaluation

An online chat agent built with LangGraph and LangChain is evaluated on multiple metrics. An evaluation pipeline measures the LLM's performance on measures such as Toxicity, Bias, Hallucination, Relevance, and Faithfulness. Several evaluation libraries, including Phoenix Evals, MLflow, continuous-eval, and DeepEval, were tested and integrated.
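
A minimal sketch of how such a pipeline can score one agent response with DeepEval's metric classes; the question/answer strings and thresholds below are placeholders, not values from this repository:

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, BiasMetric, ToxicityMetric
from deepeval.test_case import LLMTestCase

# Placeholder question/answer pair; in the pipeline these come from the LangGraph agent.
test_case = LLMTestCase(
    input="What does the refund policy cover?",
    actual_output="Refunds are available within 30 days of purchase.",
)

# Each metric scores the response independently and passes/fails against its threshold.
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    ToxicityMetric(threshold=0.5),
    BiasMetric(threshold=0.5),
]

evaluate(test_cases=[test_case], metrics=metrics)
```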

Garak's auto red-team (ATR) was also set up, and the agent was evaluated for robustness against adversarial attacks using Garak ATR.
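
A rough sketch of driving Garak's attack-generation (auto red-team) probes from Python; the model type and name are placeholders, since in practice the agent would be exposed to Garak through one of its generators:

```python
import subprocess

# Run garak's atkgen (attack generation / auto red-team) probes against a target model.
# "openai" / "gpt-3.5-turbo" are placeholders for whichever generator backs the agent.
subprocess.run(
    [
        "python", "-m", "garak",
        "--model_type", "openai",
        "--model_name", "gpt-3.5-turbo",
        "--probes", "atkgen",
    ],
    check=True,
)
```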

Guardrails AI is integrated into the chat agent to reject inputs that solicit harmful responses.
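
A minimal sketch, assuming Guardrails AI's `Guard` API with the hub `ToxicLanguage` validator; the agent handle and fallback message are placeholders:

```python
from guardrails import Guard
from guardrails.hub import ToxicLanguage  # requires: guardrails hub install hub://guardrails/toxic_language

# Guard that raises on user inputs flagged as toxic before they reach the agent.
input_guard = Guard().use(ToxicLanguage, threshold=0.5, on_fail="exception")

def safe_invoke(agent, user_input: str) -> str:
    try:
        input_guard.validate(user_input)  # raises if the input fails validation
    except Exception:
        return "Sorry, I can't help with that request."
    return agent.invoke(user_input)  # hypothetical LangGraph agent handle
```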

Microsoft Presidio is used to detect and anonymize personally identifiable information (PII) that could introduce linguistic bias.
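
A small sketch of the detect-then-anonymize flow with Presidio's analyzer and anonymizer engines (the sample text is illustrative):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "My name is Jane Doe and my phone number is 212-555-0199."

# Detect PII entities, then replace them with entity placeholders before evaluation.
results = analyzer.analyze(text=text, language="en")
anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized.text)  # e.g. "My name is <PERSON> and my phone number is <PHONE_NUMBER>."
```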
