GitHub - khuzaimakt/Chat-Agent-Evalution: Evaluating the LLM Chat Agent on multiple evaluation benchmarks.

LLM Chat-Agent Evaluation

An online chat agent made using LangGraph and LangChain is evaluated on multiple metrics. Evaluation pipeline developed to evaluate the LLM's performance using different evaluation measures such as Toxicity, Biasedness, Hallucination, Relevance and Faithfulness. Various libraries including Pheonix-Eval, MLFlow, ContinuousEval and DeepEval were tested and implemented.

Garak Auto-Red Team was also implemented and the given agent was evaluated for adverserial attacks using Garak ATR.

GuardRails-Ai were implemented for the chat agent to invalidate the inputs specifying for harmful responses.

Microsoft Presidio is utilised to detect and anonymize personal identifiable information (PII) that could cause linguistic bias.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
ChatAgent_Evaluation.ipynb		ChatAgent_Evaluation.ipynb
Chat_Agent.py		Chat_Agent.py
Guardrail.ipynb		Guardrail.ipynb
README.md		README.md
garak.ipynb		garak.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

khuzaimakt/Chat-Agent-Evalution

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages