LLM Chat-Agent Evaluation
An online chat agent made using LangGraph and LangChain is evaluated on multiple metrics. Evaluation pipeline developed to evaluate the LLM's performance using different evaluation measures such as Toxicity, Biasedness, Hallucination, Relevance and Faithfulness. Various libraries including Pheonix-Eval, MLFlow, ContinuousEval and DeepEval were tested and implemented.
Garak Auto-Red Team was also implemented and the given agent was evaluated for adverserial attacks using Garak ATR.
GuardRails-Ai were implemented for the chat agent to invalidate the inputs specifying for harmful responses.
Microsoft Presidio is utilised to detect and anonymize personal identifiable information (PII) that could cause linguistic bias.