## Evaluation

* Motivation
* Evaluation tools
  - Ragas
  - DeepEval
* Statistical significance

---

## Why Evaluate an LLM System?

* Measure performance
* Ensure good user experience
* Detect bias & harm
* Comply with ethical & legal standards

---

## Benefits of Evaluation

* Improvement:
  - Pinpoints weaknesses (e.g., hallucinations)
  - Enables data-driven model tuning

* Benchmarking:
  - Compares models (GPT, Gemini, Granite, etc.)
  - Tracks reliability over time

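Comparing models on the same question set needs more than a difference of averages, which is where statistical significance comes in. A minimal sketch, assuming each model gets one judge score in [0, 1] per question, is a paired bootstrap built only on the Python standard library; `paired_bootstrap_ci` and the scores are hypothetical and not part of any framework in this deck. If the 95% interval for the mean difference excludes 0, the gap is unlikely to be resampling noise alone.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Return (mean difference, lower, upper) for mean(scores_a - scores_b).

    Hypothetical helper. Both lists must hold scores for the same questions
    in the same order, so questions are resampled as pairs.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(diffs)
    # Resample the per-question differences with replacement and record each mean.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    # Percentile interval: take the alpha/2 and 1 - alpha/2 quantiles.
    lower = boot_means[int((alpha / 2) * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


if __name__ == "__main__":
    # Hypothetical per-question judge scores for two models.
    model_a = [0.9, 0.8, 0.85, 0.7, 0.95, 0.9, 0.75, 0.8]
    model_b = [0.7, 0.75, 0.8, 0.6, 0.9, 0.7, 0.7, 0.65]
    diff, lo, hi = paired_bootstrap_ci(model_a, model_b)
    print(f"mean diff={diff:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```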
---
### Lightspeed Evaluation Framework

<font size="10">[https://github.com/lightspeed-core/lightspeed-evaluation](https://github.com/lightspeed-core/lightspeed-evaluation)</font>

---

### Lightspeed Evaluation Framework

* Multi-Framework LLM-as-a-Judge
  - Ragas, DeepEval, and custom implementations
* Turn- & Conversation-Level Evaluation
  - Individual queries and multi-turn conversations
* Tools/Agents Support
* LLM Providers
  - OpenAI, watsonx, Gemini, vLLM, and others
* Setup/Cleanup Scripts
* Statistical Analysis

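LLM-as-a-judge metrics produce a score per turn, and statistical analysis turns those scores into numbers that can be compared across runs. The sketch below is hypothetical: it assumes scores in [0, 1] and that a turn passes when its score is at least the metric's threshold (as with a `threshold: 0.8` setting in the YAML configuration); `summarize_metric` and `MetricSummary` are illustrative names, not the framework's API.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class MetricSummary:
    metric: str
    count: int
    mean_score: float
    std_dev: float
    pass_rate: float


def summarize_metric(metric, scores, threshold):
    """Aggregate LLM-judge scores for one metric across turns.

    Assumption of this sketch: a turn passes when score >= threshold.
    """
    if not scores:
        raise ValueError(f"no scores recorded for {metric}")
    passed = sum(1 for score in scores if score >= threshold)
    return MetricSummary(
        metric=metric,
        count=len(scores),
        mean_score=mean(scores),
        # Population std dev; statistics.stdev gives the sample estimate instead.
        std_dev=pstdev(scores),
        pass_rate=passed / len(scores),
    )


if __name__ == "__main__":
    # Hypothetical faithfulness scores from an LLM judge for five turns.
    summary = summarize_metric(
        "ragas:faithfulness", [0.92, 0.75, 0.88, 0.81, 0.60], threshold=0.8
    )
    print(summary)
```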
---
```yaml
- conversation_group_id: "test_conversation"
  description: "Sample evaluation"

  # Optional: Environment setup/cleanup scripts, when API is enabled
  setup_script: "scripts/setup_env.sh"      # Run before conversation
  cleanup_script: "scripts/cleanup_env.sh"  # Run after conversation

  # Conversation-level metrics
  conversation_metrics:
    - "deepeval:conversation_completeness"

  conversation_metrics_metadata:
    "deepeval:conversation_completeness":
      threshold: 0.8

  turns:
    - turn_id: id1
      query: What is OpenShift Virtualization?
      response: null  # Populated by the API if enabled; otherwise, provide it here
      contexts:
        - OpenShift Virtualization is an extension of the OpenShift ...
      attachments: []  # Attachments (optional)
      expected_response: OpenShift Virtualization is an extension of the OpenShift Container Platform that allows running virtual machines alongside containers
      expected_intent: "explain a concept"  # Expected intent for intent evaluation

      # Per-turn metrics (override system defaults)
      turn_metrics:
        - "ragas:faithfulness"
        - "custom:answer_correctness"
        - "custom:intent_eval"
```
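
To make the fields above concrete, here is a hypothetical Python sketch of how a runner could interpret one conversation group: per-turn `turn_metrics` replace the defaults, each metric name splits into `framework:metric`, and the setup and cleanup scripts bracket the turns. The dict mirrors what a YAML loader such as PyYAML's `yaml.safe_load` would return for a file like this one, trimmed to the fields used here. The default metric list, the function names, and the second turn are assumptions, not the Lightspeed Evaluation Framework's implementation.

```python
# Hypothetical sketch; not the Lightspeed Evaluation Framework's code.

# Assumed system-wide default turn metrics (the real defaults live in the framework's config).
DEFAULT_TURN_METRICS = ["ragas:faithfulness"]

# Trimmed, hypothetical equivalent of the parsed YAML; turn "id2" is invented for illustration.
EXAMPLE_GROUP = {
    "conversation_group_id": "test_conversation",
    "setup_script": "scripts/setup_env.sh",
    "cleanup_script": "scripts/cleanup_env.sh",
    "turns": [
        {
            "turn_id": "id1",
            "query": "What is OpenShift Virtualization?",
            "turn_metrics": [
                "ragas:faithfulness",
                "custom:answer_correctness",
                "custom:intent_eval",
            ],
        },
        {"turn_id": "id2", "query": "How do I create a virtual machine?"},  # no turn_metrics
    ],
}


def split_metric(name):
    """Split a 'framework:metric' identifier into its two parts."""
    framework, sep, metric = name.partition(":")
    if not sep or not framework or not metric:
        raise ValueError(f"expected 'framework:metric', got {name!r}")
    return framework, metric


def turn_metrics(turn, defaults=DEFAULT_TURN_METRICS):
    """Per-turn metrics, when present, override the defaults (returns a copy)."""
    metrics = turn.get("turn_metrics")
    return list(defaults if metrics is None else metrics)


def run_group(group, evaluate_turn, run_script):
    """Run setup, score every turn/metric pair, then always run cleanup."""
    if group.get("setup_script"):
        run_script(group["setup_script"])
    try:
        return {
            turn["turn_id"]: {
                name: evaluate_turn(turn, *split_metric(name))
                for name in turn_metrics(turn)
            }
            for turn in group["turns"]
        }
    finally:
        # Design choice of this sketch: cleanup runs even if a turn raises.
        if group.get("cleanup_script"):
            run_script(group["cleanup_script"])


if __name__ == "__main__":
    executed = []
    scores = run_group(
        EXAMPLE_GROUP,
        evaluate_turn=lambda turn, framework, metric: 1.0,  # stand-in for a judge call
        run_script=executed.append,  # records script paths instead of running them
    )
    print(executed)
    print(scores)
```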
---

## Demo

---
