Merged
5 changes: 5 additions & 0 deletions docs.json
@@ -150,6 +150,7 @@
"pages": [
"documentation/host/hosting-overview",
"documentation/host/verification-stages",
"documentation/host/understanding-verification",
"documentation/host/datacenter-status",
"documentation/host/earning",
"documentation/host/payment",
@@ -429,6 +430,10 @@
"source": "/verification-stages",
"destination": "/documentation/host/verification-stages"
},
{
"source": "/understanding-verification",
"destination": "/documentation/host/understanding-verification"
},
{
"source": "/data-movement",
"destination": "/documentation/instances/storage/data-movement"
171 changes: 171 additions & 0 deletions documentation/host/understanding-verification.mdx
@@ -0,0 +1,171 @@
---
title: Understanding Verification
createdAt: Mon Nov 03 2025 17:30:00 GMT+0000 (Coordinated Universal Time)
updatedAt: Mon Nov 03 2025 17:30:00 GMT+0000 (Coordinated Universal Time)
"canonical": "/documentation/host/understanding-verification"
---

<script type="application/ld+json" dangerouslySetInnerHTML={{
__html: JSON.stringify({
"@context": "https://schema.org",
"@type": "HowTo",
"name": "Understanding Verification on Vast.ai",
"description": "A comprehensive guide to understanding how machine verification works on the Vast.ai platform and how to optimize your machine for verification.",
"step": [
{
"@type": "HowToStep",
"name": "Ensure Reliability",
"text": "Maintain consistent uptime with minimal downtime. Keep network connectivity stable, manage thermals and power to prevent throttling, and proactively monitor hardware health. Aim for sustained ≥99.99% uptime to improve verification likelihood."
},
{
"@type": "HowToStep",
"name": "Optimize Infrastructure Configuration",
"text": "Use modern datacenter/workstation GPUs with adequate VRAM and GPU count. Ensure proper PCIe bandwidth, strong server-grade CPUs, and NVMe storage. Enable VM support in BIOS and maintain high-speed, symmetric, stable network bandwidth with open ports."
},
{
"@type": "HowToStep",
"name": "Maximize DLPerf Score",
"text": "Install the latest compatible drivers and CUDA versions. Eliminate PCIe, thermal, and power bottlenecks to maintain sustained GPU clocks. Ensure proper system configuration to achieve high real-world deep learning performance."
},
{
"@type": "HowToStep",
"name": "Align with Supply & Demand",
"text": "Offer in-demand GPU models with adequate VRAM and balanced resources. Choose popular GPUs that align with renter preferences and maintain strong reliability to remain attractive in the marketplace."
},
{
"@type": "HowToStep",
"name": "Maintain Software Excellence",
"text": "Keep drivers and CUDA correctly installed and compatible, using the latest stable releases. Keep systems clean by running workloads via Create Job only. Avoid background services that consume resources."
},
{
"@type": "HowToStep",
"name": "Monitor and Upgrade Responsibly",
"text": "Scale up by adding GPUs or RAM when needed, but never reduce hardware after machine creation as this triggers deverification. Monitor system health proactively and address issues promptly to maintain verification status."
}
]
})
}} />

## How Verification Works

Verification is **fully automated and powered by proprietary algorithms** that continuously evaluate each machine’s operational health and performance.

Only machines that meet the platform’s defined reliability and performance thresholds are marked as **Verified**.

This process involves **no manual intervention**, ensuring consistent, scalable, and objective verification across all systems.

---

To qualify, a machine must first pass minimum baseline and health/stability checks. Beyond that, the system evaluates four primary criteria (listed in no particular order of priority):

## 1) Reliability

**Definition:** Stable, uninterrupted operation over time (uptime, resilience under continuous workloads).

**Do**

- Maintain consistent uptime with minimal downtime.
- Keep network connectivity stable; avoid jitter and drops.
- Manage thermals and power to prevent throttling.
- Proactively monitor hardware health and perform maintenance.

**Avoid**

- Frequent restarts or unplanned outages.
- Overheating, undervolting, or unstable power delivery.

> **Note:** Higher reliability greatly improves verification likelihood. Sustained ≥99.99% (up to 99.9999%+) uptime is typically favored.
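To make the uptime figures in the note above concrete, the following sketch converts an uptime percentage into the downtime it leaves room for. It is plain arithmetic; the function name is illustrative and nothing here reflects how the platform itself measures uptime.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime, in minutes, that an uptime percentage allows over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (99.9, 99.99, 99.9999):
        print(f"{target}% uptime -> {downtime_budget_minutes(target):.2f} min/year")
```

Over a 365-day year, 99.99% uptime leaves roughly 52.6 minutes of downtime, and 99.9999% leaves about 32 seconds, so a single unplanned reboot can consume most of the budget.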

---

## 2) Infrastructure Configuration

**Definition:** Hardware, network, and software readiness to meet operational standards.

### Hardware

- **GPU:** Type, memory, and count matter. Newer datacenter/workstation GPUs are prioritized (e.g., B200 > H200 >> 5090 > 4070).
- **VRAM:** More VRAM improves performance profiles.
- **GPU Count:** For the same GPU type, more GPUs increase verification likelihood (e.g., 8×5090 >> 2×5090 > 1×5090).
- **PCIe Bandwidth:** Adequate throughput is essential; bottlenecks depress DLPerf and overall performance.
- **CPU:** Favor strong, server-grade CPUs; actual measured performance matters.
- **Storage:** Both capacity and bandwidth (e.g., NVMe) impact responsiveness and reliability.

### Network

- High-speed, symmetric, stable bandwidth is favored.
- Ensure required ports are open and accessible; a static IP helps.
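One way to spot-check port reachability is a plain TCP connection attempt, sketched below. The helper names are hypothetical, and the check is only meaningful when run from a machine outside the host's network against the host's public IP. A port also reports as closed if nothing is listening on it at the time, so treat a failure as a prompt to investigate rather than a verdict.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `check_ports("203.0.113.10", [40000, 40001])` would probe two ports on a placeholder address; substitute the machine's real public IP and the port range it actually exposes.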

### Virtualization

- Enabling VM support significantly improves verification likelihood.
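As a rough local pre-check on a Linux x86 host, the sketch below looks for the `vmx` (Intel VT-x) or `svm` (AMD-V) CPU flag in `/proc/cpuinfo` and counts the IOMMU groups the kernel exposes under `/sys/kernel/iommu_groups`. The function names are illustrative. The CPU flag is a necessary sign rather than a guarantee (some kernels show it even when virtualization is disabled in firmware), and this is not the check the platform performs.

```python
from pathlib import Path
from typing import Optional


def cpu_virtualization_flag(cpuinfo_text: str) -> Optional[str]:
    """Return 'vmx' or 'svm' if the CPU flags line advertises it, else None."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            for name in ("vmx", "svm"):
                if name in flags:
                    return name
    return None


def iommu_group_count(root: Path = Path("/sys/kernel/iommu_groups")) -> int:
    """Count IOMMU groups; zero usually means the IOMMU is disabled or unsupported."""
    if not root.is_dir():
        return 0
    return sum(1 for entry in root.iterdir() if entry.is_dir())
```

On a host, `cpu_virtualization_flag(Path("/proc/cpuinfo").read_text())` together with a non-zero `iommu_group_count()` suggests that both settings are active.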

### Software

- Drivers/CUDA must be correctly installed and compatible (use the **latest** stable releases).
- Keep systems clean; run workloads via Create Job only.

### System Optimization & Upgrades

- Balanced scaling matters (CPU/RAM/PCIe/bandwidth commensurate with GPU tier).
- Do not reduce hardware after creation (e.g., fewer GPUs or less RAM); doing so triggers deverification.
- Upgrades (adding GPUs/RAM) are allowed but may take time to reflect across the platform.
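Because reductions are the risky direction, a host might compare the current hardware inventory against the one recorded at machine creation before making changes. The sketch below assumes a simple dictionary of resource counts; the keys (`gpu_count`, `ram_gb`) are hypothetical and do not correspond to any platform API.

```python
def find_reductions(
    baseline: dict[str, int], current: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {resource: (baseline, current)} for each resource that shrank or vanished."""
    return {
        name: (old, current.get(name, 0))
        for name, old in baseline.items()
        if current.get(name, 0) < old
    }
```

With a baseline of `{"gpu_count": 8, "ram_gb": 512}` and a current inventory of `{"gpu_count": 8, "ram_gb": 256}`, the function returns `{"ram_gb": (512, 256)}`. Additions and unchanged values produce an empty result.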

**Do**

- Verify GPU PCIe connections provide full bandwidth and are not throttled.
- Keep the latest drivers/CUDA aligned with workloads.
- Confirm required ports and end-to-end reachability.

**Avoid**

- Pairing high-end GPUs with under-provisioned CPU/RAM.
- Letting hidden background services consume resources.
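To check whether GPUs are negotiating their full PCIe link, one option is to parse `nvidia-smi` query output, as sketched below. The query fields used are standard `nvidia-smi --query-gpu` properties; the class and function names are illustrative, and the parser assumes every field is numeric (a value such as `[N/A]` would raise an error). GPUs often drop to a lower link generation at idle to save power, so take the reading while the GPU is under load.

```python
import csv
import io
import subprocess
from typing import NamedTuple

QUERY = ("index,pcie.link.gen.current,pcie.link.gen.max,"
         "pcie.link.width.current,pcie.link.width.max")


class PcieLink(NamedTuple):
    index: int
    gen_current: int
    gen_max: int
    width_current: int
    width_max: int

    @property
    def degraded(self) -> bool:
        """True if the link runs below the maximum generation or lane width."""
        return self.gen_current < self.gen_max or self.width_current < self.width_max


def parse_pcie_links(csv_text: str) -> list[PcieLink]:
    """Parse 'csv,noheader' output for QUERY into one PcieLink per GPU."""
    rows = csv.reader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    return [PcieLink(*(int(value) for value in row)) for row in rows if row]


def read_pcie_links() -> list[PcieLink]:
    """Run nvidia-smi (must be on PATH) and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_pcie_links(result.stdout)
```

A GPU that reports, say, generation 3 at width 8 under load when its maximum is generation 4 at width 16 points to a riser, slot, or BIOS setting worth inspecting.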

---

## 3) DLPerf Score

**Definition:** Estimated GPU performance on typical deep-learning tasks (e.g., CNN/Transformer training) for cross-hardware comparison. Higher DLPerf improves verification odds. [Read more](https://docs.vast.ai/documentation/reference/faq/rental-types#dlperf-scoring)

**Do**

- Use the **latest** compatible drivers/CUDA.
- Eliminate PCIe, thermal, and power bottlenecks to maintain sustained clocks.

**Avoid**

- Misconfigurations that suppress benchmark performance.
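DLPerf itself is computed by the platform, but a host can still look for clock sag under load, which is one symptom of the thermal and power bottlenecks mentioned above. The sketch below takes SM clock samples (for example, collected periodically from `nvidia-smi --query-gpu=clocks.sm,clocks.max.sm --format=csv,noheader,nounits` during a sustained workload) and compares their average to the maximum clock. The function names and the 0.9 threshold are arbitrary assumptions for illustration, not platform values.

```python
from statistics import mean


def sustained_clock_ratio(samples_mhz: list[float], max_clock_mhz: float) -> float:
    """Return the mean sampled SM clock as a fraction of the maximum clock."""
    if not samples_mhz:
        raise ValueError("at least one clock sample is required")
    if max_clock_mhz <= 0:
        raise ValueError("max_clock_mhz must be positive")
    return mean(samples_mhz) / max_clock_mhz


def looks_throttled(
    samples_mhz: list[float], max_clock_mhz: float, threshold: float = 0.9
) -> bool:
    """Flag a GPU whose average clock under load falls below threshold * max clock."""
    return sustained_clock_ratio(samples_mhz, max_clock_mhz) < threshold
```

A low ratio does not identify the cause by itself; it only indicates that cooling, power limits, or power delivery deserve a closer look.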

---

## 4) Supply & Demand Analysis

**Definition:** Ongoing evaluation of market trends and renter behavior to surface configurations most likely to be rented.

**Implication:** Machines aligned with active renter preferences (popular GPUs, sufficient VRAM, strong reliability, fast internet) are prioritized for verification to maximize utilization and profitability.

**Do**

- Offer in-demand GPU models with adequate VRAM and balanced resources.
- Maintain strong reliability to remain attractive once listed.

**Avoid**

- Niche/mismatched configurations with low renter interest.

---

## Quick Reference

| **Category** | **What Matters Most** | **How to Improve** |
|---------------|------------------------|--------------------|
| Reliability | Stable, uninterrupted uptime | Proactive monitoring; steady power/thermals; minimize restarts |
| Network | Symmetric, stable bandwidth; open ports | Upgrade links; verify routing/ports; monitor jitter/loss |
| Hardware | Modern GPUs/CPUs; adequate PCIe & RAM | Favor DC/workstation GPUs; ensure PCIe lanes; match CPU/RAM to GPU tier |
| Storage | Throughput and reliability | Prefer NVMe; monitor SMART; ensure sustained bandwidth |
| Virtualization | VM capability enabled | Enable in BIOS, enable IOMMU. |
Copilot AI Nov 4, 2025

[nitpick] Grammatical inconsistency: Should be 'Enable VM in BIOS, enable IOMMU' or 'Enable in BIOS; enable IOMMU' for consistency with other table entries that use semicolons as separators.

Suggested change
| Virtualization | VM capability enabled | Enable in BIOS, enable IOMMU. |
| Virtualization | VM capability enabled | Enable in BIOS; enable IOMMU. |

| Software | Correct drivers/CUDA; clean system | Install the latest stable, compatible versions; use Create Job only |
| DLPerf | High real-world throughput | Fix PCIe/thermal bottlenecks; maintain clocks; correct drivers |
| Supply & Demand | Alignment with renter needs | Choose popular GPUs/VRAM; balance specs; maintain reliability |
| Upgrades | Changes reflected by platform | Scale up (add GPUs/RAM); avoid reductions, which trigger deverification |