update host verification stages doc and add a new understanding verification page #37
Merged
f481a91 update host verification stages doc and add a new understanding verif… (hanran-y)
743c790 update deverified section and other fix (hanran-y)
ba4e422 change verify to check (hanran-y)
dd8ac26 delete double title (hanran-y)
de2a493 small fix (hanran-y)
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,171 @@ | ||||||
| --- | ||||||
| title: Understanding Verification | ||||||
| createdAt: Mon Nov 03 2025 17:30:00 GMT+0000 (Coordinated Universal Time) | ||||||
| updatedAt: Mon Nov 03 2025 17:30:00 GMT+0000 (Coordinated Universal Time) | ||||||
| "canonical": "/documentation/host/understanding-verification" | ||||||
| --- | ||||||
|
|
||||||
| <script type="application/ld+json" dangerouslySetInnerHTML={{ | ||||||
| __html: JSON.stringify({ | ||||||
| "@context": "https://schema.org", | ||||||
| "@type": "HowTo", | ||||||
| "name": "Understanding Verification on Vast.ai", | ||||||
| "description": "A comprehensive guide to understanding how machine verification works on the Vast.ai platform and how to optimize your machine for verification.", | ||||||
| "step": [ | ||||||
| { | ||||||
| "@type": "HowToStep", | ||||||
| "name": "Ensure Reliability", | ||||||
| "text": "Maintain consistent uptime with minimal downtime. Keep network connectivity stable, manage thermals and power to prevent throttling, and proactively monitor hardware health. Aim for sustained ≥99.99% uptime to improve verification likelihood." | ||||||
| }, | ||||||
| { | ||||||
| "@type": "HowToStep", | ||||||
| "name": "Optimize Infrastructure Configuration", | ||||||
| "text": "Use modern datacenter/workstation GPUs with adequate VRAM and GPU count. Ensure proper PCIe bandwidth, strong server-grade CPUs, and NVMe storage. Enable VM support in BIOS and maintain high-speed, symmetric, stable network bandwidth with open ports." | ||||||
| }, | ||||||
| { | ||||||
| "@type": "HowToStep", | ||||||
| "name": "Maximize DLPerf Score", | ||||||
| "text": "Install the latest compatible drivers and CUDA versions. Eliminate PCIe, thermal, and power bottlenecks to maintain sustained GPU clocks. Ensure proper system configuration to achieve high real-world deep learning performance." | ||||||
| }, | ||||||
| { | ||||||
| "@type": "HowToStep", | ||||||
| "name": "Align with Supply & Demand", | ||||||
| "text": "Offer in-demand GPU models with adequate VRAM and balanced resources. Choose popular GPUs that align with renter preferences and maintain strong reliability to remain attractive in the marketplace." | ||||||
| }, | ||||||
| { | ||||||
| "@type": "HowToStep", | ||||||
| "name": "Maintain Software Excellence", | ||||||
| "text": "Keep drivers and CUDA correctly installed and compatible using the latest stable releases. Keep systems clean by running workloads via Create Job only. Avoid background services that consume resources." | ||||||
| }, | ||||||
| { | ||||||
| "@type": "HowToStep", | ||||||
| "name": "Monitor and Upgrade Responsibly", | ||||||
| "text": "Scale up by adding GPUs or RAM when needed, but never reduce hardware after machine creation as this triggers deverification. Monitor system health proactively and address issues promptly to maintain verification status." | ||||||
| } | ||||||
| ] | ||||||
| }) | ||||||
| }} /> | ||||||
|
|
||||||
| ## How Verification Works | ||||||
|
|
||||||
| Verification is **fully automated and powered by proprietary algorithms** that continuously evaluate each machine’s operational health and performance. | ||||||
|
|
||||||
| Only machines that meet the platform’s defined reliability and performance thresholds are marked as **Verified**. | ||||||
|
|
||||||
| This process involves **no manual intervention**, ensuring consistent, scalable, and objective verification across all systems. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| To qualify, a machine must first pass minimum baseline and health/stability checks. Beyond that, the system evaluates four primary criteria (listed in no particular order of priority): | ||||||
|
|
||||||
| ## 1) Reliability | ||||||
|
|
||||||
| **Definition:** Stable, uninterrupted operation over time (uptime, resilience under continuous workloads). | ||||||
|
|
||||||
| **Do** | ||||||
|
|
||||||
| - Maintain consistent uptime with minimal downtime. | ||||||
| - Keep network connectivity stable; avoid jitter and drops. | ||||||
| - Manage thermals and power to prevent throttling. | ||||||
| - Proactively monitor hardware health and perform maintenance. | ||||||
|
|
||||||
| **Avoid** | ||||||
|
|
||||||
| - Frequent restarts or unplanned outages. | ||||||
| - Overheating, undervolting, or unstable power delivery. | ||||||
|
|
||||||
| > **Note:** Higher reliability greatly improves verification likelihood. Sustained uptime of ≥99.99% (ideally up to 99.9999% or higher) is typically favored. | ||||||
|
|
||||||
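To make the uptime targets above concrete, the following sketch converts an uptime percentage into the downtime it permits. It is plain arithmetic, not a description of how the platform measures reliability; the function name and the 365-day window are assumptions chosen for illustration. Under those assumptions, 99.99% leaves roughly 52.6 minutes of downtime per year, and 99.9999% leaves roughly 31.5 seconds.

```python
# Convert an uptime percentage into the downtime it allows over a period.
# Illustrative arithmetic only; the platform's own measurement window is not
# documented here, so the 365-day default is an assumption.
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float, days: float = 365.0) -> float:
    """Return the downtime budget, in seconds, for `uptime_percent` over `days`."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1.0 - uptime_percent / 100.0) * days * SECONDS_PER_DAY


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.9999):
        minutes = allowed_downtime_seconds(pct) / 60
        print(f"{pct}% uptime allows {minutes:.2f} minutes of downtime per year")
```

A single unplanned reboot that takes ten minutes already consumes about a fifth of a 99.99% yearly budget, which is why frequent restarts appear in the Avoid list.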
| --- | ||||||
|
|
||||||
| ## 2) Infrastructure Configuration | ||||||
|
|
||||||
| **Definition:** Hardware, network, and software readiness to meet operational standards. | ||||||
|
|
||||||
| ### Hardware | ||||||
|
|
||||||
| - **GPU:** Type, memory, and count matter. Newer datacenter/workstation GPUs are prioritized (e.g., B200 > H200 >> 5090 > 4070). | ||||||
| - **VRAM:** More VRAM improves performance profiles. | ||||||
| - **GPU Count:** For the same GPU type, more GPUs increase verification likelihood (e.g., 8×5090 >> 2×5090 > 1×5090). | ||||||
| - **PCIe Bandwidth:** Adequate throughput is essential; bottlenecks depress DLPerf and overall performance. | ||||||
| - **CPU:** Favor strong, server-grade CPUs; actual measured performance matters. | ||||||
| - **Storage:** Both capacity and bandwidth (e.g., NVMe) impact responsiveness and reliability. | ||||||
|
|
||||||
| ### Network | ||||||
|
|
||||||
| - High-speed, symmetric, stable bandwidth is favored. | ||||||
| - Ensure required ports are open and accessible; a static IP helps. | ||||||
|
|
||||||
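One way to confirm that ports are open and accessible is to attempt a TCP connection to each one. The sketch below uses only the Python standard library; the function name is hypothetical, and the host address and port numbers in the usage comment are placeholders, not values required by the platform. For a meaningful result, run it from a machine outside the host's network, since a local check only shows that something is listening.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example usage (placeholder address and ports; substitute your own):
#   for port in (40000, 40001, 40002):
#       print(port, port_is_reachable("203.0.113.10", port))
```

A successful TCP handshake shows that routing and port forwarding work end to end for that port; it says nothing about bandwidth or jitter, which need separate measurement.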
| ### Virtualization | ||||||
|
|
||||||
| - Enabling VM support significantly improves verification likelihood. | ||||||
|
|
||||||
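On Linux, a quick first check for VM support is whether the CPU advertises hardware virtualization in `/proc/cpuinfo` (`vmx` for Intel VT-x, `svm` for AMD-V). The sketch below parses that text; the function name is an assumption made for this example. A present flag indicates CPU support only. Firmware settings and IOMMU still need to be confirmed separately, and this is not the check the platform itself performs.

```python
from typing import Optional


def cpu_virtualization_flag(cpuinfo_text: str) -> Optional[str]:
    """Return 'vmx' (Intel VT-x) or 'svm' (AMD-V) if advertised, else None."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = value.split()
            for name in ("vmx", "svm"):
                if name in flags:
                    return name
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            found = cpu_virtualization_flag(handle.read())
    except OSError:
        found = None  # Not Linux, or /proc is unavailable.
    print(f"virtualization flag: {found}")
```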
| ### Software | ||||||
|
|
||||||
| - Drivers/CUDA must be correctly installed and compatible (use the **latest** stable releases). | ||||||
| - Keep systems clean; run workloads via Create Job only. | ||||||
|
|
||||||
| ### System Optimization & Upgrades | ||||||
|
|
||||||
| - Balanced scaling matters (CPU/RAM/PCIe/bandwidth commensurate with GPU tier). | ||||||
| - Do not reduce hardware after creation (e.g., fewer GPUs or less RAM); this triggers deverification. | ||||||
| - Upgrades (adding GPUs/RAM) are allowed but may take time to reflect across the platform. | ||||||
|
|
||||||
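Because reductions trigger deverification while additions do not, it can help to compare a planned configuration against the one recorded at machine creation before making changes. The sketch below is a local pre-check only. The `HardwareSnapshot` type, its fields, and the function name are hypothetical, and the platform's actual detection logic is not described in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HardwareSnapshot:
    """Hypothetical record of the resources a machine was created with."""
    gpu_count: int
    ram_gb: int


def reduced_resources(before: HardwareSnapshot, after: HardwareSnapshot) -> List[str]:
    """List every resource that is smaller in `after` than in `before`."""
    problems = []
    if after.gpu_count < before.gpu_count:
        problems.append(f"GPU count reduced: {before.gpu_count} -> {after.gpu_count}")
    if after.ram_gb < before.ram_gb:
        problems.append(f"RAM reduced: {before.ram_gb} GB -> {after.ram_gb} GB")
    return problems


if __name__ == "__main__":
    created = HardwareSnapshot(gpu_count=8, ram_gb=512)
    planned = HardwareSnapshot(gpu_count=8, ram_gb=256)
    for problem in reduced_resources(created, planned):
        print("Would risk deverification:", problem)
```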
| **Do** | ||||||
|
|
||||||
| - Verify GPU PCIe connections provide full bandwidth and are not throttled. | ||||||
| - Keep the latest drivers/CUDA aligned with workloads. | ||||||
| - Confirm required ports and end-to-end reachability. | ||||||
|
|
||||||
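To check whether GPU PCIe links are running at full bandwidth, one option is to compare the current link generation and width with the maximum values that `nvidia-smi` reports. The sketch below only parses the command's CSV output, so it can be tried without a GPU present; the function name and the sample text are assumptions made for this example. Idle GPUs often lower their link generation to save power, so collect the output while the GPU is under load.

```python
from typing import List, Tuple

# Assumed command for producing the input text:
#   nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.gen.max,\
#              pcie.link.width.current,pcie.link.width.max --format=csv,noheader


def find_degraded_links(csv_text: str) -> List[Tuple[int, str, str]]:
    """Return (gpu_index, current_link, max_link) for links below their maximum."""
    degraded = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 5:
            continue  # Skip lines that do not match the expected layout.
        try:
            index, gen_cur, gen_max, width_cur, width_max = (int(f) for f in fields)
        except ValueError:
            continue  # Skip headers or values such as "[N/A]".
        if gen_cur < gen_max or width_cur < width_max:
            degraded.append(
                (index, f"Gen{gen_cur} x{width_cur}", f"Gen{gen_max} x{width_max}")
            )
    return degraded


if __name__ == "__main__":
    sample = "0, 4, 4, 16, 16\n1, 4, 4, 8, 16\n"  # Made-up sample output.
    for index, current, maximum in find_degraded_links(sample):
        print(f"GPU {index}: running at {current}, capable of {maximum}")
```

A link that stays below its maximum width under load usually points to slot wiring, a riser, or lane sharing on the motherboard rather than a driver issue.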
| **Avoid** | ||||||
|
|
||||||
| - Pairing high-end GPUs with under-provisioned CPU/RAM. | ||||||
| - Letting hidden background services consume resources. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## 3) DLPerf Score | ||||||
|
|
||||||
| **Definition:** Estimated GPU performance on typical deep-learning tasks (e.g., CNN/Transformer training) for cross-hardware comparison. Higher DLPerf improves verification odds. [Read more](https://docs.vast.ai/documentation/reference/faq/rental-types#dlperf-scoring) | ||||||
|
|
||||||
| **Do** | ||||||
|
|
||||||
| - Use the **latest** compatible drivers/CUDA. | ||||||
| - Eliminate PCIe, thermal, and power bottlenecks to maintain sustained clocks. | ||||||
|
|
||||||
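Whether clocks are actually sustained can be estimated by sampling the SM clock during a long-running workload and comparing the average with the GPU's maximum SM clock. The sketch below does the comparison on values already collected (for example from the `clocks.sm` and `clocks.max.sm` fields of `nvidia-smi --query-gpu`). The function names and the 0.9 threshold are illustrative assumptions; this is unrelated to how DLPerf itself is computed.

```python
from statistics import mean
from typing import Sequence


def sustained_clock_ratio(samples_mhz: Sequence[float], max_mhz: float) -> float:
    """Return the mean sampled SM clock as a fraction of the maximum SM clock."""
    if not samples_mhz:
        raise ValueError("at least one clock sample is required")
    if max_mhz <= 0:
        raise ValueError("max_mhz must be positive")
    return mean(samples_mhz) / max_mhz


def looks_throttled(samples_mhz: Sequence[float], max_mhz: float,
                    threshold: float = 0.9) -> bool:
    """Flag a run whose average clock falls below `threshold` of the maximum.

    The default threshold is an arbitrary illustration, not a platform value.
    """
    return sustained_clock_ratio(samples_mhz, max_mhz) < threshold


if __name__ == "__main__":
    # Made-up samples taken under load, in MHz.
    samples = [1900, 1800, 1700]
    print(f"ratio: {sustained_clock_ratio(samples, 2000):.2f}")
    print(f"throttled: {looks_throttled(samples, 2000)}")
```

Samples taken while the GPU is idle will look low regardless of cooling or power, so only readings gathered under sustained load are meaningful here.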
| **Avoid** | ||||||
|
|
||||||
| - Misconfigurations that suppress benchmark performance. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## 4) Supply & Demand Analysis | ||||||
|
|
||||||
| **Definition:** Ongoing evaluation of market trends and renter behavior to surface configurations most likely to be rented. | ||||||
|
|
||||||
| **Implication:** Machines aligned with active renter preferences—popular GPUs, sufficient VRAM, strong reliability, fast internet—are prioritized for verification to maximize utilization and profitability. | ||||||
|
|
||||||
| **Do** | ||||||
|
|
||||||
| - Offer in-demand GPU models with adequate VRAM and balanced resources. | ||||||
| - Maintain strong reliability to remain attractive once listed. | ||||||
|
|
||||||
| **Avoid** | ||||||
|
|
||||||
| - Niche/mismatched configurations with low renter interest. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Quick Reference | ||||||
|
|
||||||
| | **Category** | **What Matters Most** | **How to Improve** | | ||||||
| |---------------|------------------------|--------------------| | ||||||
| | Reliability | Stable, uninterrupted uptime | Proactive monitoring; steady power/thermals; minimize restarts | | ||||||
| | Network | Symmetric, stable bandwidth; open ports | Upgrade links; verify routing/ports; monitor jitter/loss | | ||||||
| | Hardware | Modern GPUs/CPUs; adequate PCIe & RAM | Favor DC/workstation GPUs; ensure PCIe lanes; match CPU/RAM to GPU tier | | ||||||
| | Storage | Throughput and reliability | Prefer NVMe; monitor SMART; ensure sustained bandwidth | | ||||||
| | Virtualization | VM capability enabled | Enable in BIOS; enable IOMMU. | ||||||