update host verification stages doc and add a new understanding verification page #37
Pull Request Overview
This PR comprehensively updates the Vast.ai documentation for machine verification by refactoring the existing verification-stages guide and adding a new understanding-verification guide. The changes improve clarity, structure, and detail around the automated verification system.
- Reorganized verification-stages.mdx with clearer state definitions and lifecycle information
- Added understanding-verification.mdx with detailed criteria (Reliability, Infrastructure, DLPerf, Supply & Demand)
- Updated navigation and redirects to include the new documentation page
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| documentation/host/verification-stages.mdx | Restructured to focus on verification states (Unverified/Verified/Deverified) with clearer requirements, responsibilities, and recovery guidance |
| documentation/host/understanding-verification.mdx | New comprehensive guide explaining the four verification criteria and optimization strategies |
| docs.json | Added navigation entry and redirect for the new understanding-verification page |
- 10-series Nvidia GPU or MI25 or newer Radeon Instinct series GPU or Radeon VII or Radeon Pro VII or Radeon RX 7900 (GRE/XT/XTX); or Radeon Pro W7900/W7800. Other 6000 series or newer Radeon RX/Pro W series GPUs may be supported; but may not be searchable using standard filters for AMD ROCm.
- At least 1 physical CPU core (2 hyperthreads) per GPU.
- Your CPU must support AVX instruction set (not all lower end CPUs support this).
- At least 4GB of system RAM per GPU.
Copilot AI, Nov 4, 2025
Inconsistent formatting: line 100 uses '4GB', while the original text had '4GBM', which appears to be a typo. The correction to '4GB' is appropriate, but it should be '4 GB' with a space to match the style used elsewhere in the document (e.g., line 118 'GPU RAM of 7 GB').
Suggested change:
Before: - At least 4GB of system RAM per GPU.
After: - At least 4 GB of system RAM per GPU.
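The spacing convention requested in this comment can also be checked mechanically. The following is a minimal sketch, not part of the PR: `normalize_unit_spacing` is a hypothetical helper name, and the unit list (MB, GB, TB) is an assumption about which units appear in the docs.

```python
import re

# Assumed unit list (hypothetical); extend it if the docs use other units.
# The trailing \b keeps strings such as '4GBM' from matching.
_UNIT_PATTERN = re.compile(r"(\d)(MB|GB|TB)\b")


def normalize_unit_spacing(text: str) -> str:
    """Insert a space between a digit and a storage unit, e.g. '4GB' -> '4 GB'."""
    return _UNIT_PATTERN.sub(r"\1 \2", text)


if __name__ == "__main__":
    print(normalize_unit_spacing("- At least 4GB of system RAM per GPU."))
```

Text that already follows the convention, such as 'GPU RAM of 7 GB', passes through unchanged, so the helper could be run over a whole file without disturbing correct lines.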
{
  "@type": "HowToStep",
  "name": "Recover From Deverification",
  "text": "If deverified, investigate red error indicators quickly and review logs/metrics. Common causes include network instability, hardware/system errors, GPU issues, container failures, or policy violations. Fix the underlying issue and restore stability - the system will automatically transition back to Verified status."
Copilot AI, Nov 4, 2025
Incorrect recovery path described: According to line 151, when a deverified machine's issue is resolved, it transitions back to 'Verified' status. However, the original documentation stated that resolved deverified machines go from 'deverified -> unverified'. This is a significant change in behavior that appears inconsistent with the lifecycle shown in line 51 'Unverified → Verified → (potentially) Deverified → Unverified → ...'.
Suggested change:
Before: "text": "If deverified, investigate red error indicators quickly and review logs/metrics. Common causes include network instability, hardware/system errors, GPU issues, container failures, or policy violations. Fix the underlying issue and restore stability - the system will automatically transition back to Verified status."
After: "text": "If deverified, investigate red error indicators quickly and review logs/metrics. Common causes include network instability, hardware/system errors, GPU issues, container failures, or policy violations. Fix the underlying issue and restore stability - the system will automatically transition back to Unverified status, after which you may re-qualify for Verified."
- Detected abuse or policy violations.

If you see a red error on your machine card, you should try to investigate and resolve that because it could get you deverified.
**Recovery:** Fix the issue and restore stability; the system will automatically transition back to Verified.
Copilot AI, Nov 4, 2025
Inconsistent deverification recovery path: This states transition is 'back to Verified', but line 51 shows the lifecycle as 'Unverified → Verified → (potentially) Deverified → Unverified → ...', indicating deverified machines should go back to Unverified, not directly to Verified.
Suggested change:
Before: **Recovery:** Fix the issue and restore stability; the system will automatically transition back to Verified.
After: **Recovery:** Fix the issue and restore stability; the system will automatically transition to Unverified, and then, after passing verification checks, back to Verified.
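The lifecycle this comment relies on can be written down as a small transition table, which makes the disputed recovery path easy to check. This is an illustrative sketch only: the `State` enum, `ALLOWED` table, and both function names are hypothetical, and the transitions are taken from the lifecycle string quoted in the review ('Unverified → Verified → (potentially) Deverified → Unverified → ...'), not from Vast.ai's actual implementation.

```python
from enum import Enum
from typing import List


class State(Enum):
    UNVERIFIED = "Unverified"
    VERIFIED = "Verified"
    DEVERIFIED = "Deverified"


# Assumed transitions, following the lifecycle quoted in the review:
# Unverified -> Verified -> (potentially) Deverified -> Unverified -> ...
ALLOWED = {
    State.UNVERIFIED: {State.VERIFIED},
    State.VERIFIED: {State.DEVERIFIED},
    State.DEVERIFIED: {State.UNVERIFIED},
}


def can_transition(current: State, target: State) -> bool:
    """Return True if the quoted lifecycle permits current -> target."""
    return target in ALLOWED[current]


def recovery_path(start: State) -> List[State]:
    """Follow the lifecycle from `start` until Verified is reached."""
    path = [start]
    while path[-1] is not State.VERIFIED:
        # Each state has exactly one successor in this sketch.
        (next_state,) = ALLOWED[path[-1]]
        path.append(next_state)
    return path


if __name__ == "__main__":
    print(" -> ".join(s.value for s in recovery_path(State.DEVERIFIED)))
```

Under these assumptions, `can_transition(State.DEVERIFIED, State.VERIFIED)` is false, which is the inconsistency the comment points out: a deverified machine passes through Unverified before it can be Verified again.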
| Network | Symmetric, stable bandwidth; open ports | Upgrade links; verify routing/ports; monitor jitter/loss |
| Hardware | Modern GPUs/CPUs; adequate PCIe & RAM | Favor DC/workstation GPUs; ensure PCIe lanes; match CPU/RAM to GPU tier |
| Storage | Throughput and reliability | Prefer NVMe; monitor SMART; ensure sustained bandwidth |
| Virtualization | VM capability enabled | Enable in BIOS, enable IOMMU. |
Copilot AI, Nov 4, 2025
[nitpick] Grammatical inconsistency: Should be 'Enable VM in BIOS, enable IOMMU' or 'Enable in BIOS; enable IOMMU' for consistency with other table entries that use semicolons as separators.
Suggested change:
Before: | Virtualization | VM capability enabled | Enable in BIOS, enable IOMMU. |
After: | Virtualization | VM capability enabled | Enable in BIOS; enable IOMMU. |
Really nice work! This is gonna be a huge improvement. Definitely a much better resource to provide for hosts.
A few quibbles and nitpicks here and there. My only feedback that I would definitely like to see revision on here is clear messaging that deverification is only temporary, and that you can see the red message in the machines page. Some hosts only look in the CLI, or don't look consistently.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
### Deverified

**What it means:**
A previously Verified machine no longer meets requirements. System continuous monitoring detects sustained degradation.
Copilot AI, Nov 5, 2025
The phrase 'System continuous monitoring' is grammatically incorrect. It should be either 'Continuous system monitoring' or 'The system's continuous monitoring'.
Suggested change:
Before: A previously Verified machine no longer meets requirements. System continuous monitoring detects sustained degradation.
After: A previously Verified machine no longer meets requirements. Continuous system monitoring detects sustained degradation.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Updated the Verification Stages doc and added a new Understanding Verification page.