hanran-y (Contributor) commented Nov 4, 2025

Updated Verification Stages and added a new Understanding Verification page.

hanran-y requested review from Copilot and froghaus Nov 4, 2025 02:40
hanran-y self-assigned this Nov 4, 2025
hanran-y added the documentation and good first issue labels Nov 4, 2025

Copilot AI left a comment

Pull Request Overview

This PR comprehensively updates the Vast.ai documentation for machine verification by refactoring the existing verification-stages guide and adding a new understanding-verification guide. The changes improve clarity, structure, and detail around the automated verification system.

  • Reorganized verification-stages.mdx with clearer state definitions and lifecycle information
  • Added understanding-verification.mdx with detailed criteria (Reliability, Infrastructure, DLPerf, Supply & Demand)
  • Updated navigation and redirects to include the new documentation page

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| documentation/host/verification-stages.mdx | Restructured to focus on verification states (Unverified/Verified/Deverified) with clearer requirements, responsibilities, and recovery guidance |
| documentation/host/understanding-verification.mdx | New comprehensive guide explaining the four verification criteria and optimization strategies |
| docs.json | Added navigation entry and redirect for the new understanding-verification page |

- 10-series Nvidia GPU or MI25 or newer Radeon Instinct series GPU or Radeon VII or Radeon Pro VII or Radeon RX 7900 (GRE/XT/XTX); or Radeon Pro W7900/W7800. Other 6000 series or newer Radeon RX/Pro W series GPUs may be supported; but may not be searchable using standard filters for AMD ROCm.
- At least 1 physical CPU core (2 hyperthreads) per GPU.
- Your CPU must support AVX instruction set (not all lower end CPUs support this).
- At least 4GB of system RAM per GPU.

Copilot AI Nov 4, 2025

Inconsistent formatting: line 100 uses '4GB' while the original text had '4GBM' which appears to be a typo. The correction to '4GB' is appropriate, but should be '4 GB' with a space to match the style used elsewhere in the document (e.g., line 118 'GPU RAM of 7 GB').

Suggested change
- At least 4GB of system RAM per GPU.
- At least 4 GB of system RAM per GPU.

Copilot uses AI. Check for mistakes.
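
The CPU and RAM minimums quoted in the diff context above (1 physical core per GPU, AVX support, 4 GB of system RAM per GPU) can be sketched as a simple check. This is a minimal illustration only: `HostSpec`, its field names, and `unmet_minimums` are hypothetical and are not part of any Vast.ai tooling, and the GPU model requirement is deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class HostSpec:
    """Hypothetical description of a host; field names are illustrative."""
    gpu_count: int
    physical_cores: int
    system_ram_gb: float
    cpu_flags: FrozenSet[str]  # e.g. the flags listed in /proc/cpuinfo


def unmet_minimums(spec: HostSpec) -> List[str]:
    """Return the quoted minimums that this host does not meet."""
    problems = []
    # "At least 1 physical CPU core (2 hyperthreads) per GPU."
    if spec.physical_cores < spec.gpu_count:
        problems.append("need at least 1 physical CPU core per GPU")
    # "Your CPU must support AVX instruction set."
    if "avx" not in spec.cpu_flags:
        problems.append("CPU must support the AVX instruction set")
    # "At least 4 GB of system RAM per GPU."
    if spec.system_ram_gb < 4 * spec.gpu_count:
        problems.append("need at least 4 GB of system RAM per GPU")
    return problems
```

An empty list means the host meets the three minimums modeled here; it says nothing about verification itself, which the PR describes as depending on further criteria.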
{
"@type": "HowToStep",
"name": "Recover From Deverification",
"text": "If deverified, investigate red error indicators quickly and review logs/metrics. Common causes include network instability, hardware/system errors, GPU issues, container failures, or policy violations. Fix the underlying issue and restore stability - the system will automatically transition back to Verified status."

Copilot AI Nov 4, 2025

Incorrect recovery path described: According to line 151, when a deverified machine's issue is resolved, it transitions back to 'Verified' status. However, the original documentation stated that resolved deverified machines go from 'deverified -> unverified'. This is a significant change in behavior that appears inconsistent with the lifecycle shown in line 51 'Unverified → Verified → (potentially) Deverified → Unverified → ...'.

Suggested change
"text": "If deverified, investigate red error indicators quickly and review logs/metrics. Common causes include network instability, hardware/system errors, GPU issues, container failures, or policy violations. Fix the underlying issue and restore stability - the system will automatically transition back to Verified status."
"text": "If deverified, investigate red error indicators quickly and review logs/metrics. Common causes include network instability, hardware/system errors, GPU issues, container failures, or policy violations. Fix the underlying issue and restore stability - the system will automatically transition back to Unverified status, after which you may re-qualify for Verified."
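
The lifecycle this comment cites ('Unverified → Verified → (potentially) Deverified → Unverified → ...') can be written down as a small transition table, which makes the inconsistency easy to see. This is a sketch of the lifecycle as quoted in the review, not of how the platform implements it; `State`, `ALLOWED`, and `transition` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    UNVERIFIED = "Unverified"
    VERIFIED = "Verified"
    DEVERIFIED = "Deverified"


# Allowed moves, taken from the lifecycle quoted in the review comment.
# Assumption: no other transitions exist.
ALLOWED = {
    State.UNVERIFIED: {State.VERIFIED},
    State.VERIFIED: {State.DEVERIFIED},
    State.DEVERIFIED: {State.UNVERIFIED},
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the quoted lifecycle forbids the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not in the quoted lifecycle")
    return target
```

Under this table, `transition(State.DEVERIFIED, State.VERIFIED)` raises, while going through `State.UNVERIFIED` first succeeds, which is the path the suggested wording describes.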

- Detected abuse or policy violations.

If you see a red error on your machine card, you should try to investigate and resolve that because it could get you deverified.
**Recovery:** Fix the issue and restore stability; the system will automatically transition back to Verified.

Copilot AI Nov 4, 2025

Inconsistent deverification recovery path: This states transition is 'back to Verified', but line 51 shows the lifecycle as 'Unverified → Verified → (potentially) Deverified → Unverified → ...', indicating deverified machines should go back to Unverified, not directly to Verified.

Suggested change
**Recovery:** Fix the issue and restore stability; the system will automatically transition back to Verified.
**Recovery:** Fix the issue and restore stability; the system will automatically transition to Unverified, and then, after passing verification checks, back to Verified.

| Network | Symmetric, stable bandwidth; open ports | Upgrade links; verify routing/ports; monitor jitter/loss |
| Hardware | Modern GPUs/CPUs; adequate PCIe & RAM | Favor DC/workstation GPUs; ensure PCIe lanes; match CPU/RAM to GPU tier |
| Storage | Throughput and reliability | Prefer NVMe; monitor SMART; ensure sustained bandwidth |
| Virtualization | VM capability enabled | Enable in BIOS, enable IOMMU. |

Copilot AI Nov 4, 2025

[nitpick] Grammatical inconsistency: Should be 'Enable VM in BIOS, enable IOMMU' or 'Enable in BIOS; enable IOMMU' for consistency with other table entries that use semicolons as separators.

Suggested change
| Virtualization | VM capability enabled | Enable in BIOS, enable IOMMU. |
| Virtualization | VM capability enabled | Enable in BIOS; enable IOMMU. |

froghaus (Contributor) left a comment

Really nice work! This is gonna be a huge improvement. Definitely a much better resource to provide for hosts.

A few quibbles and nitpicks here and there. The one revision I would definitely like to see is clear messaging that deverification is only temporary, and that the red error message is visible on the machines page. Some hosts only look in the CLI, or don't check consistently.


Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


### Deverified

**What it means:**
A previously Verified machine no longer meets requirements. System continuous monitoring detects sustained degradation.

Copilot AI Nov 5, 2025

The phrase 'System continuous monitoring' is grammatically incorrect. It should be either 'Continuous system monitoring' or 'The system's continuous monitoring'.

Suggested change
A previously Verified machine no longer meets requirements. System continuous monitoring detects sustained degradation.
A previously Verified machine no longer meets requirements. Continuous system monitoring detects sustained degradation.

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


hanran-y merged commit a04c1e7 into main Nov 5, 2025
1 check passed
Labels

documentation, good first issue

3 participants