[Doc][Misc] Fix msprobe_guide.md documentation issues #6965
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -55,21 +55,20 @@ To restrict the operators that are captured, configure the `list` block: | |||||
|
|
||||||
| ```json | ||||||
| "scope": ["Module.conv1.Conv2d.forward.0", "Module.fc2.Linear.forward.0"] | ||||||
| "scope": ["Cell.conv1.Conv2d.forward.0", "Cell.fc2.Dense.backward.0"] | ||||||
| "scope": ["Cell.conv1.Conv2d.forward.0", "Cell.fc2.Dense.forward.0"] | ||||||
| "scope": ["Tensor.add.0.forward", "Functional.square.2.forward"] | ||||||
| ``` | ||||||
|
|
||||||
| The `level` setting determines what can be provided—modules when `level=L0`, APIs when `level=L1`, and either modules or APIs when `level=mix`. | ||||||
|
|
||||||
| - `list` (list[str]): Custom operator list. Options include: | ||||||
| - Supply the full names of specific APIs in PyTorch pynative scenarios to only dump those APIs. Example: `"list": ["Tensor.permute.1.forward", "Tensor.transpose.2.forward", "Torch.relu.3.backward"]`. | ||||||
| - Supply the full names of specific APIs in PyTorch pynative scenarios to only dump those APIs. Example: `"list": ["Tensor.permute.1.forward", "Tensor.transpose.2.forward", "Torch.relu.3.forward"]`. | ||||||
| - When `level=mix`, you can provide module names so that the dump expands to everything produced while the module is running. Example: `"list": ["Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0"]`. | ||||||
| - Provide a substring such as `"list": ["relu"]` to dump every API whose name contains the substring. When `level=mix`, modules whose names contain the substring are also expanded. | ||||||
|
|
||||||
| Example configuration: | ||||||
|
|
||||||
| ```bash | ||||||
| cat <<'JSON' > /data/msprobe_config.json | ||||||
| ```json | ||||||
| { | ||||||
| "task": "statistics", | ||||||
| "dump_path": "/home/data_dump", | ||||||
|
|
@@ -86,10 +85,9 @@ cat <<'JSON' > /data/msprobe_config.json | |||||
| "summary_mode": "statistics" | ||||||
| } | ||||||
| } | ||||||
| JSON | ||||||
| ``` | ||||||
|
Comment on lines 69 to 88
|
||||||
|
|
||||||
| ## 2. Enable `msprobe` in vllm-ascend | ||||||
| ## 3. Enable `msprobe` in vllm-ascend | ||||||
|
|
||||||
| 1. Start vLLM in eager mode by adding `--enforce-eager` (static-graph scenarios are not supported yet) and pass the config path through `--additional-config`: | ||||||
|
|
||||||
|
|
@@ -102,7 +100,7 @@ JSON | |||||
| --additional-config '{"dump_config_path": "/data/msprobe_config.json"}' & | ||||||
| ``` | ||||||
|
|
||||||
| ## 3. Send requests and collect dumps | ||||||
| ## 4. Send requests and collect dumps | ||||||
|
|
||||||
| 1. Send inference requests as usual, for example: | ||||||
|
|
||||||
|
|
@@ -117,7 +115,7 @@ JSON | |||||
| }' | python -m json.tool | ||||||
| ``` | ||||||
|
|
||||||
| 2. Each request drives the sequence `msprobe: start -> forward/backward -> stop -> step`. The runner invokes `step()` on every code path, so you always get a complete dataset even if inference returns early. | ||||||
| 2. Each request drives the sequence `msprobe: start -> forward -> stop -> step`. The runner invokes `step()` on every code path, so you always get a complete dataset even if inference returns early. | ||||||
|
|
||||||
| 3. Dump files are written into `dump_path`. They usually contain: | ||||||
| - Tensor files grouped by operator/module. | ||||||
|
|
@@ -131,12 +129,10 @@ JSON | |||||
| │ ├── step0 | ||||||
| │ │ ├── rank0 | ||||||
| │ │ │ ├── dump_tensor_data | ||||||
| │ │ │ │ ├── Tensor.permute.1.forward.pt | ||||||
| │ │ │ │ ├── Functional.linear.5.backward.output.pt # Format: {api_type}.{api_name}.{call_count}.{forward/backward}.{input/output}.{arg_index}. | ||||||
| │ │ │ │ ├── Tensor.permute.1.forward.pt # Format: {api_type}.{api_name}.{call_count}.forward.{input/output}.{arg_index}. | ||||||
|
||||||
| │ │ │ │ ├── Tensor.permute.1.forward.pt # Format: {api_type}.{api_name}.{call_count}.forward.{input/output}.{arg_index}. | |
| │ │ │ │ ├── Tensor.permute.1.forward.input.0.pt # Format: {api_type}.{api_name}.{call_count}.forward.{input/output}.{arg_index}. |
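To make the naming issue in the suggestion above concrete, here is a minimal sketch that checks a dump tensor file name against the documented format `{api_type}.{api_name}.{call_count}.forward.{input/output}.{arg_index}`. The helper `parse_dump_tensor_name`, the `DumpTensorName` tuple, and the regular expression are hypothetical names written for this illustration and are not part of msprobe; the `.pt` suffix is taken from the examples in the tree.

```python
import re
from typing import NamedTuple, Optional


class DumpTensorName(NamedTuple):
    """Fields of a dump tensor file name (hypothetical helper type)."""
    api_type: str
    api_name: str
    call_count: int
    io_kind: str      # "input" or "output"
    arg_index: int


# Assumption: the pattern mirrors the format string quoted in the doc comment,
# followed by the ".pt" extension seen in the directory tree.
_NAME_PATTERN = re.compile(
    r"^(?P<api_type>[^.]+)\.(?P<api_name>.+)\.(?P<call_count>\d+)"
    r"\.forward\.(?P<io_kind>input|output)\.(?P<arg_index>\d+)\.pt$"
)


def parse_dump_tensor_name(filename: str) -> Optional[DumpTensorName]:
    """Return the parsed fields, or None if the name does not fit the format."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    return DumpTensorName(
        api_type=match["api_type"],
        api_name=match["api_name"],
        call_count=int(match["call_count"]),
        io_kind=match["io_kind"],
        arg_index=int(match["arg_index"]),
    )


# The suggested name fits the documented format; the original one does not.
print(parse_dump_tensor_name("Tensor.permute.1.forward.input.0.pt"))
print(parse_dump_tensor_name("Tensor.permute.1.forward.pt"))  # None
```

Under these assumptions, `Tensor.permute.1.forward.pt` is rejected because it lacks the `{input/output}.{arg_index}` part, which is why the suggested file name is the one consistent with the format comment.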
This JSON example is invalid due to a trailing comma. Since the `Module.conv2.Conv2d.parameters_grad` and `Module.conv2.Conv2d.backward.0` properties were removed from the `data` object, the comma after the `Module.conv2.Conv2d.forward.0` object is no longer necessary and makes the JSON invalid.
| }, | |
| } |
Copilot
AI
Mar 3, 2026
The L0 `dump.json` example is no longer valid JSON after removing the `parameters_grad`/`backward` entries: it now ends with a trailing comma and mismatched closing braces (`},` then `}` …). Please update the closing braces (and remove the trailing comma) so the snippet is syntactically correct and can be copied and pasted.
Copilot
AI
Mar 3, 2026
The L1 `dump.json` example is now invalid JSON after removing the `.backward` section: it leaves a trailing comma after the `Functional.relu.0.forward` object and has extra closing braces. Please adjust the snippet so the `data` object and root object close cleanly without trailing commas.
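A quick way to confirm the trailing-comma problem is to run the snippet through a strict JSON parser. The sketch below uses Python's standard `json` module; the two sample strings are shortened stand-ins written for this illustration (the real `dump.json` entries carry more fields), and `is_valid_json` is a hypothetical helper name.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Shortened stand-in for the doc snippet: a comma is left after the last entry.
with_trailing_comma = '{"data": {"Functional.relu.0.forward": {},}}'

# Same structure with the trailing comma removed and braces balanced.
fixed = '{"data": {"Functional.relu.0.forward": {}}}'

print(is_valid_json(with_trailing_comma))  # False
print(is_valid_json(fixed))                # True
```

Running the documentation snippets through a check like this before merging would catch both the trailing comma and any unbalanced closing braces.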
The `scope` examples are inside a `json` code block but repeat the `"scope"` key three times, which is not valid JSON and may confuse readers trying to copy/paste. Consider turning these into separate code blocks (each showing a full snippet) or a single valid example with explanatory text outside the JSON block.
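To illustrate why the repeated key is a problem for copy/paste, the sketch below wraps two of the `scope` lines in braces (an assumption, since the documentation block shows bare key/value lines) and parses them with Python's standard `json` module. That parser silently keeps only the last value for a repeated key; the helper `load_rejecting_duplicates` is a hypothetical name that uses the module's `object_pairs_hook` parameter to surface the duplicates instead.

```python
import json
from collections import Counter

# Assumption: two of the documented "scope" lines wrapped in one object.
snippet = """
{
  "scope": ["Module.conv1.Conv2d.forward.0", "Module.fc2.Linear.forward.0"],
  "scope": ["Tensor.add.0.forward", "Functional.square.2.forward"]
}
"""


def load_rejecting_duplicates(text: str):
    """Parse JSON, raising ValueError if any object repeats a key."""
    def hook(pairs):
        counts = Counter(key for key, _ in pairs)
        duplicates = [key for key, n in counts.items() if n > 1]
        if duplicates:
            raise ValueError(f"duplicate keys: {duplicates}")
        return dict(pairs)

    return json.loads(text, object_pairs_hook=hook)


# The default parser keeps only the last "scope" value without any warning.
print(json.loads(snippet)["scope"])

# The stricter loader reports the repeated key instead.
try:
    load_rejecting_duplicates(snippet)
except ValueError as err:
    print(err)
```

A reader who pastes the block as-is would therefore end up with only one of the scopes applied, which supports splitting the examples into separate snippets.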