Holistic Trace Analysis (HTA) is a performance analysis tool that identifies performance bottlenecks in distributed training workloads. HTA achieves this by analyzing traces collected through the PyTorch Profiler, also known as Kineto.
HTA provides the following features:
- Temporal Breakdown - Breakdown of GPU time into computation, communication, memory events, and idle time across all ranks.
- Kernel Breakdown - Finds kernels with the longest duration on each rank.
- Kernel Duration Distribution - Distribution of the average time taken by the longest kernels across different ranks.
- Idle Time Breakdown - Breakdown of GPU idle time into waiting for the host, waiting for another kernel, or an unknown cause.
- Communication Computation Overlap - Calculates the percentage of time when communication overlaps computation.
- Frequent CUDA Kernel Patterns - Finds the CUDA kernels most frequently launched by any given PyTorch or user-defined operator.
- CUDA Kernel Launch Statistics - Distributions of GPU kernels with very small duration, large duration, and excessive launch time.
- Augmented Counters (Queue length, Memory bandwidth) - Augmented trace files that provide insight into the memory bandwidth utilized and the number of outstanding operations on each CUDA stream.
- Trace Comparison - A trace comparison tool to identify and visualize the differences between traces.
- CUPTI Counter Analysis - An experimental API to get GPU performance counters. By attributing performance measurements from kernels to PyTorch operators, roofline analysis can be performed and kernels can be optimized.
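To make the Communication Computation Overlap feature more concrete, here is a minimal, standalone sketch of how such a percentage could be computed from kernel time intervals. It is not HTA's implementation. It assumes the overlap is measured relative to total communication time, and the function names and the `(start, end)` interval representation are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals):
    """Sum the durations of a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def intersect(a, b):
    """Intersect two sorted, disjoint interval lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def overlap_percentage(compute, comm):
    """Percentage of communication time that is hidden behind computation.

    Assumption: overlap is normalized by total communication time.
    """
    compute = merge_intervals(compute)
    comm = merge_intervals(comm)
    comm_time = total_length(comm)
    if comm_time == 0:
        return 0.0
    return 100.0 * total_length(intersect(compute, comm)) / comm_time


# Hypothetical kernel intervals on one rank, in arbitrary time units.
compute_kernels = [(0, 10), (12, 20)]
comm_kernels = [(8, 14), (18, 22)]
print(overlap_percentage(compute_kernels, comm_kernels))  # 60.0
```

A higher percentage means more of the communication cost is hidden behind computation. In HTA itself, this analysis is exposed through the TraceAnalysis API rather than computed by hand.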
HTA runs on Linux and Mac with Python >= 3.8.
To install Miniconda, follow the official Miniconda installation instructions.
Create the environment env_name
conda create -n env_name
Activate the environment
conda activate env_name
Deactivate the environment
conda deactivate
pip install HolisticTraceAnalysis
git clone https://github.com/facebookresearch/HolisticTraceAnalysis.git
cd HolisticTraceAnalysis
git submodule update --init
pip install -r requirements.txt
pip install -e .
Learn more about the features and the API from our documentation.
All traces collected from a job must reside in a unique folder.
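Before pointing HTA at a folder, it can help to confirm what the folder actually contains. The snippet below is a small sketch using only the standard library. The helper name `list_trace_files` is hypothetical, and the assumption that trace files end in `.json` or `.json.gz` should be checked against how your traces were exported.

```python
from pathlib import Path

# Assumed trace file extensions; adjust to match your profiler output.
TRACE_SUFFIXES = (".json", ".json.gz")


def list_trace_files(trace_dir):
    """Return the sorted trace files found directly inside trace_dir."""
    folder = Path(trace_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"{trace_dir} is not a directory")
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.name.endswith(TRACE_SUFFIXES)
    )
```

For example, `len(list_trace_files("/path/to/folder/containing/the/traces"))` gives the number of trace files found, which can be compared with the number of ranks in the job.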
Activate the Conda environment and launch a Jupyter notebook.
conda activate env_name
jupyter notebook
Import HTA, and create a TraceAnalysis object
from hta.trace_analysis import TraceAnalysis
analyzer = TraceAnalysis(trace_dir="/path/to/folder/containing/the/traces")
# Temporal breakdown
temporal_breakdown_df = analyzer.get_temporal_breakdown()
# Kernel breakdown
kernel_breakdown_df = analyzer.get_gpu_kernel_breakdown()
# Idle time breakdown
idle_time_df = analyzer.get_idle_time_breakdown()
# Communication computation overlap
comm_comp_overlap_df = analyzer.get_comm_comp_overlap()
# Frequent CUDA kernel patterns
frequent_patterns_df = analyzer.get_frequent_cuda_kernel_patterns(operator_name="aten::linear", output_dir="/new/trace/path")
# CUDA kernel launch statistics
cuda_launch_kernel_stats = analyzer.get_cuda_kernel_launch_stats()
# Memory bandwidth time series
memory_bw_series = analyzer.get_memory_bw_time_series()
# Memory bandwidth summary
memory_bw_summary = analyzer.get_memory_bw_summary()
# Queue length time series
ql_series = analyzer.get_queue_length_time_series()
# Queue length summary
ql_summary = analyzer.get_queue_length_summary()
For a detailed demo, run the trace_analysis_demo and trace_diff_demo notebooks in the examples folder.
Logging Level
The logging level in HTA is set through a configuration file. The default level is defined in hta/configs/logging.config and can be changed in the [logger_hta] section of that file. If needed, a different logging configuration file can be used by modifying hta/configs/trace_analyzer.json.
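As an illustration of editing the [logger_hta] section, the sketch below changes the level in an INI-style logging configuration using Python's `configparser`. The sample contents are hypothetical and only mimic the usual `logging.config.fileConfig` layout. The real hta/configs/logging.config may contain different keys, so treat this as a pattern rather than the actual file.

```python
import configparser
import io

# Hypothetical sample; only the [logger_hta] section name comes from HTA.
SAMPLE_CONFIG = """\
[loggers]
keys=root,hta

[logger_hta]
level=INFO
handlers=console
qualname=hta
"""


def set_hta_log_level(config_text, level):
    """Return config_text with the level in [logger_hta] set to `level`."""
    # Disable interpolation because logging format strings contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    parser["logger_hta"]["level"] = level
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()


print(set_hta_log_level(SAMPLE_CONFIG, "DEBUG"))
```

Editing the file by hand works just as well. The programmatic version is mainly useful when the level needs to be switched from a script.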
├── examples # folder containing demo notebooks
│ ├── ...
├── hta
│ ├── analyzers # core logic for each analysis
│ │ ├── ...
│ ├── common # code common to multiple analyses
│ │ ├── ...
│ ├── configs # config files
│ │ ├── ...
│ ├── trace_analysis.py # entrypoint for TraceAnalysis API
│ ├── trace_diff.py # entrypoint for TraceDiff API
│ └── utils # utility files
│ └── ...
├── scripts # generic tools for traces
│ └── ...
└── tests # unit tests
  └── ...
We welcome new contributions. If you plan to contribute new features or extensions, please first open an issue and discuss the feature with us. To learn more about how to contribute, see our contributing guidelines.
Please let us know if you encounter a bug by filing an issue.
HTA is currently maintained by: Anupam Bhatnagar, Brian Coutinho, Xizhou Feng, Yifan Liu, Sung-Han Lin and Louis Feng. Past contributors include Michael Acar and Yuzhen Huang.
Holistic Trace Analysis is licensed under the MIT License.