-
Notifications
You must be signed in to change notification settings - Fork 74
Open
Description
What's the issue, what's expected?:
The StdLogger
class exhibits problematic behavior when the stop()
method is called multiple times consecutively or when the log()
method is called after stop()
has been executed. In both cases, a ValueError
related to operating on a closed file handle occurs.
Specifically:
- Calling
stdout_logger.stop()
more than once after the logger has been started and then stopped will raise an error. - Calling
stdout_logger.log()
afterstdout_logger.stop()
has been called will raise an error.
How to reproduce it?:
For case 1:
( Assume superbench/common/utils/stdout_logging.py
is imported )
import sys
# Assume StdLogger and StdoutLoggerStream classes are defined as in the provided code
# ... (paste StdLogger and StdoutLoggerStream classes here if needed for a self-contained example) ...
stdout_logger = StdLogger()
stdout_logger.add_file_handler("test_multi_stop.log")
print("Starting logger...")
stdout_logger.start(rank=0)
print("This output goes to console and file.")
stdout_logger.log("Using log() method.")
print("Stopping logger...")
stdout_logger.stop()
print("Logger stopped.")
print("Attempting to stop logger again...")
# This line is expected to cause a ValueError
try:
stdout_logger.stop()
print("Second stop() call finished unexpectedly without error.")
except ValueError as e:
print(f"Caught expected error on second stop(): {e}")
For case 2:
import sys
# Assume StdLogger and StdoutLoggerStream classes are defined as in the provided code
# ... (paste StdLogger and StdoutLoggerStream classes here if needed for a self-contained example) ...
stdout_logger = StdLogger()
stdout_logger.add_file_handler("test_log_after_stop.log")
print("Starting logger...")
stdout_logger.start(rank=0)
print("This output goes to console and file.")
print("Stopping logger...")
stdout_logger.stop()
print("Logger stopped. sys.stdout should be restored.")
print("Attempting to use log() method after stopping...")
# This line is expected to cause a ValueError
try:
stdout_logger.log("Message after logger should be stopped.")
print("log() call after stop finished unexpectedly without error.")
except ValueError as e:
print(f"Caught expected error on log() after stop: {e}")
print("Direct print() after stop:")
print("This print() should work fine.") # This print() will work because sys.stdout is restored
Log message or shapshot?:
[2025-05-13 07:42:48,716 01e569a1dc86:2066][base.py:182][ERROR] Run benchmark failed - benchmark: nvbandwidth, message: I/O operation on closed file.
superbench: Run benchmark failed - benchmark: nvbandwidth, message: I/O operation on closed file.
[0]: Traceback (most recent call last):
[2025-05-13 07:42:48,717 01e569a1dc86:2066][executor.py:147][ERROR] I/O operation on closed file.
superbench: I/O operation on closed file.
[2025-05-13 07:42:48,717 01e569a1dc86:2066][executor.py:148][ERROR] Executor failed in nvbandwidth.
superbench: Executor failed in nvbandwidth.
[2025-05-13 07:42:49,716 01e569a1dc86:2066][executor.py:251][INFO] Executor is going to execute kernel-launch.
superbench: Executor is going to execute kernel-launch.
[2025-05-13 07:42:49,717 01e569a1dc86:2228][monitor.py:118][INFO] Start monitoring.
superbench: Start monitoring.
[2025-05-13 07:42:49,718 01e569a1dc86:2066][micro_base.py:177][INFO] Execute command - round: 0, benchmark: kernel-launch, command: /opt/superbench/bin/kernel_launch_overhead -w 100 -n 2000000 -i 2000.
superbench: Execute command - round: 0, benchmark: kernel-launch, command: /opt/superbench/bin/kernel_launch_overhead -w 100 -n 2000000 -i 2000.
[2025-05-13 07:42:50,746 01e569a1dc86:2228][device_manager.py:217][WARNING] Get device power limitation failed: Not Supported
superbench: Get device power limitation failed: Not Supported
[2025-05-13 07:43:00,728 01e569a1dc86:2228][device_manager.py:217][WARNING] Get device power limitation failed: Not Supported
superbench: Get device power limitation failed: Not Supported
[2025-05-13 07:43:10,731 01e569a1dc86:2228][device_manager.py:217][WARNING] Get device power limitation failed: Not Supported
superbench: Get device power limitation failed: Not Supported
[0]: Kernel launch overhead - event time: 0.00267 ms
Kernel launch overhead - wall time: 0.00518 ms
[2025-05-13 07:43:17,900 01e569a1dc86:2066][base.py:182][ERROR] Run benchmark failed - benchmark: kernel-launch, message: I/O operation on closed file.
superbench: Run benchmark failed - benchmark: kernel-launch, message: I/O operation on closed file.
[0]: Traceback (most recent call last):
[2025-05-13 07:43:17,901 01e569a1dc86:2066][executor.py:147][ERROR] I/O operation on closed file.
superbench: I/O operation on closed file.
[2025-05-13 07:43:17,901 01e569a1dc86:2066][executor.py:148][ERROR] Executor failed in kernel-launch.
superbench: Executor failed in kernel-launch.
[2025-05-13 07:43:19,755 01e569a1dc86:2066][executor.py:251][INFO] Executor is going to execute gemm-flops.
superbench: Executor is going to execute gemm-flops.
[2025-05-13 07:43:19,756 01e569a1dc86:2244][monitor.py:118][INFO] Start monitoring.
superbench: Start monitoring.
[2025-05-13 07:43:19,761 01e569a1dc86:2066][micro_base.py:177][INFO] Execute command - round: 0, benchmark: gemm-flops, command: /opt/superbench/bin/cutlass_profiler --warmup-iterations=5 --operation=gemm --n=16384 --k=16384 --m=16384 --kernels=cutlass_simt_sgemm_128x128_8x2_*.
superbench: Execute command - round: 0, benchmark: gemm-flops, command: /opt/superbench/bin/cutlass_profiler --warmup-iterations=5 --operation=gemm --n=16384 --k=16384 --m=16384 --kernels=cutlass_simt_sgemm_128x128_8x2_*.
Additional information:
The root cause appears to be that the StdLogger.stop()
method calls self.logger_stream.restore()
to clean up the stream and restore sys.stdout
, but it does not set self.logger_stream
to None
afterwards.
RyanYunruiYang
Metadata
Metadata
Assignees
Labels
No labels