Skip to content

Calling stop() multiple times or log() after stop() can cause errors in StdLogger #708

@Tiger1218

Description

@Tiger1218

What's the issue, what's expected?:
The StdLogger class exhibits problematic behavior when the stop() method is called multiple times consecutively or when the log() method is called after stop() has been executed. In both cases, a ValueError related to operating on a closed file handle occurs.

Specifically:

  1. Calling stdout_logger.stop() more than once after the logger has been started and then stopped will raise an error.
  2. Calling stdout_logger.log() after stdout_logger.stop() has been called will raise an error.

How to reproduce it?:

For case 1:

( Assume superbench/common/utils/stdout_logging.py is imported )

import sys
# Assume StdLogger and StdoutLoggerStream classes are defined as in the provided code
# ... (paste StdLogger and StdoutLoggerStream classes here if needed for a self-contained example) ...

stdout_logger = StdLogger()
stdout_logger.add_file_handler("test_multi_stop.log")

print("Starting logger...")
stdout_logger.start(rank=0)
print("This output goes to console and file.")
stdout_logger.log("Using log() method.")

print("Stopping logger...")
stdout_logger.stop()
print("Logger stopped.")

print("Attempting to stop logger again...")
# This line is expected to cause a ValueError
try:
    stdout_logger.stop()
    print("Second stop() call finished unexpectedly without error.")
except ValueError as e:
    print(f"Caught expected error on second stop(): {e}")

For case 2:

import sys
# Assume StdLogger and StdoutLoggerStream classes are defined as in the provided code
# ... (paste StdLogger and StdoutLoggerStream classes here if needed for a self-contained example) ...

stdout_logger = StdLogger()
stdout_logger.add_file_handler("test_log_after_stop.log")

print("Starting logger...")
stdout_logger.start(rank=0)
print("This output goes to console and file.")

print("Stopping logger...")
stdout_logger.stop()
print("Logger stopped. sys.stdout should be restored.")

print("Attempting to use log() method after stopping...")
# This line is expected to cause a ValueError
try:
    stdout_logger.log("Message after logger should be stopped.")
    print("log() call after stop finished unexpectedly without error.")
except ValueError as e:
    print(f"Caught expected error on log() after stop: {e}")

print("Direct print() after stop:")
print("This print() should work fine.") # This print() will work because sys.stdout is restored

Log message or shapshot?:

[2025-05-13 07:42:48,716 01e569a1dc86:2066][base.py:182][ERROR] Run benchmark failed - benchmark: nvbandwidth, message: I/O operation on closed file.
superbench: Run benchmark failed - benchmark: nvbandwidth, message: I/O operation on closed file.
[0]: Traceback (most recent call last):
[2025-05-13 07:42:48,717 01e569a1dc86:2066][executor.py:147][ERROR] I/O operation on closed file.
superbench: I/O operation on closed file.
[2025-05-13 07:42:48,717 01e569a1dc86:2066][executor.py:148][ERROR] Executor failed in nvbandwidth.
superbench: Executor failed in nvbandwidth.
[2025-05-13 07:42:49,716 01e569a1dc86:2066][executor.py:251][INFO] Executor is going to execute kernel-launch.
superbench: Executor is going to execute kernel-launch.
[2025-05-13 07:42:49,717 01e569a1dc86:2228][monitor.py:118][INFO] Start monitoring.
superbench: Start monitoring.
[2025-05-13 07:42:49,718 01e569a1dc86:2066][micro_base.py:177][INFO] Execute command - round: 0, benchmark: kernel-launch, command: /opt/superbench/bin/kernel_launch_overhead -w 100 -n 2000000 -i 2000.
superbench: Execute command - round: 0, benchmark: kernel-launch, command: /opt/superbench/bin/kernel_launch_overhead -w 100 -n 2000000 -i 2000.
[2025-05-13 07:42:50,746 01e569a1dc86:2228][device_manager.py:217][WARNING] Get device power limitation failed: Not Supported
superbench: Get device power limitation failed: Not Supported
[2025-05-13 07:43:00,728 01e569a1dc86:2228][device_manager.py:217][WARNING] Get device power limitation failed: Not Supported
superbench: Get device power limitation failed: Not Supported
[2025-05-13 07:43:10,731 01e569a1dc86:2228][device_manager.py:217][WARNING] Get device power limitation failed: Not Supported
superbench: Get device power limitation failed: Not Supported
[0]: Kernel launch overhead - event time: 0.00267 ms
Kernel launch overhead - wall time: 0.00518 ms
[2025-05-13 07:43:17,900 01e569a1dc86:2066][base.py:182][ERROR] Run benchmark failed - benchmark: kernel-launch, message: I/O operation on closed file.
superbench: Run benchmark failed - benchmark: kernel-launch, message: I/O operation on closed file.
[0]: Traceback (most recent call last):
[2025-05-13 07:43:17,901 01e569a1dc86:2066][executor.py:147][ERROR] I/O operation on closed file.
superbench: I/O operation on closed file.
[2025-05-13 07:43:17,901 01e569a1dc86:2066][executor.py:148][ERROR] Executor failed in kernel-launch.
superbench: Executor failed in kernel-launch.
[2025-05-13 07:43:19,755 01e569a1dc86:2066][executor.py:251][INFO] Executor is going to execute gemm-flops.
superbench: Executor is going to execute gemm-flops.
[2025-05-13 07:43:19,756 01e569a1dc86:2244][monitor.py:118][INFO] Start monitoring.
superbench: Start monitoring.
[2025-05-13 07:43:19,761 01e569a1dc86:2066][micro_base.py:177][INFO] Execute command - round: 0, benchmark: gemm-flops, command: /opt/superbench/bin/cutlass_profiler --warmup-iterations=5 --operation=gemm --n=16384 --k=16384 --m=16384 --kernels=cutlass_simt_sgemm_128x128_8x2_*.
superbench: Execute command - round: 0, benchmark: gemm-flops, command: /opt/superbench/bin/cutlass_profiler --warmup-iterations=5 --operation=gemm --n=16384 --k=16384 --m=16384 --kernels=cutlass_simt_sgemm_128x128_8x2_*.

Additional information:

The root cause appears to be that the StdLogger.stop() method calls self.logger_stream.restore() to clean up the stream and restore sys.stdout, but it does not set self.logger_stream to None afterwards.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions