Merge epd feature from Dr.zheng's repo into private repo#103
kechengliu97 wants to merge 1 commit into v0.11.0 from
Conversation
Code Review
This pull request introduces a significant new feature: the disaggregated encoder. It allows the vision encoder to run in a separate process from the prefill/decode workers, which can improve resource utilization, enable independent scaling, and lower time-to-first-token. The implementation introduces a new `ECConnector` abstraction for transferring the encoder cache between processes, with an initial `ECSharedStorageConnector` implementation. The changes are well structured, with modifications to the scheduler and GPU model runner to integrate the new capability. The comprehensive documentation, examples, and tests (both unit and integration) are commendable and greatly help in understanding and verifying the feature.
My review identified one high-severity issue: a race condition in `ECSharedStorageConnector` when used in a multi-producer setup, which could lead to data corruption. A suggestion to add file-based locking is provided below. Overall, this is a high-quality contribution.
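For readers unfamiliar with the connector pattern, the abstraction described above can be sketched roughly as follows. This is a hypothetical illustration only: the class and method names (`ECConnector`, `save_caches`, `load_caches`) are assumptions based on this review's description, not the actual API in the PR, and the in-memory implementation exists purely to show the producer/consumer contract.

```python
from abc import ABC, abstractmethod
from typing import Any


class ECConnector(ABC):
    """Hypothetical sketch: moves encoder cache entries between the
    encoder (producer) process and the prefill/decode (consumer) process."""

    @abstractmethod
    def save_caches(self, encoder_cache: dict[str, Any], mm_hash: str) -> None:
        """Persist the encoder output for one multimodal hash."""

    @abstractmethod
    def load_caches(self, mm_hash: str) -> Any:
        """Retrieve a previously saved encoder output by its hash."""


class InMemoryECConnector(ECConnector):
    """Trivial illustrative implementation backed by a local dict."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def save_caches(self, encoder_cache: dict[str, Any], mm_hash: str) -> None:
        self._store[mm_hash] = encoder_cache[mm_hash]

    def load_caches(self, mm_hash: str) -> Any:
        return self._store[mm_hash]
```

The real `ECSharedStorageConnector` plays the same role but persists tensors to a shared filesystem path, which is where the concurrency concern below arises.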
```python
filename = self._generate_filename_debug(mm_hash)
ec_cache = encoder_cache[mm_hash]
tensors = {"ec_cache": ec_cache.detach().cpu()}
safetensors.torch.save_file(tensors, filename)
logger.debug("Save cache successful for mm_hash %s", mm_hash)
```
The current implementation of `save_caches` is not safe for concurrent writes from multiple producer processes. If multiple encoder instances are configured to use the same `shared_storage_path`, they can race to write the cache for the same `mm_hash`, potentially leaving corrupted `encoder_cache.safetensors` files, since `safetensors.torch.save_file` is not guaranteed to be atomic across processes.
To make this connector safe for a multi-producer setup, a file-based locking mechanism (e.g., `fcntl` on Unix-like systems) should be placed around the file-writing operations, so that only one process can write a given cache file at a time.
Suggested change:

```diff
-filename = self._generate_filename_debug(mm_hash)
-ec_cache = encoder_cache[mm_hash]
-tensors = {"ec_cache": ec_cache.detach().cpu()}
-safetensors.torch.save_file(tensors, filename)
-logger.debug("Save cache successful for mm_hash %s", mm_hash)
+import fcntl
+import os
+
+filename = self._generate_filename_debug(mm_hash)
+lock_filename = filename + ".lock"
+# Use a lock file to prevent race conditions from multiple producers.
+with open(lock_filename, "w") as lock_file:
+    fcntl.flock(lock_file, fcntl.LOCK_EX)
+    try:
+        # Double-check if another process created the cache while waiting
+        # for the lock.
+        if not os.path.exists(filename):
+            ec_cache = encoder_cache[mm_hash]
+            tensors = {"ec_cache": ec_cache.detach().cpu()}
+            safetensors.torch.save_file(tensors, filename)
+            logger.debug("Save cache successful for mm_hash %s", mm_hash)
+        else:
+            logger.debug("Cache for mm_hash %s already exists, skipping save.",
+                         mm_hash)
+    finally:
+        fcntl.flock(lock_file, fcntl.LOCK_UN)
```
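A lock-free alternative worth considering is the write-to-temp-then-rename pattern: since `os.replace` is atomic on POSIX when source and destination are on the same filesystem, concurrent producers of the same `mm_hash` would simply install identical files rather than corrupt each other. The sketch below is illustrative, not part of this PR; `save_cache_atomically` is a hypothetical helper and `payload` stands in for the serialized safetensors bytes.

```python
import os
import tempfile


def save_cache_atomically(filename: str, payload: bytes) -> None:
    """Write payload to a temp file in the target directory, then atomically
    rename it into place. Assumes temp and target share a filesystem."""
    directory = os.path.dirname(filename) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # Atomic on POSIX: readers see either the old file or the complete
        # new file, never a partially written one.
        os.replace(tmp_path, filename)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This avoids lock-file cleanup concerns, though it trades the skip-if-exists optimization in the suggestion above for redundant (but harmless) duplicate writes.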
Signed-off-by: WANG Cong <115451386+congw729@users.noreply.github.com>