
Commit 981009d

[Fix] PagedKVCache fetching compute stream when copy stream is needed (#16714)
This PR fixes an issue in PagedKVCache where the compute stream was always fetched, even though it is only needed when a standalone copy stream is created (on CUDA/ROCm). Backends such as WebGPU do not implement `GetCurrentStream`, so unconditionally fetching the compute stream raised an error on them.
Parent commit: 8023a98


1 file changed: +2 -2 lines


src/runtime/relax_vm/paged_kv_cache.cc

Lines changed: 2 additions & 2 deletions
```diff
@@ -439,12 +439,12 @@ class PagedAttentionKVCacheObj : public AttentionKVCacheObj {
       free_page_ids_.push_back(page_id);
     }
 
-    // The compute stream is the default stream.
     // If the device is CUDA/ROCm, we create a standalone copy stream, in
     // purpose to hide the latency of auxiliary stream copy.
-    compute_stream_ = DeviceAPI::Get(device)->GetCurrentStream(device);
     if (device.device_type == DLDeviceType::kDLCUDA ||
         device.device_type == DLDeviceType::kDLROCM) {
+      // The compute stream is the default stream.
+      compute_stream_ = DeviceAPI::Get(device)->GetCurrentStream(device);
       copy_stream_ = DeviceAPI::Get(device)->CreateStream(device);
     }
   }
```
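The effect of this change can be illustrated with a small, self-contained C++ mock. Everything in it is hypothetical: `DeviceType`, `MockDeviceAPI`, `Streams`, `NeedsCopyStream`, `InitStreamsBefore`, and `InitStreamsAfter` are illustrative stand-ins, not TVM's `DeviceAPI` or `DLDeviceType`. The mock assumes that a WebGPU-like backend signals the missing `GetCurrentStream` by throwing, and it uses a null handle to mean "default stream". It contrasts the old unconditional fetch with the guarded fetch introduced by this commit.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for DLDeviceType / DeviceAPI (assumptions, not TVM's API).
enum class DeviceType { kCUDA, kROCM, kWebGPU };

using StreamHandle = void*;  // in this mock, nullptr stands for the default stream

struct MockDeviceAPI {
  // Models the failure described in the PR: a backend without a
  // GetCurrentStream implementation reports an error when asked.
  static StreamHandle GetCurrentStream(DeviceType dev) {
    if (dev == DeviceType::kWebGPU) {
      throw std::runtime_error("GetCurrentStream is not implemented");
    }
    static int fake_compute_stream = 0;
    return &fake_compute_stream;
  }
  // Returns a fake, non-null handle; only called on CUDA/ROCm in this sketch.
  static StreamHandle CreateStream(DeviceType /*dev*/) {
    static int fake_copy_stream = 0;
    return &fake_copy_stream;
  }
};

struct Streams {
  StreamHandle compute = nullptr;
  StreamHandle copy = nullptr;
};

// Mirrors the kDLCUDA || kDLROCM check in the diff.
bool NeedsCopyStream(DeviceType dev) {
  return dev == DeviceType::kCUDA || dev == DeviceType::kROCM;
}

// Before the fix: the compute stream is fetched on every device.
Streams InitStreamsBefore(DeviceType dev) {
  Streams s;
  s.compute = MockDeviceAPI::GetCurrentStream(dev);
  if (NeedsCopyStream(dev)) {
    s.copy = MockDeviceAPI::CreateStream(dev);
  }
  return s;
}

// After the fix: the compute stream is fetched only alongside the copy stream.
Streams InitStreamsAfter(DeviceType dev) {
  Streams s;
  if (NeedsCopyStream(dev)) {
    s.compute = MockDeviceAPI::GetCurrentStream(dev);
    s.copy = MockDeviceAPI::CreateStream(dev);
  }
  return s;
}

// Helper: reports whether an initializer throws for the given device.
bool Throws(Streams (*init)(DeviceType), DeviceType dev) {
  try {
    init(dev);
    return false;
  } catch (const std::runtime_error&) {
    return true;
  }
}
```

In this sketch, `InitStreamsBefore(DeviceType::kWebGPU)` throws while `InitStreamsAfter(DeviceType::kWebGPU)` returns with both handles left null, and on CUDA both handles are populated. The design point is that the compute stream is only consulted when there is a separate copy stream to coordinate with, so devices without stream support never hit the unimplemented call.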
