35 commits
937eae3
draft gguf remove
Isotr0py Apr 12, 2026
642e1ec
remove gguf materialization
Isotr0py Apr 12, 2026
1b53ba7
clean
Isotr0py Apr 12, 2026
6332347
remove gguf materialization
Isotr0py Apr 12, 2026
a270d85
clean
Isotr0py Apr 12, 2026
f4e25f6
sync main
Isotr0py May 5, 2026
4d940cb
workaround tie words embedding
Isotr0py May 7, 2026
83fe14d
remove siglip maybe_swap_ffn_param
Isotr0py May 8, 2026
df565fa
pass quant_config
Isotr0py May 10, 2026
ca76814
clean
Isotr0py May 10, 2026
4b1e2d0
clean openpangu
Isotr0py May 10, 2026
070ce48
clean dead code
Isotr0py May 10, 2026
de99b2b
clean MoE weight loader
Isotr0py May 10, 2026
1c74b1c
clean rocm test
Isotr0py May 10, 2026
f38f9bb
clean spec config overrides
Isotr0py May 10, 2026
fdf0b53
clean unused config patch
Isotr0py May 10, 2026
78f3b92
clean unnecessary load_general_plugins
Isotr0py May 10, 2026
6a4da2c
Merge branch 'main' into remove-gguf
Isotr0py May 10, 2026
3416f82
Merge remote-tracking branch 'upstream/main' into remove-gguf
Isotr0py May 13, 2026
a94499b
make pre-commit happy
Isotr0py May 13, 2026
c551dc0
Merge branch 'main' into remove-gguf
Isotr0py May 14, 2026
94e52f1
Merge branch 'main' into remove-gguf
Isotr0py May 14, 2026
df4fa7e
Merge remote-tracking branch 'upstream/main' into remove-gguf
Isotr0py May 20, 2026
783875e
Merge branch 'main' into remove-gguf
Isotr0py May 20, 2026
c321339
add GGUF doc back
Isotr0py May 20, 2026
3467f25
remove gguf kernels again
Isotr0py May 20, 2026
ae0324c
remove ggml bindings
Isotr0py May 20, 2026
c33cf71
fix build
Isotr0py May 20, 2026
ff1750e
Merge branch 'main' into remove-gguf
Isotr0py May 21, 2026
ca5acbf
Merge branch 'main' into remove-gguf
Isotr0py May 22, 2026
1abb112
Merge branch 'main' into remove-gguf
Isotr0py May 23, 2026
97943ae
Merge branch 'main' into remove-gguf
Isotr0py May 24, 2026
5ca2402
Merge branch 'main' into remove-gguf
Isotr0py May 24, 2026
51092be
Merge branch 'main' into remove-gguf
Isotr0py May 28, 2026
51f3246
Merge remote-tracking branch 'upstream/main' into remove-gguf
Isotr0py Jun 9, 2026
1 change: 0 additions & 1 deletion .github/dependabot.yml
@@ -21,7 +21,6 @@ updates:
- dependency-name: "torchvision"
- dependency-name: "xformers"
- dependency-name: "lm-format-enforcer"
-  - dependency-name: "gguf"
- dependency-name: "compressed-tensors"
- dependency-name: "ray[cgraph]" # Ray Compiled Graph
- dependency-name: "lm-eval"
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -21,7 +21,7 @@ repos:
rev: v21.1.2
hooks:
- id: clang-format
- exclude: 'csrc/(moe/topk_softmax_kernels.cu|libtorch_stable/quantization/gguf/(ggml-common.h|dequantize.cuh|vecdotq.cuh|mmq.cuh|mmvq.cuh))|vllm/third_party/.*'
+ exclude: 'csrc/moe/topk_softmax_kernels.cu|vllm/third_party/.*'
types_or: [c++, cuda]
args: [--style=file, --verbose]
- repo: https://github.com/DavidAnson/markdownlint-cli2
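
Note on the `exclude` change above: a minimal sketch of which paths the old and new clang-format patterns skip. It assumes pre-commit applies `exclude` as a Python regular expression searched against the repo-relative path; the helper `is_excluded` and the `vllm/third_party/example.py` path are hypothetical and used only for illustration.

```python
import re

# Patterns copied verbatim from the .pre-commit-config.yaml hunk.
OLD_EXCLUDE = (
    r"csrc/(moe/topk_softmax_kernels.cu|libtorch_stable/quantization/gguf/"
    r"(ggml-common.h|dequantize.cuh|vecdotq.cuh|mmq.cuh|mmvq.cuh))"
    r"|vllm/third_party/.*"
)
NEW_EXCLUDE = r"csrc/moe/topk_softmax_kernels.cu|vllm/third_party/.*"


def is_excluded(pattern: str, path: str) -> bool:
    """Hypothetical helper: True if the hook would skip `path`.

    Assumption: the pattern is matched with re.search on the relative path.
    """
    return re.search(pattern, path) is not None


paths = [
    "csrc/moe/topk_softmax_kernels.cu",
    "csrc/libtorch_stable/quantization/gguf/mmq.cuh",
    "csrc/libtorch_stable/quantization/gptq/q_gemm.cu",
    "vllm/third_party/example.py",  # hypothetical file name
]

# Map each path to (skipped by old pattern, skipped by new pattern).
results = {p: (is_excluded(OLD_EXCLUDE, p), is_excluded(NEW_EXCLUDE, p)) for p in paths}

for path, (old, new) in results.items():
    print(f"{path}: old={old} new={new}")
```

Only the GGUF header changes status under the new pattern. Assuming the commits that remove the GGUF kernels also delete those headers, the dropped alternatives would have matched no remaining files.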
1 change: 0 additions & 1 deletion CMakeLists.txt
@@ -631,7 +631,6 @@ if(VLLM_GPU_LANG STREQUAL "CUDA" OR VLLM_GPU_LANG STREQUAL "HIP")
"csrc/libtorch_stable/quantization/w8a8/int8/per_token_group_quant.cu"
"csrc/libtorch_stable/permute_cols.cu"
"csrc/libtorch_stable/quantization/gptq/q_gemm.cu"
- "csrc/libtorch_stable/quantization/gguf/gguf_kernel.cu"
"csrc/libtorch_stable/pos_encoding_kernels.cu"
"csrc/libtorch_stable/fused_qknorm_rope_kernel.cu"
"csrc/libtorch_stable/layernorm_kernels.cu"
29 changes: 0 additions & 29 deletions csrc/libtorch_stable/ops.h
@@ -397,35 +397,6 @@ torch::stable::Tensor gptq_gemm(torch::stable::Tensor a,
void gptq_shuffle(torch::stable::Tensor q_weight, torch::stable::Tensor q_perm,
int64_t bit);

- // GGML kernels (shared CUDA/ROCm)
- torch::stable::Tensor ggml_dequantize(
-     torch::stable::Tensor W, int64_t type, int64_t m, int64_t n,
-     std::optional<torch::headeronly::ScalarType> const& dtype);
-
- torch::stable::Tensor ggml_mul_mat_vec_a8(torch::stable::Tensor W,
-                                           torch::stable::Tensor X, int64_t type,
-                                           int64_t row);
-
- torch::stable::Tensor ggml_mul_mat_a8(torch::stable::Tensor W,
-                                       torch::stable::Tensor X, int64_t type,
-                                       int64_t row);
-
- torch::stable::Tensor ggml_moe_a8(torch::stable::Tensor X,
-                                   torch::stable::Tensor W,
-                                   torch::stable::Tensor sorted_token_ids,
-                                   torch::stable::Tensor expert_ids,
-                                   torch::stable::Tensor num_tokens_post_padded,
-                                   int64_t type, int64_t row, int64_t top_k,
-                                   int64_t tokens);
-
- torch::stable::Tensor ggml_moe_a8_vec(torch::stable::Tensor X,
-                                       torch::stable::Tensor W,
-                                       torch::stable::Tensor topk_ids,
-                                       int64_t top_k, int64_t type, int64_t row,
-                                       int64_t tokens);
-
- int64_t ggml_moe_get_block_size(int64_t type);
-
void paged_attention_v1(
torch::stable::Tensor& out, torch::stable::Tensor& query,
torch::stable::Tensor& key_cache, torch::stable::Tensor& value_cache,