Conversation
- Add 750-RFC.yml template for architectural discussions
- Template adapted for vLLM-omni with updated URLs
- Enables structured feedback for major design changes
- Follows vLLM's RFC template structure
Summary of Changes

Hello @hsliuustc0106, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request introduces a structured process for Requests for Comments (RFCs) within the repository.
Code Review
This pull request introduces a new issue template for Requests for Comments (RFCs) and also modifies the .gitignore file. The RFC template is a welcome addition, though I've suggested a couple of improvements to make it more effective: clarifying the descriptions for the 'Motivation' and 'Proposed Change' fields, and removing a checkbox that seems more suited for bug reports. My main concern is with the .gitignore change, which is very broad and could lead to unintended files being committed. I've recommended a more targeted change to avoid this potential issue. In the future, it would be better to separate unrelated changes like these into different pull requests.
update RFC template
P0 fixes:
- vllm-project#1: `_free_scaffold_weights` now shrinks storage to zero (actually releases VRAM). It only runs when `SKIP_SCAFFOLD` is also set, and is called lazily after the first prefill, not at load time.
- vllm-project#2: Sliding VAE now defaults to OFF (the splice algorithm had an alignment bug). `_sliding_vae_decode` falls back to full decode until proper overlap-add is implemented.
- vllm-project#3: Complete per-request state reset in preprocess: now clears `_curr_prefix_feat_cond`, `_last_audio_patch_gpu`, `_prev_audio`, `_prev_audio_len`, `_decode_step_count`, and `_precomputed_stop_logits`.
- vllm-project#4: The `compute_logits` fallback forces stop (not continue) when `_prefill_completed=True`, preventing runaway generation.
- vllm-project#5: Scaffold VRAM: `load_weights` no longer frees immediately; `_free_scaffold_weights` is called after the first prefill completes, so the scaffold is available for prefill and then released.

P1 fixes:
- vllm-project#6: Log all active config flags at load time.
- vllm-project#7: Remove dead `_STOP_CHECK_INTERVAL` code.
- vllm-project#8: Remove the broken `audio_duration` formula from postprocess.
- vllm-project#9 / vllm-project#14: Move `from einops import rearrange` to module top level.
- vllm-project#11: Remove the `torch.no_grad()` context from `_forward_decode_graphable` (incompatible with CUDA Graph capture).
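The per-request reset in fix #3 can be sketched roughly as follows. This is a hypothetical illustration, not the actual vLLM-omni code: the `OmniModelRunner` class and `preprocess` signature are assumptions; only the attribute names come from the fix list above.

```python
# Hypothetical sketch of the fix-#3 pattern: clear all per-request fields at
# the start of preprocess so a new request never sees stale state left over
# from the previous one. Class and method names are illustrative only.

class OmniModelRunner:
    # Per-request fields that must not leak across requests (names from the
    # fix list above).
    _PER_REQUEST_FIELDS = (
        "_curr_prefix_feat_cond",
        "_last_audio_patch_gpu",
        "_prev_audio",
        "_prev_audio_len",
        "_decode_step_count",
        "_precomputed_stop_logits",
    )

    def __init__(self):
        self._reset_request_state()

    def _reset_request_state(self):
        # Drop references so tensors/buffers become collectible.
        for name in self._PER_REQUEST_FIELDS:
            setattr(self, name, None)
        # Counters restart at zero rather than None.
        self._decode_step_count = 0

    def preprocess(self, request):
        # Fix #3: complete reset before handling each new request.
        self._reset_request_state()
        return request
```

Resetting every field in one place, driven by a single tuple of names, avoids the failure mode the fix addresses: forgetting one attribute when new per-request state is added later.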
Purpose
RFC Template (.github/ISSUE_TEMPLATE/750-RFC.yml)
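The template's exact contents are not shown in this view, but a GitHub issue form with the fields discussed in the review (Motivation, Proposed Change) would look roughly like the sketch below. All field wording and the optional Feedback Period section are illustrative assumptions, not the actual file.

```yaml
# Hypothetical sketch of .github/ISSUE_TEMPLATE/750-RFC.yml -- labels,
# descriptions, and sections are assumptions, not the real template.
name: RFC
description: Propose and discuss a major design change
title: "[RFC]: "
labels: ["RFC"]
body:
  - type: textarea
    attributes:
      label: Motivation
      description: Why is this change needed? What problem does it solve?
    validations:
      required: true
  - type: textarea
    attributes:
      label: Proposed Change
      description: Describe the design change and its impact on users.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Feedback Period
      description: How long will feedback be collected before a decision?
    validations:
      required: false
```

The `type: textarea` / `attributes` / `validations` structure follows GitHub's issue-forms schema; making Motivation and Proposed Change required is one way to act on the review's suggestion to sharpen those fields.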
Test Plan
none
Test Result
none
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/hsliuustc0106/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)