This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Upstream sync 2024 06 08 #288

Merged
andy-neuma merged 101 commits into main from upstream-sync-2024-06-08 on Jun 10, 2024

@robertgshaw2-redhat robertgshaw2-redhat commented Jun 8, 2024

Upstream sync 2024 06 08 (#288) - ties to v0.4.3 of vllm-upstream

SUMMARY:

  • Merge commits from vllm-project@f68470e to vllm-project@1197e02
  • Our GCP test instances do not have gcc or clang installed. All of the Triton kernels rely on gcc or clang for JIT compilation, so I disabled the affected tests for now, but we need to get a compiler installed (cc @andy-neuma). All are marked with:
@pytest.mark.skip("C compiler not installed in NM automation. "
                  "This codepath follows a triton pathway, which "
                  "JITs using clang or gcc. Since neither are installed "
                  "in our test instances, we need to skip this for now.")
  • Cherry-picked the FP8 weight format changes from @mgoin
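
As a possible follow-up to the unconditional skips above, the marker could be made conditional so the tests re-enable themselves once a compiler is installed. This is only a sketch under assumptions: `find_c_compiler`, `requires_c_compiler`, and `test_triton_codepath` are hypothetical names that do not exist in this repository, and checking `PATH` for `gcc` or `clang` is an assumed stand-in for however the JIT actually locates its compiler.

```python
import shutil

import pytest


def find_c_compiler(candidates=("gcc", "clang")):
    """Return the path of the first candidate found on PATH, or None.

    Hypothetical helper: it only checks PATH, which is an assumption
    about how the compiler would be discovered at JIT time.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


SKIP_REASON = ("C compiler not installed in NM automation. "
               "This codepath follows a triton pathway, which "
               "JITs using clang or gcc.")

# Evaluated once at import time: skip only when no compiler is found.
requires_c_compiler = pytest.mark.skipif(
    find_c_compiler() is None,
    reason=SKIP_REASON,
)


@requires_c_compiler
def test_triton_codepath():
    # Placeholder body; a real test would exercise a Triton kernel here.
    assert find_c_compiler() is not None
```

With a marker like this, the same test files would run unchanged on machines that have a compiler and skip, with the same reason text, on the test instances that do not.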

Note that vllm-project@f68470e is NOT included in this merge.

COMPARE vs UPSTREAM:

alexm-redhat and others added 30 commits June 8, 2024 16:39
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Allow dummy load format for fp8,
torch.uniform_ doesn't support FP8 at the moment

Co-authored-by: Mor Zusman <morz@ai21.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893)

The 2nd PR for vllm-project#4532.

This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
…project#4985)

Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
@andy-neuma andy-neuma self-requested a review June 10, 2024 17:25
@andy-neuma andy-neuma left a comment

thanks

@andy-neuma andy-neuma merged commit db9ed90 into main Jun 10, 2024
robertgshaw2-redhat added a commit that referenced this pull request Jun 11, 2024
Upstream sync 2024 06 11 (#288)

SUMMARY:

* Merge commits from vllm-project@1197e02 to vllm-project@114332b
* Our GCP test instances do not have gcc or clang installed. All of the
Triton kernels rely on gcc or clang for JIT compilation, so the affected
tests are still disabled (cc @andy-neuma). All are marked with:
```python 
@pytest.mark.skip("C compiler not installed in NM automation. "
                  "This codepath follows a triton pathway, which "
                  "JITs using clang or gcc. Since neither are installed "
                  "in our test instances, we need to skip this for now.")
```

Note that vllm-project@1197e02 is NOT included in this merge.

COMPARE vs UPSTREAM:


https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-06-11..vllm-project:vllm:v0.5.0

---------

Signed-off-by: Ye Cao <caoye.cao@alibaba-inc.com>
Signed-off-by: kevin <kevin@anyscale.com>
Co-authored-by: Daniele <d.trifiro@me.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Ye Cao <952129620@qq.com>
Co-authored-by: Nadav Shmayovits <45605409+NadavShmayo@users.noreply.github.com>
Co-authored-by: chenqianfzh <51831990+chenqianfzh@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Daniil Arapov <59310708+Delviet@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Avinash Raj <avistylein3105@gmail.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Yuan <yuan.zhou@intel.com>
Co-authored-by: Kaiyang Chen <48289729+Kaiyang-Chen@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Breno Faria <breno@veltefaria.de>
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: zifeitong <zifei.tong@parasail.io>
Co-authored-by: Jie Fu (傅杰) <fujie_email@sina.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: DriverSong <31926998+DriverSong@users.noreply.github.com>
Co-authored-by: qiujiawei9 <qiujiawei9@jd.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Alex Wu <alexanderwu@berkeley.edu>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
Co-authored-by: liuyhwangyh <liuyhwangyh@163.com>
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Co-authored-by: Matthew Goldey <matthew.goldey@gmail.com>
Co-authored-by: Jie Fu (傅杰) <jiefu@tencent.com>
Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Calvinn Ng <39899397+Calvinnncy97@users.noreply.github.com>
Co-authored-by: team <calvinn.ng@ahrefs.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Benjamin Kitor <bkitor@gmail.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Bla_ckB <50193121+BlackBird-Coding@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>