chore(main): release 0.1.5 (#435)
🤖 I have created a release *beep* *boop*
---


## [0.1.5](v0.1.4...v0.1.5) (2024-08-13)


### Bugfix

* Fix PagedPrefill python api and some typos ([#441](#441)) ([3fff008](3fff008))
* fix prefill kernels' lse result for empty kv-cache ([#440](#440)) ([6ac28f4](6ac28f4)); see the sketch below
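
For context on the lse fix: alongside the attention output, the prefill kernels also report the log-sum-exp (LSE) of the attention scores so that partial results can be merged later. Over an empty KV-cache the sum is empty, so the mathematically consistent result is an LSE of -inf (the log of an empty sum) and an all-zero output. A minimal PyTorch sketch of that convention; the function and shapes are illustrative, not FlashInfer's actual kernel interface:

```python
import torch

def ref_attention_with_lse(q, k, v):
    # Numerically naive reference attention that also returns the
    # log-sum-exp (LSE) of the scores. Illustrative shapes: q [n, d],
    # k/v [m, d]; m == 0 models an empty KV-cache.
    scores = q @ k.T                            # [n, m]; zero columns when m == 0
    lse = torch.log(torch.exp(scores).sum(-1))  # empty sum -> 0, log(0) -> -inf
    out = torch.exp(scores - lse[:, None]) @ v  # softmax @ v; [n, 0] @ [0, d] -> zeros
    return out, lse

q = torch.randn(4, 64)
k = v = torch.empty(0, 64)
out, lse = ref_attention_with_lse(q, k, v)
print(lse)              # all -inf: the consistent value for an empty KV-cache
print(out.abs().max())  # tensor(0.)
```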

### Features

* decouple float and int workspace buffer ([#442](#442)) ([a7ee566](a7ee566))
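
The decoupling above separates the two kinds of scratch memory an attention wrapper needs: a large floating-point workspace for intermediate results (e.g. split-KV partial outputs) and a small integer workspace for scheduling metadata, so each can be sized and reused independently. A hypothetical sketch of the idea; the class and argument names are illustrative assumptions, not FlashInfer's exact API:

```python
import torch

# Hypothetical wrapper sketching the decoupling; names and sizes are
# illustrative assumptions, not FlashInfer's exact API.
class PagedPrefillWrapper:
    def __init__(self, float_workspace_buffer: torch.Tensor,
                 int_workspace_buffer: torch.Tensor):
        # Large scratch for floating-point intermediates, whose size scales
        # with batch/head/sequence dimensions.
        self.float_workspace = float_workspace_buffer
        # Small scratch for integer scheduling metadata (request/page
        # indices), which can stay tiny and be reused across workloads.
        self.int_workspace = int_workspace_buffer

# The two buffers are now allocated and sized independently.
float_ws = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
int_ws = torch.empty(8 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = PagedPrefillWrapper(float_ws, int_ws)
```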


### Performance Improvements

* faster fp8->fp16 dequantization for pre sm_90 arch ([#439](#439)) ([c93f647](c93f647))
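
To illustrate the kind of bit-level shortcut such a speedup can exploit on architectures without native fp8 conversion instructions: an fp8 e5m2 value shares fp16's sign/exponent layout and bias, so widening each byte into the high half of a 16-bit word already yields the fp16 bit pattern. The NumPy sketch below shows only this well-known e5m2 identity, not the kernel's actual CUDA code:

```python
import numpy as np

def dequant_fp8_e5m2_to_fp16(fp8_bytes: np.ndarray) -> np.ndarray:
    # e5m2 and fp16 share a sign bit, a 5-bit exponent, and exponent bias 15;
    # e5m2's 2 mantissa bits align with fp16's top 2 mantissa bits. Shifting
    # each byte left by 8 therefore produces the exact fp16 bit pattern,
    # with no per-element floating-point arithmetic.
    return (fp8_bytes.astype(np.uint16) << 8).view(np.float16)

# 0x3C = 0b0_01111_00 is 1.0 in e5m2; 0xBC is -1.0; 0x40 is 2.0.
codes = np.array([0x3C, 0xBC, 0x40], dtype=np.uint8)
print(dequant_fp8_e5m2_to_fp16(codes))  # [ 1. -1.  2.]
```

The other fp8 format, e4m3, uses a different exponent width and bias, so converting it needs extra rebiasing work beyond a plain shift.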

### Acknowledgement

We thank the community for their contributions and feedback:
[@comaniac](https://github.com/comaniac),
[@hnyls2002](https://github.com/hnyls2002),
[@jianfei-wangg](https://github.com/jianfei-wangg),
[@Yard1](https://github.com/Yard1).


---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <[email protected]>
github-actions[bot] and yzh119 authored Aug 13, 2024
1 parent a7ee566 commit 7470edc
Showing 4 changed files with 26 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .release-please-manifest.json
@@ -1,3 +1,3 @@
 {
-  ".": "0.1.4"
+  ".": "0.1.5"
 }
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,28 @@
 # Changelog
 
+## [0.1.5](https://github.com/flashinfer-ai/flashinfer/compare/v0.1.4...v0.1.5) (2024-08-13)
+
+
+### Bugfix
+
+* Fix PagedPrefill python api and some typos ([#441](https://github.com/flashinfer-ai/flashinfer/pull/441)) ([3fff008](https://github.com/flashinfer-ai/flashinfer/commit/3fff008dc9af56c325d9c487bddf69ff014f3989))
+* fix prefill kernels' lse result for empty kv-cache ([#440](https://github.com/flashinfer-ai/flashinfer/pull/440)) ([6ac28f4](https://github.com/flashinfer-ai/flashinfer/commit/6ac28f4dd3a9a34a2b4abcbe0a815fc59a2d74ad))
+
+### Features
+
+* decouple float and int workspace buffer ([#442](https://github.com/flashinfer-ai/flashinfer/issues/442)) ([a7ee566](https://github.com/flashinfer-ai/flashinfer/commit/a7ee5662bf967ab1ee16910c73761d326fbeb9a0))
+
+
+### Performance Improvements
+
+* faster fp8->fp16 dequantization for pre sm_90 arch ([#439](https://github.com/flashinfer-ai/flashinfer/issues/439)) ([c93f647](https://github.com/flashinfer-ai/flashinfer/commit/c93f647a0dd6b58c9ac20b39438316202358463c))
+
+### Acknowledgement
+
+We thank the community for their contributions and feedback: [@comaniac](https://github.com/comaniac), [@hnyls2002](https://github.com/hnyls2002), [@jianfei-wangg](https://github.com/jianfei-wangg), [@Yard1](https://github.com/Yard1).
+
+
+
 ## [0.1.4](https://github.com/flashinfer-ai/flashinfer/compare/v0.1.3...v0.1.4) (2024-08-09)

2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ Kernel Library for LLM Serving
 [![Documentation](https://github.com/flashinfer-ai/flashinfer/actions/workflows/build-doc.yml/badge.svg)](https://github.com/flashinfer-ai/flashinfer/actions/workflows/build-doc.yml)
 
 
-FlashInfer is a library for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, PageAttention, and LoRA. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
+FlashInfer is a library for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling, and more. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
 
 The unique features of FlashInfer include:
 1. **Comprehensive Attention Kernels**: Attention kernels that cover all the common use cases of LLM serving, including *single-request* and *batching* versions of *Prefill*, *Decode*, and *Append* kernels, on different formats of KV-Cache (Padded Tensor, Ragged Tensor, and Page Table).
2 changes: 1 addition & 1 deletion version.txt
@@ -1 +1 @@
-0.1.4
+0.1.5
