chore(main): release 0.1.5 (#435)
🤖 I have created a release *beep* *boop*
---


## [0.1.5](v0.1.4...v0.1.5) (2024-08-13)


### Bugfix

* Fix PagedPrefill python api and some typos ([#441](#441)) ([3fff008](3fff008))
* fix prefill kernels' lse result for empty kv-cache ([#440](#440)) ([6ac28f4](6ac28f4)); see the sketch below
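
For context on the lse fix: alongside the attention output, the prefill kernels also report the log-sum-exp (LSE) of the attention scores so that partial results can be merged later. Over an empty KV-cache the sum is empty, so the mathematically consistent result is an LSE of -inf (the log of an empty sum) and an all-zero output. A minimal PyTorch sketch of that convention; the function and shapes are illustrative, not FlashInfer's actual kernel interface:

```python
import torch

def ref_attention_with_lse(q, k, v):
    # Numerically naive reference attention that also returns the
    # log-sum-exp (LSE) of the scores. Illustrative shapes: q [n, d],
    # k/v [m, d]; m == 0 models an empty KV-cache.
    scores = q @ k.T                            # [n, m]; zero columns when m == 0
    lse = torch.log(torch.exp(scores).sum(-1))  # empty sum -> 0, log(0) -> -inf
    out = torch.exp(scores - lse[:, None]) @ v  # softmax @ v; [n, 0] @ [0, d] -> zeros
    return out, lse

q = torch.randn(4, 64)
k = v = torch.empty(0, 64)
out, lse = ref_attention_with_lse(q, k, v)
print(lse)              # all -inf: the consistent value for an empty KV-cache
print(out.abs().max())  # tensor(0.)
```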

### Features

* decouple float and int workspace buffer ([#442](#442)) ([a7ee566](a7ee566))
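
The decoupling above separates the two kinds of scratch memory an attention wrapper needs: a large floating-point workspace for intermediate results (e.g. split-KV partial outputs) and a small integer workspace for scheduling metadata, so each can be sized and reused independently. A hypothetical sketch of the idea; the class and argument names are illustrative assumptions, not FlashInfer's exact API:

```python
import torch

# Hypothetical wrapper sketching the decoupling; names and sizes are
# illustrative assumptions, not FlashInfer's exact API.
class PagedPrefillWrapper:
    def __init__(self, float_workspace_buffer: torch.Tensor,
                 int_workspace_buffer: torch.Tensor):
        # Large scratch for floating-point intermediates, whose size scales
        # with batch/head/sequence dimensions.
        self.float_workspace = float_workspace_buffer
        # Small scratch for integer scheduling metadata (request/page
        # indices), which can stay tiny and be reused across workloads.
        self.int_workspace = int_workspace_buffer

# The two buffers are now allocated and sized independently.
float_ws = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
int_ws = torch.empty(8 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = PagedPrefillWrapper(float_ws, int_ws)
```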


### Performance Improvements

* faster fp8->fp16 dequantization for pre sm_90 arch ([#439](#439)) ([c93f647](c93f647))
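
To illustrate the kind of bit-level shortcut such a speedup can exploit on architectures without native fp8 conversion instructions: an fp8 e5m2 value shares fp16's sign/exponent layout and bias, so widening each byte into the high half of a 16-bit word already yields the fp16 bit pattern. The NumPy sketch below shows only this well-known e5m2 identity, not the kernel's actual CUDA code:

```python
import numpy as np

def dequant_fp8_e5m2_to_fp16(fp8_bytes: np.ndarray) -> np.ndarray:
    # e5m2 and fp16 share a sign bit, a 5-bit exponent, and exponent bias 15;
    # e5m2's 2 mantissa bits align with fp16's top 2 mantissa bits. Shifting
    # each byte left by 8 therefore produces the exact fp16 bit pattern,
    # with no per-element floating-point arithmetic.
    return (fp8_bytes.astype(np.uint16) << 8).view(np.float16)

# 0x3C = 0b0_01111_00 is 1.0 in e5m2; 0xBC is -1.0; 0x40 is 2.0.
codes = np.array([0x3C, 0xBC, 0x40], dtype=np.uint8)
print(dequant_fp8_e5m2_to_fp16(codes))  # [ 1. -1.  2.]
```

The other fp8 format, e4m3, uses a different exponent width and bias, so converting it needs extra rebiasing work beyond a plain shift.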

### Acknowledgement

We thank the community for their contributions and feedback:
[@comaniac](https://github.com/comaniac),
[@hnyls2002](https://github.com/hnyls2002),
[@jianfei-wangg](https://github.com/jianfei-wangg),
[@Yard1](https://github.com/Yard1).


---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <[email protected]>
github-actions[bot] and yzh119 authored Aug 13, 2024
1 parent a7ee566 commit 7470edc
Showing 4 changed files with 26 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .release-please-manifest.json
@@ -1,3 +1,3 @@
 {
-  ".": "0.1.4"
+  ".": "0.1.5"
 }
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,28 @@
 # Changelog
 
+## [0.1.5](https://github.com/flashinfer-ai/flashinfer/compare/v0.1.4...v0.1.5) (2024-08-13)
+
+
+### Bugfix
+
+* Fix PagedPrefill python api and some typos ([#441](https://github.com/flashinfer-ai/flashinfer/pull/441)) ([3fff008](https://github.com/flashinfer-ai/flashinfer/commit/3fff008dc9af56c325d9c487bddf69ff014f3989))
+* fix prefill kernels' lse result for empty kv-cache ([#440](https://github.com/flashinfer-ai/flashinfer/pull/440)) ([6ac28f4](https://github.com/flashinfer-ai/flashinfer/commit/6ac28f4dd3a9a34a2b4abcbe0a815fc59a2d74ad))
+
+### Features
+
+* decouple float and int workspace buffer ([#442](https://github.com/flashinfer-ai/flashinfer/issues/442)) ([a7ee566](https://github.com/flashinfer-ai/flashinfer/commit/a7ee5662bf967ab1ee16910c73761d326fbeb9a0))
+
+
+### Performance Improvements
+
+* faster fp8->fp16 dequantization for pre sm_90 arch ([#439](https://github.com/flashinfer-ai/flashinfer/issues/439)) ([c93f647](https://github.com/flashinfer-ai/flashinfer/commit/c93f647a0dd6b58c9ac20b39438316202358463c))
+
+### Acknowledgement
+
+We thank the community for their contributions and feedback: [@comaniac](https://github.com/comaniac), [@hnyls2002](https://github.com/hnyls2002), [@jianfei-wangg](https://github.com/jianfei-wangg), [@Yard1](https://github.com/Yard1).
+
+
+
 ## [0.1.4](https://github.com/flashinfer-ai/flashinfer/compare/v0.1.3...v0.1.4) (2024-08-09)

2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ Kernel Library for LLM Serving
 [![Documentation](https://github.com/flashinfer-ai/flashinfer/actions/workflows/build-doc.yml/badge.svg)](https://github.com/flashinfer-ai/flashinfer/actions/workflows/build-doc.yml)
 
 
-FlashInfer is a library for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, PageAttention, and LoRA. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
+FlashInfer is a library for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling, and more. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
 
 The unique features of FlashInfer include:
 1. **Comprehensive Attention Kernels**: Attention kernels that cover all the common use cases of LLM serving, including *single-request* and *batching* versions of *Prefill*, *Decode*, and *Append* kernels, on different formats of KV-Cache (Padded Tensor, Ragged Tensor, and Page Table).
2 changes: 1 addition & 1 deletion version.txt
@@ -1 +1 @@
-0.1.4
+0.1.5
