# Fast LLM Inference - Optimized Task Plan

I plan to implement several acceleration techniques for Large Language Models (LLMs). I enjoy building these myself and love the challenge of bringing research papers into real-world applications.

If there are any techniques you'd like to see developed or discussed, feel free to reach out. Thanks!

I'm excited to dive deeper into AI research!


## Update Log

### 2024

- 2024/12/16: Add the Medusa-1 Training Script v2.
- 2024/12/15: Add the Medusa-1 Training Script.
- 2024/12/12: Update the KV Cache support for Speculative Decoding.
- 2024/12/04: Add the Kangaroo Training Script v2.
- 2024/11/26: Add the Kangaroo Training Script.
- 2024/11/22: Update the Target Model Keep Generation Mechanism experiment.
- 2024/11/18: Update the Self-Speculative Decoding experiment results of google--gemma-2-9b-it.
- 2024/11/12: Reviewing implementation challenges for Self-Speculative Decoding and evaluating model compatibility for improved efficiency.
- 2024/11/10: Initial setup for Self-Speculative Decoding completed; data pipeline in place for testing draft-and-verify.
- 2024/11/08: Speculative Decoding successfully implemented; verified improved inference time with no noticeable accuracy degradation (a minimal draft-and-verify sketch follows this list).
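
For reference, below is a minimal sketch of the draft-and-verify loop that Speculative Decoding is built on. It is a simplified greedy-verification variant, not this repository's implementation: `draft_logits_fn`, `target_logits_fn`, and the `gamma` draft length are hypothetical names introduced for illustration, and the original method uses probabilistic acceptance rather than exact greedy matching.

```python
import torch

def speculative_decode(target_logits_fn, draft_logits_fn, input_ids,
                       gamma=4, max_new_tokens=64):
    """Greedy draft-and-verify loop (simplified sketch).

    target_logits_fn / draft_logits_fn are hypothetical callables mapping a
    1-D LongTensor of token ids to logits of shape (seq_len, vocab_size).
    """
    ids = input_ids.clone()
    while ids.numel() - input_ids.numel() < max_new_tokens:
        # 1. Draft: the small model proposes `gamma` tokens autoregressively.
        draft_ids = ids.clone()
        for _ in range(gamma):
            next_tok = draft_logits_fn(draft_ids)[-1].argmax()
            draft_ids = torch.cat([draft_ids, next_tok.view(1)])
        # 2. Verify: a single target forward pass scores every drafted position.
        tgt_logits = target_logits_fn(draft_ids)
        for i in range(gamma):
            tgt_tok = tgt_logits[ids.numel() - 1 + i].argmax()
            if tgt_tok != draft_ids[ids.numel() + i]:
                # 3. Mismatch: keep the accepted prefix plus the target's token.
                ids = torch.cat([draft_ids[: ids.numel() + i], tgt_tok.view(1)])
                break
        else:
            # All drafts accepted; append one bonus token from the target.
            ids = torch.cat([draft_ids, tgt_logits[-1].argmax().view(1)])
    return ids
```

The speed-up comes from step 2: one target forward pass validates up to `gamma` tokens at once, so the expensive model runs far fewer times than in plain autoregressive decoding.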

## Pending Decisions

- Batched Speculative Decoding
- Prompt lookup decoding: Determine the timeline after reviewing initial implementations (see the n-gram lookup sketch after this list).
- UAG Integration: Assess when to integrate after Medusa and Kangaroo are in place.
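
As context for the prompt lookup decoding item above, here is a rough sketch of its core idea: instead of running a draft model, draft tokens are proposed by matching the current n-gram suffix against earlier occurrences in the context. The function and parameter names are illustrative assumptions, not this repository's API; the returned candidates would then go through the same verify step as ordinary speculative decoding.

```python
def prompt_lookup_draft(input_ids, ngram_size=3, num_draft=5):
    """Propose draft tokens by matching the trailing n-gram earlier in the context.

    input_ids: list of token ids (the running sequence). Returns up to
    `num_draft` candidate continuation tokens, or [] when no match exists.

    e.g. prompt_lookup_draft([5, 7, 9, 2, 5, 7, 9]) -> [2, 5, 7, 9]
    """
    if len(input_ids) < ngram_size:
        return []
    tail = input_ids[-ngram_size:]
    # Scan right-to-left so the most recent earlier occurrence wins.
    for start in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[start:start + ngram_size] == tail:
            cont = input_ids[start + ngram_size : start + ngram_size + num_draft]
            if cont:
                return cont
    return []
```

Because there is no draft model to run, this drafting step is essentially free; it pays off mainly on inputs with repeated spans, such as code editing or retrieval-heavy prompts.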

## TODO List

### November 2024

### Additional Enhancements