speculative decoding in llama.cpp : PoC for speeding-up inference via speculative sampling by ggerganov · Pull Request #2926 · ggerganov/llama.cpp #492

@irthomasthomas

Description

Title: speculative : PoC for speeding-up inference via speculative sampling #2926

Suggested labels

{ "label-name": "LLM-speed-optimization", "description": "Optimizing LLaMA model inference speed", "confidence": 80.85 }
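The PR linked above demonstrates speculative sampling: a small draft model proposes several tokens cheaply, and the large target model verifies them in a single pass, accepting each drafted token with probability min(1, p_target/p_draft). The sketch below illustrates that acceptance rule only; the toy probability tables, function names, and vocabulary are invented for illustration and are not the llama.cpp implementation.

```python
import random

# Toy "models": fixed distributions over a tiny vocabulary. In llama.cpp the
# draft is a small model and the target a large one; these hand-written
# tables exist purely to show the accept/reject mechanics.
VOCAB = ["a", "b", "c"]

def draft_probs(_context):
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def target_probs(_context):
    return {"a": 0.5, "b": 0.4, "c": 0.1}

def sample(probs):
    # Sample a token from a {token: probability} table.
    r = random.random()
    acc = 0.0
    for tok, p in probs.items():
        acc += p
        if r < acc:
            return tok
    return tok  # fall back to the last token on rounding error

def speculative_step(context, n_draft=4):
    """Draft n_draft tokens with the cheap model, then verify each against
    the target model using the standard speculative-sampling rule."""
    drafted = []
    ctx = list(context)
    for _ in range(n_draft):
        q = draft_probs(ctx)
        tok = sample(q)
        drafted.append((tok, q))
        ctx.append(tok)

    accepted = []
    ctx = list(context)
    for tok, q in drafted:
        p = target_probs(ctx)
        # Accept with probability min(1, p(tok) / q(tok)).
        if random.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample once from the normalized residual
            # max(0, p - q), which keeps the output distribution exact.
            residual = {t: max(0.0, p[t] - q[t]) for t in VOCAB}
            z = sum(residual.values())
            if z > 0:
                residual = {t: v / z for t, v in residual.items()}
                accepted.append(sample(residual))
            break
    return accepted

random.seed(0)
print(speculative_step(["<s>"]))
```

Every accepted run of tokens costs only one target-model evaluation, which is where the speed-up in the PR comes from when the draft model agrees with the target often enough.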

Metadata

Labels

- Algorithms: Sorting, Learning or Classifying. All algorithms go here.
- TIL: Short notes or tips on coding, linux, llms, ml, etc
- llm-experiments: experiments with large language models
- llm-serving-optimisations: Tips, tricks and tools to speedup inference of large language models
- prompt-engineering: Developing and optimizing prompts to efficiently use language models for various applications and re
