
[Performance]: MoE memory usage much larger #744

@haojiangzheng

Description

Proposal to Enhance Memory Efficiency and Performance in MoE W8A8
Background:
Last week, PR #580 introduced memory-usage optimizations for MoE W8A8 that significantly increased the memory available for key-value (KV) caches. However, a recent commit (5c6d05a) appears to have rolled back some of these optimizations, which likely reduces memory efficiency and, with it, KV cache capacity.

Observations:

Memory Management: Efficient memory usage is crucial because it directly determines how many KV cache blocks can be allocated. KV cache capacity in turn limits how many requests can be batched and how long their contexts can be, so it is integral to model performance and scalability.
Performance Trade-offs: While speed is essential, excessive memory usage constrains the workloads that can be served (larger batches, longer contexts) and limits model scaling.
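To make the link between memory usage and KV cache capacity concrete, here is a back-of-the-envelope sketch. It is not taken from PR #580, commit 5c6d05a, or any framework. It assumes a paged KV cache in which one block stores keys and values for every layer, so one block costs 2 × layers × KV heads × head dim × block size × bytes per element. The function names, the 0.9 memory-utilization fraction, the 64 GiB device and all model dimensions are hypothetical.

```python
GiB = 1024 ** 3


def kv_cache_block_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                         block_size: int, dtype_bytes: int) -> int:
    """Bytes for one KV cache block across all layers (hypothetical helper).

    The factor 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * block_size * dtype_bytes


def num_kv_cache_blocks(total_bytes: int, utilization: float,
                        peak_non_kv_bytes: int, block_bytes: int) -> int:
    """Blocks that fit in the budget left after weights and peak
    activation/workspace memory (everything that is not KV cache)."""
    budget = int(total_bytes * utilization) - peak_non_kv_bytes
    return max(budget, 0) // block_bytes


# Hypothetical model: 48 layers, 4 KV heads of dim 128, 128-token blocks, 2-byte cache.
block = kv_cache_block_bytes(48, 4, 128, 128, 2)

# Same device, but the second case spends an extra 4 GiB on non-KV memory,
# e.g. if a memory optimization were rolled back.
lean = num_kv_cache_blocks(64 * GiB, 0.9, 40 * GiB, block)
heavy = num_kv_cache_blocks(64 * GiB, 0.9, 44 * GiB, block)

print(f"block size: {block} bytes")
print(f"lean:  {lean} blocks ({lean * 128} tokens)")
print(f"heavy: {heavy} blocks ({heavy * 128} tokens)")
```

Under these assumptions, every extra GiB of non-KV memory removes roughly 85 blocks of 128 tokens each, which is why a memory regression shows up directly as fewer concurrent requests or shorter servable contexts.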
