
[AutoDeploy][Bug]: Cutlass MOE kernel caused the accuracy drop #9184

@nvchenghaoz

Description


System Info

On an H100 from cw, the CUTLASS MoE BF16 kernel causes an accuracy drop on GSM8K.

Who can help?

@nzmora-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

pytest tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE -s -vv

Expected behavior

Fix the accuracy issue, or, if the CUTLASS kernel cannot support BF16, stick with the Triton kernel.
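
One possible shape for the fallback, as a minimal sketch only: compare the preferred kernel's output against a trusted reference on a sample input, and select the Triton path when the error exceeds a tolerance. Every name here (`pick_moe_backend`, `max_rel_error`, the backend keys, the `tol` value) is hypothetical and is not part of the TensorRT-LLM or AutoDeploy API; plain Python lists stand in for tensors.

```python
from __future__ import annotations

from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector], Vector]


def max_rel_error(candidate: Vector, reference: Vector, eps: float = 1e-6) -> float:
    """Largest element-wise relative error of `candidate` against `reference`."""
    return max(abs(c - r) / (abs(r) + eps) for c, r in zip(candidate, reference))


def pick_moe_backend(
    backends: dict[str, Kernel],
    reference: Kernel,
    sample: Vector,
    preferred: str = "cutlass",  # hypothetical backend key
    fallback: str = "triton",    # hypothetical backend key
    tol: float = 2e-2,           # arbitrary illustrative tolerance
) -> str:
    """Return `preferred` if it matches `reference` within `tol`, else `fallback`."""
    expected = reference(sample)
    err = max_rel_error(backends[preferred](sample), expected)
    return preferred if err <= tol else fallback
```

In practice the reference would be a higher-precision implementation of the same MoE layer, and the tolerance would have to be chosen with BF16 rounding in mind.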

Actual behavior

N/A

Additional notes

N/A

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata


Labels

AutoDeploy (<NV> AutoDeploy Backend), bug (Something isn't working)

Projects

Status

Done
