[Feature]: Support for Hunyuan-A13B-Instruct

### 🚀 The feature, motivation and pitch

Tencent released this new model:
https://huggingface.co/tencent/Hunyuan-A13B-Instruct

It matches larger models on benchmarks, is small enough to run locally, and its MoE architecture should make it pretty fast.
It also supports a 256K context.

The Tencent team released a Docker image compatible with vLLM 0.8.5, but that image lacks the newer vLLM improvements. I also think it doesn't include the Ampere FP8 Marlin support, as I can't run the FP8 quant on a 3090 system.

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: Support for Hunyuan-A13B-Instruct #20182
