Merged
17 changes: 17 additions & 0 deletions examples/models/llama/config/llama_xnnpack.yaml
@@ -0,0 +1,17 @@
base:
metadata: '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'

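Note that the `metadata` value above is not nested YAML but a JSON string embedded inside the YAML file; export tooling parses it to learn the tokenizer's special token ids (the Llama 3 BOS and EOS ids here). A minimal sketch of how such a value decodes, using only the standard `json` module:

```python
import json

# The base.metadata field from the config above: JSON carried as a YAML string.
metadata = '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'

parsed = json.loads(metadata)
print(parsed["get_bos_id"])   # 128000 — Llama 3 BOS token id
print(parsed["get_eos_ids"])  # [128009, 128001] — EOS token ids
```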
model:
use_sdpa_with_kv_cache: True
use_kv_cache: True
dtype_override: fp32

quantization:
qmode: 8da4w
group_size: 128
Contributor: Does this work OK, i.e. produce correct output?

Contributor Author: I haven't tried it recently, but I definitely used this combination of parameters a few months ago while working on export_llm, and it worked.

Contributor Author: (That is, these are some of the most common options.)

Contributor: Let me run it.

Contributor: Seems to produce sane output. Let's land. CI?

Contributor Author: Should already be covered by one of our many Llama CI tests! This combination of params is pretty standard.

embedding_quantize: 4,32

backend:
xnnpack:
enabled: True
extended_ops: True
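The `qmode: 8da4w` / `group_size: 128` pair refers to 8-bit dynamic activations with 4-bit groupwise weight quantization: each run of 128 consecutive weights shares one scale, and each weight is stored as a signed 4-bit integer. The sketch below is a hypothetical illustration of that groupwise weight scheme only, not ExecuTorch's actual implementation (which lives in torchao and handles the activation side as well):

```python
def quantize_group(weights, bits=4):
    """Symmetrically quantize one group of floats to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def quantize_groupwise(weights, group_size=128, bits=4):
    """Split a weight row into groups of `group_size` and quantize each
    group independently, keeping one float scale per group."""
    out = []
    for i in range(0, len(weights), group_size):
        q, scale = quantize_group(weights[i:i + group_size], bits)
        out.append((q, scale))
    return out

# 256 toy weights -> 2 groups of 128, matching group_size: 128 above.
row = [0.5, -1.0, 0.25, 0.125] * 64
groups = quantize_groupwise(row, group_size=128)
print(len(groups))      # 2
```

A larger `group_size` stores fewer scales (smaller model) but lets one outlier weight degrade the precision of a wider span, which is the trade-off the config knob controls.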