LLama prefill bringup #1768

Open · 8 of 24 tasks
tapspatel opened this issue Jan 14, 2025 · 4 comments

tapspatel commented Jan 14, 2025

Goal: Run the llama prefill model on the visualizer with golden information, and be able to change attributes to visualize degradation or recovery of the golden match.

Llama prefill (attached as .txt, since GitHub doesn't allow the .mlir extension):
llama_ttir_mlir.txt

Parallel Thread of Work

  • Pybind pipeline passes
  • Get list of all ops that you need from llama ttir
  • Be able to build all ttir ops in python infra for llama
  • Be able to compute a golden for each ttir op in python infra for llama
  • Ensure all ttir ops found in llama model are supported in ttir_builder (Have ability to load a llama model in ttir builder #1779) - Collin
  • Create llama model from ttir infra (Have ability to load a llama model in ttir builder #1779) - Collin
  • Create golden version of llama model from ttir infra (blocked by above) - Collin
  • Run golden version of llama with golden map and do comparison (blocked by above) - Collin/Taps (a minimal comparison sketch follows this list)
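
A minimal sketch of what the golden-map comparison above could look like, assuming both goldens and device outputs are available as torch tensors keyed by name; the pcc() helper, the 0.99 threshold, and the dict layout are illustrative assumptions, not the actual ttrt/ttir_builder interface.

```python
import torch

def pcc(golden: torch.Tensor, observed: torch.Tensor) -> float:
    """Pearson correlation coefficient between two flattened tensors."""
    stacked = torch.stack([golden.flatten().float(), observed.flatten().float()])
    return torch.corrcoef(stacked)[0, 1].item()

def check_golden_map(golden_map: dict, device_outputs: dict, threshold: float = 0.99) -> dict:
    """Pass/fail per tensor name present in both maps."""
    return {
        name: pcc(golden_map[name], device_outputs[name]) >= threshold
        for name in golden_map.keys() & device_outputs.keys()
    }

# Example: an output close to its golden passes, a heavily perturbed one fails.
g = {"matmul_0": torch.randn(32, 32)}
d = {"matmul_0": g["matmul_0"] + 0.5 * torch.randn(32, 32)}
print(check_golden_map(g, d))
```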

Parallel Thread of Work

  • Generate a multi-op flatbuffer with golden data embedded in the flatbuffer via ttir builder - Taps
  • Generate the same multi-op flatbuffer with golden data saved to disk via ttir builder - Taps
  • Visualize multi-op ttir graph in tt-explorer with access to the golden data (via flatbuffer or disk) (Model Accuracy Overlay - TT-Explorer #1234) - Ognjen
  • Run multi-op ttir graph and compare intermediates + output with golden data (blocked by above) - Taps
  • Have ability to change certain attributes (dtype) and showcase failure in golden check for ops (blocked by above) - Taps (see the dtype sketch after this list)
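
As a rough illustration of the dtype item above (not the tt-explorer flow itself): recomputing a matmul in bfloat16 and comparing it against a float32 golden shows how a dtype attribute change shifts the comparison metric; whether it drops below a given threshold depends on the op, the shapes, and the threshold chosen. The shapes and seed here are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
a = torch.randn(32, 4096)
b = torch.randn(4096, 4096)

golden = a @ b                                    # float32 reference output
degraded = (a.bfloat16() @ b.bfloat16()).float()  # same op after a dtype change

stacked = torch.stack([golden.flatten(), degraded.flatten()])
print(f"pcc after bfloat16 downcast: {torch.corrcoef(stacked)[0, 1].item():.6f}")
```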

Parallel Thread of Work

  • Visualize llama ttir on tt-explorer (Enable viewing of llama prefill in tt-explorer #1769) - Ognjen
  • Run llama ttir on tt-explorer (blocked by above) (the fix above works, but the model hits an issue in perf mode since the perf buffers on device aren't large enough) - Taps
  • Do golden comparison via tt-explorer (blocked by above) - Taps
  • Change attributes and show golden degradation on tt-explorer (blocked by above) - Taps

Parallel Thread of Work

Bonus

tapspatel self-assigned this Jan 14, 2025
tapspatel commented:

relevant branches: tpatel/issue-1745

tapspatel commented:

list of all ops in llama prefill (torch golden references for these are sketched after the list)

  • typecast
  • reciprocal
  • multiply
  • embedding
  • cos (done)
  • unsqueeze
  • transpose
  • squeeze
  • concat
  • matmul
  • sqrt
  • sin
  • softmax
  • sigmoid
  • add
  • reshape
  • mean
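
A hedged sketch of torch reference ("golden") implementations for the ops listed above, the kind of table a python golden infra could key op names against. This is not the actual ttir_builder golden registry; the default dims and dtypes are illustrative assumptions.

```python
import torch

# Illustrative golden functions keyed by op name (defaults are assumptions).
GOLDEN_FNS = {
    "typecast":   lambda x, dtype=torch.float32: x.to(dtype),
    "reciprocal": torch.reciprocal,
    "multiply":   torch.multiply,
    "embedding":  torch.nn.functional.embedding,
    "cos":        torch.cos,
    "unsqueeze":  lambda x, dim=0: torch.unsqueeze(x, dim),
    "transpose":  lambda x, dim0=-2, dim1=-1: torch.transpose(x, dim0, dim1),
    "squeeze":    torch.squeeze,
    "concat":     lambda tensors, dim=0: torch.cat(tensors, dim=dim),
    "matmul":     torch.matmul,
    "sqrt":       torch.sqrt,
    "sin":        torch.sin,
    "softmax":    lambda x, dim=-1: torch.softmax(x, dim=dim),
    "sigmoid":    torch.sigmoid,
    "add":        torch.add,
    "reshape":    torch.reshape,
    "mean":       lambda x, dim=-1: torch.mean(x, dim=dim),
}

# e.g. golden output for a softmax op in the graph:
golden = GOLDEN_FNS["softmax"](torch.randn(2, 8))
```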

tapspatel commented:

@ctodTT stage 4: these two tasks we can do locally (we are blocked on the task after that, which needs tt-forge support)

  • Run llama model from tt-forge-fe
  • Run llama ttir model from mlir repo (using ttmlir-opt, ttmlir-translate, ttrt) - make sure features work (perf, memory)

Run llama model from tt-forge-fe

To run prefill on device, run: pytest -svv forge/test/mlir/llama/tests/test_llama_prefil.py::test_llama_prefil_on_device_decode_on_cpu[openlm-research/open_llama_3b]

tapspatel commented:

ttrt run <llama_ttir.mlir> --save-artifacts --clean-artifacts --memory works like a charm

// ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=/code/tt-mlir/ttrt-artifacts/system_desc.ttsys" llama_prefill_ttir.mlir &> llama_prefill_ttnn.mlir
// ttmlir-translate --ttnn-to-flatbuffer llama_prefill_ttnn.mlir &> llama_prefill.ttnn
