LLama prefill bringup #1768

Open · 8 of 24 tasks
tapspatel opened this issue Jan 14, 2025 · 4 comments

tapspatel commented Jan 14, 2025

Goal: Run the llama prefill model on the visualizer with golden information, and be able to change attributes to visualize degradation or recovery of the golden match.

Llama prefill (attached as .txt, since GitHub doesn't allow the .mlir extension):
llama_ttir_mlir.txt

Parallel Thread of Work

  • Pybind pipeline passes
  • Get list of all ops that you need from llama ttir
  • Be able to build all ttir ops in python infra for llama
  • Be able to compute a golden for each ttir op in python infra for llama
  • Ensure all ttir ops found in llama model are supported in ttir_builder (Have ability to load a llama model in ttir builder #1779) - Collin
  • Create llama model from ttir infra (Have ability to load a llama model in ttir builder #1779) - Collin
  • Create golden version of llama model from ttir infra (blocked by above) - Collin
  • Run golden version of llama with golden map and do comparison (blocked by above) - Collin/Taps (a minimal comparison sketch follows this list)
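
A minimal sketch of what the golden-map comparison above could look like, assuming both goldens and device outputs are available as torch tensors keyed by name; the pcc() helper, the 0.99 threshold, and the dict layout are illustrative assumptions, not the actual ttrt/ttir_builder interface.

```python
import torch

def pcc(golden: torch.Tensor, observed: torch.Tensor) -> float:
    """Pearson correlation coefficient between two flattened tensors."""
    stacked = torch.stack([golden.flatten().float(), observed.flatten().float()])
    return torch.corrcoef(stacked)[0, 1].item()

def check_golden_map(golden_map: dict, device_outputs: dict, threshold: float = 0.99) -> dict:
    """Pass/fail per tensor name present in both maps."""
    return {
        name: pcc(golden_map[name], device_outputs[name]) >= threshold
        for name in golden_map.keys() & device_outputs.keys()
    }

# Example: an output close to its golden passes, a heavily perturbed one fails.
g = {"matmul_0": torch.randn(32, 32)}
d = {"matmul_0": g["matmul_0"] + 0.5 * torch.randn(32, 32)}
print(check_golden_map(g, d))
```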

Parallel Thread of Work

  • Generate a multi-op flatbuffer with golden data embedded in the flatbuffer via ttir builder - Taps
  • Generate the same multi-op flatbuffer with golden data saved to disk via ttir builder - Taps
  • Visualize multi-op ttir graph in tt-explorer with access to the golden data (via flatbuffer or disk) (Model Accuracy Overlay - TT-Explorer #1234) - Ognjen
  • Run multi-op ttir graph and compare intermediates + output with golden data (blocked by above) - Taps
  • Have ability to change certain attributes (dtype) and showcase failure in golden check for ops (blocked by above) - Taps (see the dtype sketch after this list)
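
As a rough illustration of the dtype item above (not the tt-explorer flow itself): recomputing a matmul in bfloat16 and comparing it against a float32 golden shows how a dtype attribute change shifts the comparison metric; whether it drops below a given threshold depends on the op, the shapes, and the threshold chosen. The shapes and seed here are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
a = torch.randn(32, 4096)
b = torch.randn(4096, 4096)

golden = a @ b                                    # float32 reference output
degraded = (a.bfloat16() @ b.bfloat16()).float()  # same op after a dtype change

stacked = torch.stack([golden.flatten(), degraded.flatten()])
print(f"pcc after bfloat16 downcast: {torch.corrcoef(stacked)[0, 1].item():.6f}")
```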

Parallel Thread of Work

  • Visualize llama ttir on tt-explorer (Enable viewing of llama prefill in tt-explorer #1769) - Ognjen
  • Run llama ttir on tt-explorer (blocked by above) (the fix above works, but the model hits an issue in perf mode since the perf buffers on device aren't large enough) - Taps
  • Do golden comparison via tt-explorer (blocked by above) - Taps
  • Change attributes and show golden degradation on tt-explorer (blocked by above) - Taps

Parallel Thread of Work

Bonus

tapspatel self-assigned this Jan 14, 2025
tapspatel commented:

relevant branches: tpatel/issue-1745

tapspatel commented:

list of all ops in llama prefill (torch golden references for these are sketched after the list)

  • typecast
  • reciprocal
  • multiply
  • embedding
  • cos (done)
  • unsqueeze
  • transpose
  • squeeze
  • concat
  • matmul
  • sqrt
  • sin
  • softmax
  • sigmoid
  • add
  • reshape
  • mean
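
A hedged sketch of torch reference ("golden") implementations for the ops listed above, the kind of table a python golden infra could key op names against. This is not the actual ttir_builder golden registry; the default dims and dtypes are illustrative assumptions.

```python
import torch

# Illustrative golden functions keyed by op name (defaults are assumptions).
GOLDEN_FNS = {
    "typecast":   lambda x, dtype=torch.float32: x.to(dtype),
    "reciprocal": torch.reciprocal,
    "multiply":   torch.multiply,
    "embedding":  torch.nn.functional.embedding,
    "cos":        torch.cos,
    "unsqueeze":  lambda x, dim=0: torch.unsqueeze(x, dim),
    "transpose":  lambda x, dim0=-2, dim1=-1: torch.transpose(x, dim0, dim1),
    "squeeze":    torch.squeeze,
    "concat":     lambda tensors, dim=0: torch.cat(tensors, dim=dim),
    "matmul":     torch.matmul,
    "sqrt":       torch.sqrt,
    "sin":        torch.sin,
    "softmax":    lambda x, dim=-1: torch.softmax(x, dim=dim),
    "sigmoid":    torch.sigmoid,
    "add":        torch.add,
    "reshape":    torch.reshape,
    "mean":       lambda x, dim=-1: torch.mean(x, dim=dim),
}

# e.g. golden output for a softmax op in the graph:
golden = GOLDEN_FNS["softmax"](torch.randn(2, 8))
```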

tapspatel commented:

@ctodTT stage 4: these two tasks we can do locally (we are blocked on the task after that, which needs tt-forge support)

  • Run llama model from tt-forge-fe
  • Run llama ttir model from mlir repo (using ttmlir-opt, ttmlir-translate, ttrt) - make sure features work (perf, memory)

Run llama model from tt-forge-fe

To run prefill on device, run: pytest -svv forge/test/mlir/llama/tests/test_llama_prefil.py::test_llama_prefil_on_device_decode_on_cpu[openlm-research/open_llama_3b]

tapspatel commented:

ttrt run <llama_ttir.mlir> --save-artifacts --clean-artifacts --memory works like a charm

// ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=/code/tt-mlir/ttrt-artifacts/system_desc.ttsys" llama_prefill_ttir.mlir &> llama_prefill_ttnn.mlir
// ttmlir-translate --ttnn-to-flatbuffer llama_prefill_ttnn.mlir &> llama_prefill.ttnn
