Model Accuracy Overlay - TT-Explorer #1234

Open · vprajapati-tt opened this issue Nov 12, 2024 · 6 comments

@vprajapati-tt (Contributor)

  • We already have issues filed for perf overlays in the TT-Explorer frontend; a new piece of functionality would be an overlay that displays "accuracy" on the graph.
  • Accuracy could be something like PCC information from Golden, which can help visualize the error as execution completes.
@vprajapati-tt vprajapati-tt added enhancement New feature or request explorer Issues related to Explorer Visualization tool labels Nov 12, 2024
@vprajapati-tt vprajapati-tt self-assigned this Nov 12, 2024
@odjuricicTT (Contributor)

I think that this was planned for February. Changing the milestone.

@tapspatel (Contributor)

Example case

  1. Generate a model with golden data embedded into it using ttir_builder (see /code/tt-mlir/test/python/golden/test_ttir_ops.py):
python /code/tt-mlir/test/python/golden/test_ttir_ops.py

or choose a single test in that file, like def test_arbitrary_op_chain(...)
  2. Run the model in ttrt with the appropriate flags:
ttrt run test_arbitrary_op_chain.ttnn --clean-artifacts --save-artifacts
  3. Look at the golden_results.json file that gets generated per program: ttrt-artifacts/test_arbitrary_op_chain.ttnn/run/program_0/golden_results.json. Each TTIR op is identified via its loc data (right now we assume a 1:1 mapping between most ttir -> ttnn ops, but this will not always be the case; this issue will fix it: [TTIR][TTNN] MLIR compiler locations #1745).

Sample dump:

{
    "loc(\"/code/tt-mlir/test/python/golden/test_ttir_ops.py:372:id(0)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 1.0,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": true,
        "max": 0.0,
        "mean_absolute_error": 0.0,
        "root_mean_square_error": 0.0,
        "cosine_similarity": 1.0000001192092896
    },
    "loc(\"/code/tt-mlir/test/python/golden/test_ttir_ops.py:373:id(1)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 0.9999771608457817,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": false,
        "max": 0.1714310646057129,
        "mean_absolute_error": 0.011073621921241283,
        "root_mean_square_error": 0.023029498755931854,
        "cosine_similarity": 0.9999846816062927
    },
    "loc(\"/code/tt-mlir/test/python/golden/test_ttir_ops.py:374:id(2)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 0.9999862390049077,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": false,
        "max": 0.37725830078125,
        "mean_absolute_error": 0.012225031852722168,
        "root_mean_square_error": 0.029251599684357643,
        "cosine_similarity": 0.9999864101409912
    }
}
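
For reference, here is a minimal sketch (not part of ttrt) of how a tool could parse this report and summarize per-op accuracy; the report path is the one from step 3 above:

import json

# Report generated by `ttrt run --save-artifacts` (path from step 3 above).
report = "ttrt-artifacts/test_arbitrary_op_chain.ttnn/run/program_0/golden_results.json"

with open(report) as f:
    results = json.load(f)

# Each key is the loc string of a TTIR op; flag ops whose PCC missed its target.
for loc, m in results.items():
    status = "PASS" if m["actual_pcc"] >= m["expected_pcc"] else "FAIL"
    print(f"{status}  pcc={m['actual_pcc']:.6f} (expected >= {m['expected_pcc']})  {loc}")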

@odjuricicTT (Contributor)

@tapspatel Step 1 is not something that will happen from tt-explorer?

@odjuricicTT (Contributor)

@vprajapati-tt After a brief chat with @tapspatel, ttrt golden data only works with ttrt run. In order to make it work from explorer we need to add this as an option.

My suggestion would be to have another option on the frontend, similar to Optimization Policy. This way we can let the user decide which overlay they want to see after execution (perf or accuracy).

@vprajapati-tt (Contributor, Author)

Both of these overlays should be provided as NodeData after each execution, with an exception made if GoldenData is not found. ttrt perf will invoke ttrt run, and (correct me if I'm wrong, @tapspatel) the --golden flag should be enabled by default now. Since the artifacts will be saved for both run and perf, we should be able to parse both sets of data if they exist. If multiple node data sources are provided, model-explorer provides a UI element to choose between them: https://github.com/google-ai-edge/model-explorer/wiki/2.-User-Guide#use-custom-node-data.
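
As a rough sketch of that flow (the node_data_builder names below follow the model-explorer Python package documented in the wiki page above, but treat the exact class and field names, and the loc-to-node keying, as assumptions):

import json
from model_explorer import node_data_builder as ndb

with open("golden_results.json") as f:
    golden = json.load(f)

# One result per op, keyed by its loc string; the displayed value is the measured PCC.
results = {loc: ndb.NodeDataResult(value=m["actual_pcc"]) for loc, m in golden.items()}

# Color nodes from red (low PCC) to green (PCC == 1).
gradient = [
    ndb.GradientItem(stop=0, bgColor="#ff0000"),
    ndb.GradientItem(stop=1, bgColor="#00ff00"),
]

graph_data = ndb.GraphNodeData(results=results, gradient=gradient)
ndb.ModelNodeData(graphsData={"main": graph_data}).save_to_file("accuracy_node_data.json")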

@tapspatel (Contributor) commented Jan 22, 2025

For step 1, that has to be done by some external tool/framework. For example, here is a potential user flow:

  1. Build a model using ttir_builder. This builds TTIR ops, runs golden on each op, and stores the goldens in the flatbuffer via the ttnn_to_flatbuffer API. You can see an example in tt-mlir/python/test_infra/test_utils.py, in the line ttnn_to_flatbuffer_file(module, output_file_name, builder.get_golden_map()). In passes.cpp, this calls the API
mlir::tt::ttnn::translateTTNNToFlatbuffer(moduleOp, file, goldenMap)

where goldenMap = std::unordered_map<std::string, mlir::tt::GoldenTensor>.

The goldens are embedded directly into the flatbuffer as a map where the key is the loc and the value is an array of bytes of the golden data; a rough sketch of this flow follows.
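
A hedged sketch of that embedding step, mirroring the test_utils.py line quoted above (the imports and the builder entry point here are placeholders, not the real module layout):

# Placeholder imports; the real helpers live in tt-mlir's Python test infra.
from ttir_builder import build_module            # hypothetical entry point
from test_utils import ttnn_to_flatbuffer_file   # as quoted from test_infra/test_utils.py

module, builder = build_module()                 # builds TTIR ops and computes per-op goldens
golden_map = builder.get_golden_map()            # loc string -> golden tensor
ttnn_to_flatbuffer_file(module, "model.ttnn", golden_map)  # goldens embedded in the flatbuffer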

  2. Other frontends (like tt-torch, tt-xla, etc.) will translate their graphs into some higher-level dialect, and they are responsible for having a golden intermediate for each of those higher-level dialect ops. They can follow this same mechanism to embed the golden tensors in the flatbuffer itself as a map.

  3. If you invoke ttrt via the command line, it will do the golden comparison with the golden data it finds in the flatbuffer, matching the loc() of each op against the keys of the golden_map via callbacks (see tt-mlir/runtime/tools/python/ttrt/common/callback.py). It will then dump the golden_results.json report.

  4. Long-term, we probably want to come up with a better solution for how to store goldens, especially for larger models, since the flatbuffers will become quite large. Another option being discussed as a temporary measure for the demo is saving goldens to disk in a folder, each named loc.pt (saved as a .pt PyTorch tensor file); ttrt verification would then use these files during golden comparison. A directory path can be passed into ttrt, and it will look at the files stored on disk and do the golden comparison with those .pt files. This is an example idea and is not implemented yet (but really low risk to get running); a sketch follows.
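
A minimal sketch of that (not yet implemented) on-disk option, assuming one .pt file per op named after its loc:

import os
import torch

def save_goldens(golden_map, out_dir):
    # golden_map: loc string -> torch.Tensor. Note that raw loc strings
    # contain '/' and '"', so a real implementation would need to sanitize
    # or encode the file names.
    os.makedirs(out_dir, exist_ok=True)
    for loc, tensor in golden_map.items():
        torch.save(tensor, os.path.join(out_dir, loc + ".pt"))

def load_goldens(golden_dir):
    # Rebuild the loc -> tensor map that ttrt (or explorer) would compare against.
    return {
        name[: -len(".pt")]: torch.load(os.path.join(golden_dir, name))
        for name in os.listdir(golden_dir)
        if name.endswith(".pt")
    }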

  5. Now, if you are running a .mlir file in explorer:

From what I understand, explorer uses a TTIR-graph point of view. The problem, I think (and correct me if I'm wrong), is that explorer needs a way to access the golden tensors. I discussed with Vraj last week: either explorer can load in a flatbuffer and get the golden tensors from it, or it can point to some location on disk when running the model to do the golden comparison. ttrt can provide support for passing in a folder path to look at during golden comparison. Explorer can also create an internal map of goldens, provided they exist on disk (i.e., during loading of the TTIR graph, load the goldens into a std::unordered_map<std::string, mlir::tt::GoldenTensor>, and pass this map in when you convert to flatbuffer). For this situation, we just have to make sure the loc data in each TTIR op matches the golden .pt file name.

My goal is to be able to provide a ttir.mlir file that already has loc data within it into explorer, so that you do not have to figure out loc names yourself. I will also provide golden data for each of the TTIR ops, saved both in the flatbuffer generated from that ttir.mlir file and to disk with names of the form "loc.pt". Explorer is free to use whichever way it wants to access the golden data.

To summarize:

  1. tt-explorer loads some ttir.mlir file with loc data.
  2. tt-explorer has access to golden data on disk, where the .pt tensors have the same name as the loc of the ttir op they relate to; alternatively, it can load the flatbuffer generated from the ttir.mlir file and get the golden data from the flatbuffer itself.
  3. tt-explorer can either maintain a map during .mlir file initialization and pass it to the ttnn_to_flatbuffer function to store in the flatbuffer (this will then automatically get verified by ttrt during runtime if golden data exists in the flatbuffer), or ttrt can expose a dir flag to look in during golden verification. A small coverage-check sketch follows.
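
As a sanity check for point 2, here is a small sketch (the loc regex and file naming are assumptions) that verifies every loc in a ttir.mlir file has a matching golden .pt file on disk:

import os
import re

def check_golden_coverage(mlir_path, golden_dir):
    # Collect every loc("...") string that appears in the .mlir file.
    with open(mlir_path) as f:
        locs = set(re.findall(r'loc\("([^"]+)"\)', f.read()))
    # Golden files are assumed to be named <loc>.pt, as described above.
    goldens = {name[:-3] for name in os.listdir(golden_dir) if name.endswith(".pt")}
    missing = locs - goldens
    for loc in sorted(missing):
        print(f"no golden for {loc}")
    return not missing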

Let me know your thoughts. FYI, we are also discussing possibly using only a smaller transformer layer from the llama prefill model, since the full model is ~3000 ops and the point of this demo is to showcase how golden accuracy can be debugged. So you can expect us to use a smaller model (i.e., ~50-ish ops).

I stand corrected: @vprajapati-tt is right that golden is supported in perf mode as well as run mode; I had a brain freeze this morning. I had meant to say that we will need run mode because the llama model doesn't run in perf mode (due to buffers running out of space on device, and we don't have a mechanism to flush buffers in ttrt yet), and golden would also work in ttrt run. However, in light of potentially only using a transformer layer in the demo, we should be able to run golden in ttrt perf mode without issue. So if it works on your end in explorer via the ttir_builder.py models with golden verification, it's fine by me.
