Model Accuracy Overlay - TT-Explorer #1234

Open · vprajapati-tt opened this issue Nov 12, 2024 · 6 comments

@vprajapati-tt (Contributor)

  • We already have issues filed for perf overlays in the TT-Explorer frontend; a new piece of functionality would be an overlay that displays "accuracy" on the graph.
  • Accuracy could be something like PCC information from Golden, which can help visualize the error as execution completes.
@vprajapati-tt vprajapati-tt added enhancement New feature or request explorer Issues related to Explorer Visualization tool labels Nov 12, 2024
@vprajapati-tt vprajapati-tt self-assigned this Nov 12, 2024
@odjuricicTT (Contributor)

I think that this was planned for February. Changing the milestone.

@tapspatel (Contributor)

Example case

  1. Generate a model with golden data embedded into it using ttir_builder (see /code/tt-mlir/test/python/golden/test_ttir_ops.py):
python /code/tt-mlir/test/python/golden/test_ttir_ops.py

or choose a single test in that file, like def test_arbitrary_op_chain(...)
  2. Run the model in ttrt with the appropriate flags:
ttrt run test_arbitrary_op_chain.ttnn --clean-artifacts --save-artifacts
  3. Look at the golden_results.json file that gets generated per program: ttrt-artifacts/test_arbitrary_op_chain.ttnn/run/program_0/golden_results.json. Each TTIR op is identified via its loc data (right now we assume a 1:1 mapping between most ttir -> ttnn ops, but this will not always be the case; this issue will fix it: [TTIR][TTNN] MLIR compiler locations #1745).

Sample dump:

{
    "loc(\"/code/tt-mlir/test/python/golden/test_ttir_ops.py:372:id(0)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 1.0,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": true,
        "max": 0.0,
        "mean_absolute_error": 0.0,
        "root_mean_square_error": 0.0,
        "cosine_similarity": 1.0000001192092896
    },
    "loc(\"/code/tt-mlir/test/python/golden/test_ttir_ops.py:373:id(1)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 0.9999771608457817,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": false,
        "max": 0.1714310646057129,
        "mean_absolute_error": 0.011073621921241283,
        "root_mean_square_error": 0.023029498755931854,
        "cosine_similarity": 0.9999846816062927
    },
    "loc(\"/code/tt-mlir/test/python/golden/test_ttir_ops.py:374:id(2)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 0.9999862390049077,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": false,
        "max": 0.37725830078125,
        "mean_absolute_error": 0.012225031852722168,
        "root_mean_square_error": 0.029251599684357643,
        "cosine_similarity": 0.9999864101409912
    }
}
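
For reference, here is a minimal sketch (not part of ttrt) of how a tool could parse this report and summarize per-op accuracy; the report path is the one from step 3 above:

import json

# Report generated by `ttrt run --save-artifacts` (path from step 3 above).
report = "ttrt-artifacts/test_arbitrary_op_chain.ttnn/run/program_0/golden_results.json"

with open(report) as f:
    results = json.load(f)

# Each key is the loc string of a TTIR op; flag ops whose PCC missed its target.
for loc, m in results.items():
    status = "PASS" if m["actual_pcc"] >= m["expected_pcc"] else "FAIL"
    print(f"{status}  pcc={m['actual_pcc']:.6f} (expected >= {m['expected_pcc']})  {loc}")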

@odjuricicTT (Contributor)

@tapspatel Step 1 is not something that will happen from tt-explorer?

@odjuricicTT (Contributor)

@vprajapati-tt After a brief chat with @tapspatel, ttrt golden data only works with ttrt run. In order to make it work from explorer we need to add this as an option.

My suggestion would be to have another option on the frontend, similar to Optimization Policy. This way we can let the user decide which overlay they want to see after execution (perf or accuracy).

@vprajapati-tt (Contributor, Author)

Both of these overlays should be provided as NodeData after each execution, with an exception made if GoldenData is not found. ttrt perf will invoke ttrt run, and (correct me if I'm wrong, @tapspatel) the --golden flag should be enabled by default now. Since the artifacts will be saved for both run and perf, we should be able to parse both sets of data if they exist. If multiple node data sources are provided, model-explorer provides a UI element to choose between them: https://github.com/google-ai-edge/model-explorer/wiki/2.-User-Guide#use-custom-node-data.
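
As a rough sketch of that flow (the node_data_builder names below follow the model-explorer Python package documented in the wiki page above, but treat the exact class and field names, and the loc-to-node keying, as assumptions):

import json
from model_explorer import node_data_builder as ndb

with open("golden_results.json") as f:
    golden = json.load(f)

# One result per op, keyed by its loc string; the displayed value is the measured PCC.
results = {loc: ndb.NodeDataResult(value=m["actual_pcc"]) for loc, m in golden.items()}

# Color nodes from red (low PCC) to green (PCC == 1).
gradient = [
    ndb.GradientItem(stop=0, bgColor="#ff0000"),
    ndb.GradientItem(stop=1, bgColor="#00ff00"),
]

graph_data = ndb.GraphNodeData(results=results, gradient=gradient)
ndb.ModelNodeData(graphsData={"main": graph_data}).save_to_file("accuracy_node_data.json")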

@tapspatel (Contributor) commented Jan 22, 2025

For step 1, that has to be done by some external tool/framework. For example, here is a potential user flow:

  1. Build a model using ttir_builder. This builds TTIR ops, runs golden on each op, and stores the goldens in the flatbuffer via the ttnn_to_flatbuffer API. You can see an example in tt-mlir/python/test_infra/test_utils.py, in the line ttnn_to_flatbuffer_file(module, output_file_name, builder.get_golden_map()). In passes.cpp, this calls the API
mlir::tt::ttnn::translateTTNNToFlatbuffer(moduleOp, file, goldenMap)

where goldenMap = std::unordered_map<std::string, mlir::tt::GoldenTensor>.

The goldens are embedded directly into the flatbuffer as a map where the key is the loc and the value is an array of bytes of the golden data; a rough sketch of this flow follows.
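
A hedged sketch of that embedding step, mirroring the test_utils.py line quoted above (the imports and the builder entry point here are placeholders, not the real module layout):

# Placeholder imports; the real helpers live in tt-mlir's Python test infra.
from ttir_builder import build_module            # hypothetical entry point
from test_utils import ttnn_to_flatbuffer_file   # as quoted from test_infra/test_utils.py

module, builder = build_module()                 # builds TTIR ops and computes per-op goldens
golden_map = builder.get_golden_map()            # loc string -> golden tensor
ttnn_to_flatbuffer_file(module, "model.ttnn", golden_map)  # goldens embedded in the flatbuffer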

  2. Other frontends (like tt-torch, tt-xla, etc.) will translate their graphs into some higher-level dialect, and they are responsible for having a golden intermediate for each of those higher-level dialect ops. They can follow this same mechanism to embed the golden tensors in the flatbuffer itself as a map.

  3. If you invoke ttrt via the command line, it will do the golden comparison with the golden data it finds in the flatbuffer, matching the loc() of each op against the keys of the golden_map via callbacks (see tt-mlir/runtime/tools/python/ttrt/common/callback.py). It will then dump the golden_results.json report.

  4. Long-term, we probably want to come up with a better solution for how to store goldens, especially for larger models, since the flatbuffers will become quite large. Another option being discussed as a temporary measure for the demo is saving goldens to disk in a folder, each named loc.pt (saved as a .pt PyTorch tensor file); ttrt verification would then use these files during golden comparison. A directory path can be passed into ttrt, and it will look at the files stored on disk and do the golden comparison with those .pt files. This is an example idea and is not implemented yet (but really low risk to get running); a sketch follows.
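
A minimal sketch of that (not yet implemented) on-disk option, assuming one .pt file per op named after its loc:

import os
import torch

def save_goldens(golden_map, out_dir):
    # golden_map: loc string -> torch.Tensor. Note that raw loc strings
    # contain '/' and '"', so a real implementation would need to sanitize
    # or encode the file names.
    os.makedirs(out_dir, exist_ok=True)
    for loc, tensor in golden_map.items():
        torch.save(tensor, os.path.join(out_dir, loc + ".pt"))

def load_goldens(golden_dir):
    # Rebuild the loc -> tensor map that ttrt (or explorer) would compare against.
    return {
        name[: -len(".pt")]: torch.load(os.path.join(golden_dir, name))
        for name in os.listdir(golden_dir)
        if name.endswith(".pt")
    }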

  5. Now, if you are running a .mlir file in explorer:

From what I understand, explorer uses a TTIR-graph point of view. The problem, I think (and correct me if I'm wrong), is that explorer needs a way to access the golden tensors. I discussed with Vraj last week: either explorer can load in a flatbuffer and get the golden tensors from it, or it can point to some location on disk when running the model to do the golden comparison. ttrt can provide support for passing in a folder path to look at during golden comparison. Explorer can also create an internal map of goldens, provided they exist on disk (i.e., during loading of the TTIR graph, load the goldens into a std::unordered_map<std::string, mlir::tt::GoldenTensor>, and pass this map in when you convert to flatbuffer). For this situation, we just have to make sure the loc data in each TTIR op matches the golden .pt file name.

My goal is to be able to provide a ttir.mlir file that already has loc data within it into explorer, so that you do not have to figure out loc names yourself. I will also provide golden data for each of the TTIR ops, saved both in the flatbuffer generated from that ttir.mlir file and to disk with names of the form "loc.pt". Explorer is free to use whichever way it wants to access the golden data.

To summarize:

  1. tt-explorer loads some ttir.mlir file with loc data.
  2. tt-explorer has access to golden data on disk, where the .pt tensors have the same name as the loc of the ttir op they relate to; alternatively, it can load the flatbuffer generated from the ttir.mlir file and get the golden data from the flatbuffer itself.
  3. tt-explorer can either maintain a map during .mlir file initialization and pass it to the ttnn_to_flatbuffer function to store in the flatbuffer (this will then automatically get verified by ttrt during runtime if golden data exists in the flatbuffer), or ttrt can expose a dir flag to look in during golden verification. A small coverage-check sketch follows.
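
As a sanity check for point 2, here is a small sketch (the loc regex and file naming are assumptions) that verifies every loc in a ttir.mlir file has a matching golden .pt file on disk:

import os
import re

def check_golden_coverage(mlir_path, golden_dir):
    # Collect every loc("...") string that appears in the .mlir file.
    with open(mlir_path) as f:
        locs = set(re.findall(r'loc\("([^"]+)"\)', f.read()))
    # Golden files are assumed to be named <loc>.pt, as described above.
    goldens = {name[:-3] for name in os.listdir(golden_dir) if name.endswith(".pt")}
    missing = locs - goldens
    for loc in sorted(missing):
        print(f"no golden for {loc}")
    return not missing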

Let me know your thoughts. FYI, we are also discussing possibly using only a smaller transformer layer from the llama prefill model, since the full model is ~3000 ops and the point of this demo is to showcase how golden accuracy can be debugged. So you can expect us to use a smaller model (i.e., ~50-ish ops).

I stand corrected: @vprajapati-tt is right that golden is supported in perf mode as well as run mode; I had a brain freeze this morning. I had meant to say that we will need run mode because the llama model doesn't run in perf mode (due to buffers running out of space on device, and we don't have a mechanism to flush buffers in ttrt yet), and golden would also work in ttrt run. However, in light of potentially only using a transformer layer in the demo, we should be able to run golden in ttrt perf mode without issue. So if it works on your end in explorer via the ttir_builder.py models with golden verification, it's fine by me.
