#16367: Added support to enable dram and l1 memory collection without saving to disk #16368

tapspatel · 2024-12-30T22:47:39Z

Ticket

Problem description

In upstream compiler environments, we need the ability to query a memory view of the device state for DRAM/L1. The current method of saving to CSV raises two issues.

First, the CSV format requires custom parsing, and recent changes to the memory allocator have broken the upstream compiler's parser. A more robust solution that exposes the memory values directly, without having to implement clever parsing algorithms, would be beneficial.

Current memory allocator csv file:

,DRAM
,Total allocatable (B):,12884901504
,Total allocated (B):,0
,Total free (B):,12884901504
FreeListOpt allocator info:
segregated free blocks by size:
  Size class 0: (1024 - 2048) blocks: 
  Size class 1: (2048 - 4096) blocks: 
  Size class 2: (4096 - 8192) blocks: 
  Size class 3: (8192 - 16384) blocks: 
  Size class 4: (16384 - 32768) blocks: 
  Size class 5: (32768 - 65536) blocks: 
  Size class 6: (65536 - 131072) blocks: 
  Size class 7: (131072 - 262144) blocks: 
  Size class 8: (262144 - 524288) blocks: 
  Size class 9: (524288 - 1048576) blocks: 
  Size class 10: (1048576 - 2097152) blocks: 
  Size class 11: (2097152 - 4194304) blocks: 
  Size class 12: (4194304 - 8388608) blocks: 
  Size class 13: (8388608 - 16777216) blocks: 
  Size class 14: (16777216 - 33554432) blocks: 
  Size class 15: (33554432 - 67108864) blocks: 
  Size class 16: (67108864 - 134217728) blocks: 
  Size class 17: (134217728 - inf) blocks: 0 
Free slots in block table: 
Block table:
       Block      Address         Size       PrevID       NextID    Allocated 
           0            0   1073741792         none         none           no
,L1
,Total allocatable (B):,87504896
,Total allocated (B):,0
,Total free (B):,87504896
FreeListOpt allocator info:
segregated free blocks by size:
  Size class 0: (1024 - 2048) blocks: 
  Size class 1: (2048 - 4096) blocks: 
  Size class 2: (4096 - 8192) blocks: 
  Size class 3: (8192 - 16384) blocks: 
  Size class 4: (16384 - 32768) blocks: 
  Size class 5: (32768 - 65536) blocks:
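
To illustrate the kind of parsing the dump above forces on a consumer, here is a minimal sketch in Python. It is not the upstream compiler's parser; `parse_totals` and the `DUMP` excerpt are hypothetical names used only for illustration. It picks out the `,Total ... (B):,<n>` rows and skips the free-form allocator text, which is exactly the part that changes when the allocator changes.

```python
import csv
import io

# Excerpt copied from the dump above. Only the rows that begin with a comma
# are CSV-like; the allocator-info lines are free-form text.
DUMP = """\
,DRAM
,Total allocatable (B):,12884901504
,Total allocated (B):,0
,Total free (B):,12884901504
FreeListOpt allocator info:
segregated free blocks by size:
  Size class 0: (1024 - 2048) blocks:
,L1
,Total allocatable (B):,87504896
,Total allocated (B):,0
,Total free (B):,87504896
"""


def parse_totals(text):
    """Collect the ',Total ... (B):,<n>' rows for each memory section."""
    totals = {}
    section = None
    for row in csv.reader(io.StringIO(text)):
        # Summary rows have an empty first cell; anything else is skipped.
        if len(row) < 2 or row[0] != "":
            continue
        if len(row) == 2:
            section = row[1]  # section header, e.g. "DRAM" or "L1"
            totals[section] = {}
        elif section is not None and row[1].startswith("Total"):
            totals[section][row[1]] = int(row[2])
    return totals


totals = parse_totals(DUMP)
print(totals["DRAM"]["Total allocatable (B):"])
```

Even this small parser depends on the exact label strings and on the leading-comma convention, so any change to the report layout can silently break it.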

The second hurdle is having to save to disk and then needing some file-opening mechanism to read the stored file back. The compiler could issue hundreds of small subgraphs, and saving to disk for each one is extremely time-consuming and inefficient. Being able to query the device at runtime is much more beneficial.

What's changed

2 new tt_metal APIs

MemoryView GetDramMemoryView(const Device* device);

MemoryView GetL1MemoryView(const Device* device);

2 new ttnn pybinded APIs

ttnn.get_dram_memory_view(device)
ttnn.get_l1_memory_view(device)

Sample usage from ttnn python POV

>>> dram = ttnn.get_dram_memory_view(device)
>>> print(dram)
<ttnn._ttnn.device.MemoryView object at 0x7fe7ee4ecb70>
>>> dram.num_banks
12
>>> dram.total_allocatable_per_bank_size_bytes
1073741792
>>> dram.total_allocated_per_bank_size_bytes
4096
>>> dram.total_free_per_bank_size_bytes
1073737696
>>> dram.total_allocatable_size_bytes
12884901504
>>> dram.total_allocated_size_bytes
49152
>>> dram.total_free_size_bytes
12884852352
>>> dram.largest_contiguous_free_block_per_bank_size_bytes
1073737696
>>> dram.blockTable
[{'blockID': '0', 'address': '0', 'size': '2048', 'prevID': '-1', 'nextID': '1', 'allocated': 'yes'}, {'blockID': '1', 'address': '2048', 'size': '2048', 'prevID': '0', 'nextID': '2', 'allocated': 'yes'}, {'blockID': '2', 'address': '4096', 'size': '1073737696', 'prevID': '1', 'nextID': '-1', 'allocated': 'no'}]
>>> l1 = ttnn.get_l1_memory_view(device)
>>> l1.num_banks
64
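
Because every value in `blockTable` is a string, a consumer has to convert the fields before doing arithmetic. The following is a minimal sketch using the block table printed in the session above as literal data; `summarize_blocks` is a hypothetical helper, not part of ttnn.

```python
# Block table copied from the sample session above; every value is a string.
block_table = [
    {"blockID": "0", "address": "0", "size": "2048",
     "prevID": "-1", "nextID": "1", "allocated": "yes"},
    {"blockID": "1", "address": "2048", "size": "2048",
     "prevID": "0", "nextID": "2", "allocated": "yes"},
    {"blockID": "2", "address": "4096", "size": "1073737696",
     "prevID": "1", "nextID": "-1", "allocated": "no"},
]


def summarize_blocks(table):
    """Hypothetical helper: convert string fields and total them up."""
    allocated = sum(int(b["size"]) for b in table if b["allocated"] == "yes")
    free_sizes = [int(b["size"]) for b in table if b["allocated"] == "no"]
    return {
        "allocated_bytes": allocated,
        "free_bytes": sum(free_sizes),
        "largest_free_block": max(free_sizes, default=0),
    }


summary = summarize_blocks(block_table)
print(summary)
```

For this data the allocated total and the largest free block agree with `total_allocated_per_bank_size_bytes` (4096) and `largest_contiguous_free_block_per_bank_size_bytes` (1073737696) shown in the session.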

MemoryView information being collected

struct MemoryView {
    std::uint64_t num_banks;
    size_t total_allocatable_per_bank_size_bytes;
    size_t total_allocated_per_bank_size_bytes;
    size_t total_free_per_bank_size_bytes;
    size_t total_allocatable_size_bytes;  // total_allocatable_per_bank_size_bytes * num_banks
    size_t total_allocated_size_bytes;    // total_allocated_per_bank_size_bytes * num_banks
    size_t total_free_size_bytes;         // total_free_per_bank_size_bytes * num_banks
    size_t largest_contiguous_free_block_per_bank_size_bytes;
    std::vector<std::unordered_map<std::string, std::string>> blockTable;
};
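
The relationships noted in the struct comments can be checked against the DRAM numbers from the sample session. This is a sketch only: `MemoryViewSnapshot` is a hypothetical plain-Python mirror of the per-bank fields, not the class exposed by ttnn.

```python
from dataclasses import dataclass


@dataclass
class MemoryViewSnapshot:
    """Hypothetical plain-Python mirror of the per-bank MemoryView fields."""
    num_banks: int
    total_allocatable_per_bank_size_bytes: int
    total_allocated_per_bank_size_bytes: int
    total_free_per_bank_size_bytes: int

    # Device-wide totals are the per-bank values scaled by the bank count.
    @property
    def total_allocatable_size_bytes(self):
        return self.total_allocatable_per_bank_size_bytes * self.num_banks

    @property
    def total_allocated_size_bytes(self):
        return self.total_allocated_per_bank_size_bytes * self.num_banks

    @property
    def total_free_size_bytes(self):
        return self.total_free_per_bank_size_bytes * self.num_banks


# DRAM values taken from the sample session above.
dram = MemoryViewSnapshot(
    num_banks=12,
    total_allocatable_per_bank_size_bytes=1073741792,
    total_allocated_per_bank_size_bytes=4096,
    total_free_per_bank_size_bytes=1073737696,
)

print(dram.total_allocatable_size_bytes,
      dram.total_allocated_size_bytes,
      dram.total_free_size_bytes)
```

The computed totals match the values printed in the session (12884901504, 49152, 12884852352), and the per-bank allocated and free sizes sum to the per-bank allocatable size.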

Checklist

post commit pass: https://github.com/tenstorrent/tt-metal/actions/runs/12683308047
nightly model and ttnn pass: https://github.com/tenstorrent/tt-metal/actions/runs/12683309017

ttnn/cpp/pybind11/device.cpp

tt_metal/detail/reports/memory_reporter.cpp

tt_metal/impl/allocator/algorithms/free_list.cpp

tapspatel · 2025-01-07T21:14:59Z

@abhullar-tt @ayerofieiev-tt I updated the variable names

    std::uint64_t num_banks;
    size_t bytes_allocatable_per_bank;
    size_t bytes_allocated_per_bank;
    size_t bytes_free_per_bank;
    size_t total_bytes_allocatable;  // bytes_allocatable_per_bank * num_banks
    size_t total_bytes_allocated;    // bytes_allocated_per_bank * num_banks
    size_t total_bytes_free;         // bytes_free_per_bank * num_banks
    size_t largest_contiguous_bytes_free_per_bank;

abhullar-tt

overall looks okay to me, please post successful CI runs before merging

tapspatel · 2025-01-08T14:40:03Z

post commit pipeline pass: https://github.com/tenstorrent/tt-metal/actions/runs/12660580219

tt_metal/detail/reports/memory_reporter.hpp

tt_metal/impl/allocator/algorithms/free_list.cpp

tech_reports/memory/allocator.md

tt_metal/detail/reports/memory_reporter.hpp

tt-aho

Looks okay to me, but I am not sure why we need dedicated user APIs for each buffer type, instead of just taking the buffer type as a parameter.

tapspatel · 2025-01-08T20:57:29Z

@tt-aho Agreed. I updated the API structure to take BufferType as a parameter (l1, dram, l1_small, trace).
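
The design change discussed here, one entry point keyed by buffer type instead of one function per type, can be sketched in plain Python. Everything below is an assumption made for illustration: `BufferType`, `FakeDevice`, and `get_memory_view_sketch` are hypothetical stand-ins, and the final tt_metal/ttnn signature is not shown in this thread.

```python
from enum import Enum


class BufferType(Enum):
    """Hypothetical stand-in listing the four types named in the comment."""
    DRAM = "dram"
    L1 = "l1"
    L1_SMALL = "l1_small"
    TRACE = "trace"


class FakeDevice:
    """Hypothetical stand-in device holding canned per-type statistics."""

    def __init__(self, stats):
        self._stats = stats

    def stats_for(self, buffer_type):
        return self._stats[buffer_type]


def get_memory_view_sketch(device, buffer_type):
    """Single entry point for all buffer types (illustrative only)."""
    if not isinstance(buffer_type, BufferType):
        raise TypeError("buffer_type must be a BufferType member")
    return device.stats_for(buffer_type)


# Bank counts taken from the sample session in the PR description.
device = FakeDevice({
    BufferType.DRAM: {"num_banks": 12},
    BufferType.L1: {"num_banks": 64},
})

dram_view = get_memory_view_sketch(device, BufferType.DRAM)
l1_view = get_memory_view_sketch(device, BufferType.L1)
print(dram_view["num_banks"], l1_view["num_banks"])
```

With this shape, supporting another buffer type means adding an enum member rather than another public function.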

tapspatel · 2025-01-09T05:26:37Z

post commit pass: https://github.com/tenstorrent/tt-metal/actions/runs/12683308047
nightly model and ttnn pass: https://github.com/tenstorrent/tt-metal/actions/runs/12683309017

tt_metal/impl/allocator/algorithms/free_list_opt.cpp

tt_metal/detail/reports/memory_reporter.cpp

tt_metal/include/tt_metal/device.hpp

tt_metal/detail/reports/memory_reporter.hpp

tapspatel self-assigned this Dec 30, 2024

tapspatel requested review from ayerofieiev-tt, dmakoviichuk-tt, cfjchu, TT-BrianLiu, abhullar-tt, pgkeller, aliuTT, tt-aho, tt-dma, tt-asaigal, ubcheema and davorchap as code owners December 30, 2024 22:47

tapspatel mentioned this pull request Dec 30, 2024

Update memory report in ttrt with new memory allocator tenstorrent/tt-mlir#1683

Closed

tapspatel force-pushed the tpatel/issue-16367 branch from 81f8df5 to eb4efbd Compare December 31, 2024 20:15

ayerofieiev-tt reviewed Dec 31, 2024

ttnn/cpp/pybind11/device.cpp (outdated)

abhullar-tt requested changes Jan 2, 2025

tapspatel force-pushed the tpatel/issue-16367 branch from eb4efbd to 87f69ab Compare January 7, 2025 20:39

tapspatel requested review from ayerofieiev-tt and abhullar-tt January 7, 2025 21:15

tapspatel linked an issue Jan 7, 2025 that may be closed by this pull request

Enable in-memory collection of memory results (not save to disk) #16367

Closed

abhullar-tt approved these changes Jan 7, 2025

tapspatel mentioned this pull request Jan 7, 2025

#1683: Updated memory collection and report of dram and l1 using new metal APIs tenstorrent/tt-mlir#1726

Merged

tt-aho reviewed Jan 8, 2025

tapspatel force-pushed the tpatel/issue-16367 branch from 87f69ab to 8905345 Compare January 8, 2025 16:17

tapspatel requested a review from tt-aho January 8, 2025 16:18

tt-aho approved these changes Jan 8, 2025

tapspatel force-pushed the tpatel/issue-16367 branch from 8905345 to c2a216c Compare January 8, 2025 20:56

tapspatel force-pushed the tpatel/issue-16367 branch 2 times, most recently from 745dc8a to affbc66 Compare January 9, 2025 03:51

tapspatel force-pushed the tpatel/issue-16367 branch from affbc66 to 7bb435c Compare January 9, 2025 16:47

TT-BrianLiu requested changes Jan 9, 2025

tt_metal/impl/allocator/algorithms/free_list_opt.cpp (outdated)

tt_metal/detail/reports/memory_reporter.cpp (outdated)

tapspatel force-pushed the tpatel/issue-16367 branch from 7bb435c to 9c7cb9c Compare January 9, 2025 17:53

tapspatel requested a review from TT-BrianLiu January 9, 2025 17:54

TT-BrianLiu approved these changes Jan 9, 2025

dmakoviichuk-tt reviewed Jan 9, 2025

tt_metal/include/tt_metal/device.hpp (outdated)

dmakoviichuk-tt reviewed Jan 9, 2025

tt_metal/detail/reports/memory_reporter.hpp (outdated)

dmakoviichuk-tt approved these changes Jan 10, 2025

#16367: Added support to enable dram and l1 memory collection without saving to disk (be02763)

tapspatel force-pushed the tpatel/issue-16367 branch from 9c7cb9c to be02763 Compare January 13, 2025 18:59

ayerofieiev-tt approved these changes Jan 13, 2025

tapspatel merged commit 07aa188 into main Jan 13, 2025
9 checks passed

tapspatel deleted the tpatel/issue-16367 branch January 13, 2025 19:32
