#16367: Added support to enable dram and l1 memory collection without saving to disk #16368

Merged 1 commit on Jan 13, 2025

@tapspatel tapspatel commented Dec 30, 2024

Ticket

Link to Github Issue

Problem description

In upstream compiler environments, we need to query a memory view of the device state for DRAM/L1. The current method of saving to CSV raises two issues:

  1. The CSV format requires custom parsing, and recent changes to the memory allocator have broken the upstream compiler's parser. A more robust solution would expose the memory values directly, with no custom parsing required.

Current memory allocator csv file:

,DRAM
,Total allocatable (B):,12884901504
,Total allocated (B):,0
,Total free (B):,12884901504
FreeListOpt allocator info:
segregated free blocks by size:
  Size class 0: (1024 - 2048) blocks: 
  Size class 1: (2048 - 4096) blocks: 
  Size class 2: (4096 - 8192) blocks: 
  Size class 3: (8192 - 16384) blocks: 
  Size class 4: (16384 - 32768) blocks: 
  Size class 5: (32768 - 65536) blocks: 
  Size class 6: (65536 - 131072) blocks: 
  Size class 7: (131072 - 262144) blocks: 
  Size class 8: (262144 - 524288) blocks: 
  Size class 9: (524288 - 1048576) blocks: 
  Size class 10: (1048576 - 2097152) blocks: 
  Size class 11: (2097152 - 4194304) blocks: 
  Size class 12: (4194304 - 8388608) blocks: 
  Size class 13: (8388608 - 16777216) blocks: 
  Size class 14: (16777216 - 33554432) blocks: 
  Size class 15: (33554432 - 67108864) blocks: 
  Size class 16: (67108864 - 134217728) blocks: 
  Size class 17: (134217728 - inf) blocks: 0 
Free slots in block table: 
Block table:
       Block      Address         Size       PrevID       NextID    Allocated 
           0            0   1073741792         none         none           no
,L1
,Total allocatable (B):,87504896
,Total allocated (B):,0
,Total free (B):,87504896
FreeListOpt allocator info:
segregated free blocks by size:
  Size class 0: (1024 - 2048) blocks: 
  Size class 1: (2048 - 4096) blocks: 
  Size class 2: (4096 - 8192) blocks: 
  Size class 3: (8192 - 16384) blocks: 
  Size class 4: (16384 - 32768) blocks: 
  Size class 5: (32768 - 65536) blocks: 
  2. The second hurdle is having to save to disk and then open and read the stored file. The compiler can issue hundreds of small subgraphs, so writing to disk for each one is very slow. Querying the device at runtime is far more efficient.

What's changed

2 new tt_metal APIs

MemoryView GetDramMemoryView(const Device* device);

MemoryView GetL1MemoryView(const Device* device);

2 new ttnn Python-bound (pybind) APIs

ttnn.get_dram_memory_view(device)
ttnn.get_l1_memory_view(device)

Sample usage from the ttnn Python side

>>> dram = ttnn.get_dram_memory_view(device)
>>> print(dram)
<ttnn._ttnn.device.MemoryView object at 0x7fe7ee4ecb70>
>>> dram.num_banks
12
>>> dram.total_allocatable_per_bank_size_bytes
1073741792
>>> dram.total_allocated_per_bank_size_bytes
4096
>>> dram.total_free_per_bank_size_bytes
1073737696
>>> dram.total_allocatable_size_bytes
12884901504
>>> dram.total_allocated_size_bytes
49152
>>> dram.total_free_size_bytes
12884852352
>>> dram.largest_contiguous_free_block_per_bank_size_bytes
1073737696
>>> dram.blockTable
[{'blockID': '0', 'address': '0', 'size': '2048', 'prevID': '-1', 'nextID': '1', 'allocated': 'yes'}, {'blockID': '1', 'address': '2048', 'size': '2048', 'prevID': '0', 'nextID': '2', 'allocated': 'yes'}, {'blockID': '2', 'address': '4096', 'size': '1073737696', 'prevID': '1', 'nextID': '-1', 'allocated': 'no'}]
>>> l1 = ttnn.get_l1_memory_view(device)
>>> l1.num_banks
64
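
The `blockTable` rows in the session above hold every value as a string, so a caller will likely want typed records before doing arithmetic on them. The following is a minimal sketch of that conversion, not part of this PR: `Block`, `parse_block_table`, and `summarize` are hypothetical helper names, and the "largest free block" figure assumes the allocator has already coalesced adjacent free blocks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    """Typed form of one blockTable row (hypothetical helper type)."""
    block_id: int
    address: int
    size: int
    prev_id: Optional[int]  # None when the row stores '-1'
    next_id: Optional[int]  # None when the row stores '-1'
    allocated: bool


def _link(value: str) -> Optional[int]:
    # The sample output uses '-1' to mark a missing neighbour.
    number = int(value)
    return None if number < 0 else number


def parse_block_table(rows: List[Dict[str, str]]) -> List[Block]:
    """Convert the string-valued rows of blockTable into typed records."""
    return [
        Block(
            block_id=int(row["blockID"]),
            address=int(row["address"]),
            size=int(row["size"]),
            prev_id=_link(row["prevID"]),
            next_id=_link(row["nextID"]),
            allocated=row["allocated"] == "yes",
        )
        for row in rows
    ]


def summarize(blocks: List[Block]) -> Dict[str, int]:
    """Per-bank byte totals derived from the block list."""
    free_sizes = [b.size for b in blocks if not b.allocated]
    return {
        "allocated_bytes": sum(b.size for b in blocks if b.allocated),
        "free_bytes": sum(free_sizes),
        # Assumes adjacent free blocks are already merged by the allocator.
        "largest_free_block_bytes": max(free_sizes, default=0),
    }


# Rows copied from the sample session above.
SAMPLE_ROWS = [
    {"blockID": "0", "address": "0", "size": "2048",
     "prevID": "-1", "nextID": "1", "allocated": "yes"},
    {"blockID": "1", "address": "2048", "size": "2048",
     "prevID": "0", "nextID": "2", "allocated": "yes"},
    {"blockID": "2", "address": "4096", "size": "1073737696",
     "prevID": "1", "nextID": "-1", "allocated": "no"},
]

if __name__ == "__main__":
    print(summarize(parse_block_table(SAMPLE_ROWS)))
```

With the sample rows, the derived totals match the per-bank fields printed in the session (4096 bytes allocated, 1073737696 bytes free).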

MemoryView information being collected

struct MemoryView {
    std::uint64_t num_banks;
    size_t total_allocatable_per_bank_size_bytes;
    size_t total_allocated_per_bank_size_bytes;
    size_t total_free_per_bank_size_bytes;
    size_t total_allocatable_size_bytes;  // total_allocatable_per_bank_size_bytes * num_banks
    size_t total_allocated_size_bytes;    // total_allocated_per_bank_size_bytes * num_banks
    size_t total_free_size_bytes;         // total_free_per_bank_size_bytes * num_banks
    size_t largest_contiguous_free_block_per_bank_size_bytes;
    std::vector<std::unordered_map<std::string, std::string>> blockTable;
};
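
The comments in the struct state that each total is the per-bank value multiplied by `num_banks`. As a rough illustration of that relationship, the sketch below mirrors the numeric fields in Python and checks them against the DRAM numbers from the sample session. `MemoryViewSnapshot` and `is_consistent` are hypothetical names used only for this example. The check that allocated plus free equals allocatable is an assumption that happens to hold for the sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryViewSnapshot:
    """Hypothetical Python mirror of the per-bank fields of MemoryView."""
    num_banks: int
    total_allocatable_per_bank_size_bytes: int
    total_allocated_per_bank_size_bytes: int
    total_free_per_bank_size_bytes: int

    @property
    def total_allocatable_size_bytes(self) -> int:
        # total_allocatable_per_bank_size_bytes * num_banks
        return self.total_allocatable_per_bank_size_bytes * self.num_banks

    @property
    def total_allocated_size_bytes(self) -> int:
        # total_allocated_per_bank_size_bytes * num_banks
        return self.total_allocated_per_bank_size_bytes * self.num_banks

    @property
    def total_free_size_bytes(self) -> int:
        # total_free_per_bank_size_bytes * num_banks
        return self.total_free_per_bank_size_bytes * self.num_banks

    def is_consistent(self) -> bool:
        # Assumption: within a bank, allocated + free == allocatable.
        return (
            self.total_allocated_per_bank_size_bytes
            + self.total_free_per_bank_size_bytes
            == self.total_allocatable_per_bank_size_bytes
        )


# DRAM values taken from the sample session in this PR description.
dram = MemoryViewSnapshot(
    num_banks=12,
    total_allocatable_per_bank_size_bytes=1073741792,
    total_allocated_per_bank_size_bytes=4096,
    total_free_per_bank_size_bytes=1073737696,
)

if __name__ == "__main__":
    print(dram.total_allocatable_size_bytes)
    print(dram.total_allocated_size_bytes)
    print(dram.total_free_size_bytes)
    print(dram.is_consistent())
```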

Checklist

post commit pass: https://github.com/tenstorrent/tt-metal/actions/runs/12683308047
nightly model and ttnn pass: https://github.com/tenstorrent/tt-metal/actions/runs/12683309017

@tapspatel

@abhullar-tt @ayerofieiev-tt I updated the variable names

std::uint64_t num_banks;
size_t bytes_allocatable_per_bank;
size_t bytes_allocated_per_bank;
size_t bytes_free_per_bank;
size_t total_bytes_allocatable;  // bytes_allocatable_per_bank * num_banks
size_t total_bytes_allocated;    // bytes_allocated_per_bank * num_banks
size_t total_bytes_free;         // bytes_free_per_bank * num_banks
size_t largest_contiguous_bytes_free_per_bank;


@abhullar-tt abhullar-tt left a comment

overall looks okay to me, please post successful CI runs before merging

@tapspatel tapspatel requested a review from tt-aho January 8, 2025 16:18

@tt-aho tt-aho left a comment

Looks okay to me, but I am not sure why we need dedicated user APIs for each buffer type, instead of just taking the buffer type as a parameter.

@tapspatel

@tt-aho Agreed. I updated the API to take BufferType as a parameter (l1, dram, l1_small, trace).
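
To illustrate the shape of that change, here is a minimal sketch of a single entry point that takes the buffer type as a parameter instead of one function per buffer type. Everything in it is hypothetical: the Python `BufferType` enum, `FakeDevice`, and `get_memory_view` are stand-ins written for this example, and the real signature in the merged code may differ. The bank counts are the ones shown in the sample session.

```python
from enum import Enum
from typing import Dict


class BufferType(Enum):
    """Hypothetical Python enum for the buffer types named in this thread."""
    DRAM = "dram"
    L1 = "l1"
    L1_SMALL = "l1_small"
    TRACE = "trace"


class FakeDevice:
    """Stand-in for a real device; only stores a bank count per buffer type."""

    def __init__(self, banks_by_type: Dict[BufferType, int]) -> None:
        self._banks = dict(banks_by_type)

    def num_banks(self, buffer_type: BufferType) -> int:
        return self._banks[buffer_type]


def get_memory_view(device: FakeDevice, buffer_type: BufferType) -> Dict[str, object]:
    """One entry point for all buffer types (illustrative only)."""
    if not isinstance(buffer_type, BufferType):
        raise TypeError("buffer_type must be a BufferType member")
    return {
        "buffer_type": buffer_type.value,
        "num_banks": device.num_banks(buffer_type),
    }


# Bank counts taken from the sample session: 12 DRAM banks, 64 L1 banks.
device = FakeDevice({BufferType.DRAM: 12, BufferType.L1: 64})

if __name__ == "__main__":
    print(get_memory_view(device, BufferType.DRAM))
    print(get_memory_view(device, BufferType.L1))
```

The design benefit is that adding a buffer type only extends the enum, and callers keep a single function to learn.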

@tapspatel tapspatel force-pushed the tpatel/issue-16367 branch 2 times, most recently from 745dc8a to affbc66 Compare January 9, 2025 03:51

@tapspatel tapspatel merged commit 07aa188 into main Jan 13, 2025
9 checks passed
@tapspatel tapspatel deleted the tpatel/issue-16367 branch January 13, 2025 19:32
Successfully merging this pull request may close these issues.

Enable in-memory collection of memory results (not save to disk)
6 participants