Merged
149 commits
8b7ce0b
Add power management utilities to NPU device context and update DCVS …
chraac Aug 2, 2025
9a3cf62
Update DCVS settings in power_utils to use v3 API and enhance power m…
chraac Aug 2, 2025
5be7b0a
wip
chraac Aug 3, 2025
39d0b70
Enhance dequantization functions by adding load_dequant_table support…
chraac Aug 4, 2025
9063fd3
use lut
chraac Aug 4, 2025
9cf2c43
wip
chraac Aug 4, 2025
ccdf858
fix test failure
chraac Aug 4, 2025
94f7022
wip
chraac Aug 4, 2025
50add7e
Refactor load_qual_block_generic to improve block handling and optimi…
chraac Aug 5, 2025
df55391
Enhance load_dual_block_generic and load_qual_block_generic to accept…
chraac Aug 5, 2025
c60433f
Refactor flash_attn_impl to optimize mask l2 prefetch
chraac Aug 6, 2025
0ad08cc
wip
chraac Aug 6, 2025
c502b4c
wip
chraac Aug 6, 2025
c8985f5
wip
chraac Aug 6, 2025
a600676
wip
chraac Aug 7, 2025
3048e3e
add log
chraac Aug 7, 2025
669faa0
link against shared libraries instead of static ones
chraac Aug 7, 2025
85a082f
fix swiglu
chraac Aug 7, 2025
e7ceb25
wip
chraac Aug 7, 2025
4601d7f
refactor expf_fix to handle overflow for different data types
chraac Aug 7, 2025
20a4ed2
enhance is_glu_op_supported to validate shapes for multiple sources
chraac Aug 7, 2025
e56b2c1
wip
chraac Aug 7, 2025
723e04a
refactor logging macros to use hexagon namespace and improve formatting
chraac Aug 8, 2025
409cb28
fix printf format error
chraac Aug 8, 2025
3802041
wip
chraac Aug 8, 2025
c3771a8
refactor: update static_assert messages for block size validation and…
chraac Aug 8, 2025
e469b94
rename
chraac Aug 8, 2025
0722d20
Merge branch 'dev-refactoring' into dev-quant-lut
chraac Aug 9, 2025
ba8c044
feat: enhance fa with mask
chraac Aug 9, 2025
6d1f5a8
wip
chraac Aug 9, 2025
be4202c
wip
chraac Aug 9, 2025
d7dc3df
refactor: replace instances of Q6_V_vzero() with kZeroV for consistency
chraac Aug 9, 2025
5b27dc6
wip
chraac Aug 9, 2025
09fda2c
wip
chraac Aug 9, 2025
bd2089b
wip
chraac Aug 9, 2025
08be69d
fix: improve address alignment check in HVX_Vector handling
chraac Aug 10, 2025
eff7ce7
refactor: streamline vector dot product implementations for improved …
chraac Aug 10, 2025
9f94164
refactor: q4k add hvx intrinsic impl
chraac Aug 10, 2025
788eb85
refactor: enhance dequantize_row_q4_K for clarity and performance
chraac Aug 11, 2025
f8897b6
refactor: optimize scale mask usage in dequantization functions for i…
chraac Aug 11, 2025
082d666
refactor: optimize dequantize_row_q4_K for intrinsic usage and perfor…
chraac Aug 11, 2025
31c53c4
refactor: move GLU operation implementation into separated file
chraac Aug 11, 2025
764273c
sync after swiglu
chraac Aug 11, 2025
0040711
wip
chraac Aug 12, 2025
380bd8f
wip
chraac Aug 12, 2025
00b9da9
wip
chraac Aug 12, 2025
4f8cf2b
feat: increase prc main thread stack size
chraac Aug 12, 2025
c3ba43b
fix: replace hardcoded stack size with NPU_THREAD_STACK_SIZE constant
chraac Aug 12, 2025
86d9c93
wip
chraac Aug 13, 2025
d33587e
feat: add optimized vector operations for exponential and division wi…
chraac Aug 13, 2025
dd58a98
wip
chraac Aug 13, 2025
c279a3d
feat: refactor exponential function to handle overflow and underflow …
chraac Aug 13, 2025
8c9b5ef
wip
chraac Aug 13, 2025
bce6fd4
wip
chraac Aug 15, 2025
1d0bca6
feat: add vector loading and scaling functions for improved performan…
chraac Aug 15, 2025
7a0cd2f
wip
chraac Aug 15, 2025
a318bba
feat: optimize block loading by refactoring scale index handling for …
chraac Aug 15, 2025
818baa5
use Q6_Vb_vlut32_VbVbR_nomatch instead
chraac Aug 15, 2025
f7c1b7c
feat: enhance scale loading by adding static assertion and restructur…
chraac Aug 15, 2025
cd349ce
wip
chraac Aug 16, 2025
027a933
feat: refactor vec_dot_product_mixed_impl for improved clarity and pe…
chraac Aug 16, 2025
eeb4606
wip
chraac Aug 16, 2025
20fb6c5
feat: simplify vector loading functions and improve alignment handling
chraac Aug 17, 2025
6c3bc2d
wip
chraac Aug 17, 2025
3694d50
feat: enhance scale loading mask with quantization block size validation
chraac Aug 17, 2025
bdbf172
wip
chraac Aug 17, 2025
423acb7
feat: implement make_scale_load_mask function and refactor vector han…
chraac Aug 17, 2025
f9cc060
feat: enhance load_dual_block_generic to include scale indices for im…
chraac Aug 17, 2025
36f1870
revert q8 dequant
chraac Aug 17, 2025
9bba483
wip
chraac Aug 17, 2025
f0ca3e7
feat: optimize dequantization functions by removing unnecessary maski…
chraac Aug 17, 2025
9901ca0
wip
chraac Aug 18, 2025
38935b6
Merge branch 'dev-refactoring' into dev-quant-lut
chraac Aug 25, 2025
e97b3c0
wip
chraac Aug 28, 2025
dbd7e24
add qurt_mutex
chraac Aug 25, 2025
bd98beb
Add DMA transfer class and integrate into thread pool
chraac Aug 25, 2025
c1fb537
Enhance DMA transfer functionality by adding support for multiple des…
chraac Aug 26, 2025
c76872d
fix dma crash
chraac Aug 26, 2025
9cf3a50
fix failed unit tests
chraac Aug 27, 2025
566514c
wip
chraac Aug 27, 2025
10b55a5
use alignas
chraac Aug 27, 2025
39d3f04
Improve DMA transfer error handling and update descriptor completion …
chraac Aug 27, 2025
41e074b
Fix VTCM cache size calculation in element-wise operations
chraac Aug 27, 2025
50764f3
Add cache clean operations before DMA transfers in element-wise opera…
chraac Aug 27, 2025
d2dfaaa
reduce cache clean operations
chraac Aug 28, 2025
dbf1483
Refactor DMA transfer functions to support 1D operations and rename f…
chraac Aug 28, 2025
5318d9b
Enhance DMA transfer functionality by adding 2D submission support an…
chraac Aug 28, 2025
6fe0141
Update read buffer method to support forced invalidation and remove u…
chraac Aug 28, 2025
16c518a
wip
chraac Aug 29, 2025
af9c561
Improve DMA transfer handling in mul_mat_gemv_impl by replacing memcp…
chraac Aug 29, 2025
af8d7dd
Merge branch 'dev-refactoring' into dev-perf-opt-dma
chraac Aug 29, 2025
0e53c0c
fix 2d dma
chraac Aug 29, 2025
ef37f6c
feat: add DMA plane cache
chraac Aug 30, 2025
327b228
rename
chraac Aug 30, 2025
c7a75bd
wip
chraac Aug 30, 2025
0699652
use memcpy for debug
chraac Aug 30, 2025
f4e102b
fix cache plane calc
chraac Sep 5, 2025
4d88629
refactor: remove debug logging from mul_mat_impl and optimize cache h…
chraac Sep 5, 2025
4bbe6ed
rename
chraac Sep 5, 2025
821575f
fix 2d dma type
chraac Sep 5, 2025
abaeac2
refactor: enhance DMA transfer handling in mul_mat_gemv_impl and wait…
chraac Sep 6, 2025
fa6bb2b
refactor: optimize DMA transfer handling in mul_mat_gemv_impl and wai…
chraac Sep 7, 2025
296a8cc
wip
chraac Sep 8, 2025
17d30e3
wip
chraac Sep 8, 2025
9c51486
move op impl into sub dir
chraac Sep 8, 2025
9ae6b69
add log
chraac Sep 8, 2025
8b31ca3
fix: correct pointer usage in mul_mat_gemv_impl for next plane access
chraac Sep 9, 2025
8a77215
fix: improve DMA transfer error handling in mul_mat_impl and mul_mat_…
chraac Sep 9, 2025
0371c40
fix: fix crash by using the entire row bytes
chraac Sep 9, 2025
416d60b
wip
chraac Sep 9, 2025
1dce1c7
wip
chraac Sep 10, 2025
f5957c0
fix: prevent parallelization for scalar src1 in is_mul_mat_supported
chraac Sep 11, 2025
30c7a30
fix: add dimension checks for 2D DMA transfers and fallback to 1D if …
chraac Sep 12, 2025
0e59726
wip
chraac Sep 12, 2025
950bf4b
fix: enable thread barrier for mul multiplication operations
chraac Sep 15, 2025
c16c79c
feat: add synchronization checks for tensor operations and update rel…
chraac Sep 15, 2025
441ecd5
Merge branch 'dev-refactoring' into dev-perf-opt-dma
chraac Sep 15, 2025
f0ee539
wip
chraac Sep 15, 2025
af3441e
fix: remove invalidation flag from get_read_buffer calls in element-w…
chraac Sep 15, 2025
4bf9779
Revert "fix: remove invalidation flag from get_read_buffer calls in e…
chraac Sep 15, 2025
978d6e3
wip
chraac Sep 16, 2025
8b1d8ff
wip
chraac Sep 16, 2025
4588b20
add comment
chraac Sep 16, 2025
cc90466
fix: improve DMA transfer handling in mul_mat_gemv_impl for quantized…
chraac Sep 20, 2025
f8da8dc
add log
chraac Sep 20, 2025
14951bb
try fix mulmat gemv
chraac Sep 20, 2025
afcd3c7
wip
chraac Sep 21, 2025
f940784
fix: enhance DMA transfer handling in mul_mat_gemv_impl for quantized…
chraac Sep 22, 2025
e5b9d3c
fix: optimize cache offset calculation and remove redundant swap in m…
chraac Sep 22, 2025
bc9888b
fix: refactor DMA transfer handling in mul_mat_gemv_impl for improved…
chraac Sep 22, 2025
3287d9b
wip
chraac Sep 22, 2025
70201dc
wip
chraac Sep 22, 2025
9f72399
wip
chraac Sep 22, 2025
afa541c
fix: enhance mul_mat_impl for improved cache handling and clarity
chraac Sep 23, 2025
eef1600
fix: refactor tensor unflattening and DMA transfer initialization for…
chraac Sep 23, 2025
d68d4d9
fix: improve cache handling of quant
chraac Sep 23, 2025
3358958
wip
chraac Sep 23, 2025
bc24f03
fix: improve cache handling in mul_mat_impl and mul_mat_gemv_impl for…
chraac Sep 24, 2025
27d1851
rename
chraac Sep 24, 2025
32927fd
Merge branch 'dev-refactoring' into dev-perf-opt-dma-phase2
chraac Sep 24, 2025
b575da9
add load_hexa_block_generic
chraac Sep 25, 2025
e47a73c
wip
chraac Sep 26, 2025
ae70488
extract dequant block into separated function
chraac Sep 27, 2025
70a6551
refactor: enhance dequantization functions with table parameter
chraac Sep 27, 2025
c85c1bb
fix load_dual_block_generic
chraac Sep 27, 2025
be4e34c
refactor: rename dequantization functions for clarity and enhance blo…
chraac Sep 27, 2025
47dbfc2
refactor: simplify dequantization logic by consolidating block handli…
chraac Sep 27, 2025
5791c51
wip
chraac Sep 27, 2025
dddef70
wip
chraac Sep 28, 2025
6 changes: 3 additions & 3 deletions ggml/src/ggml-qnn/npu/device/dma_transfer.cpp
@@ -12,23 +12,23 @@ dma_transfer::dma_transfer() {
     dma_desc_set_next(_dma_1d_desc0, 0);
     dma_desc_set_dstate(_dma_1d_desc0, DESC_DSTATE_INCOMPLETE);
     dma_desc_set_desctype(_dma_1d_desc0, DMA_DESC_TYPE_1D);
-    dma_desc_set_order(_dma_1d_desc0, DESC_ORDER_ORDER);
+    dma_desc_set_order(_dma_1d_desc0, DESC_ORDER_NOORDER);
     dma_desc_set_bypasssrc(_dma_1d_desc0, DESC_BYPASS_ON);   // for dram
     dma_desc_set_bypassdst(_dma_1d_desc0, DESC_BYPASS_OFF);  // for vtcm
     dma_desc_set_length(_dma_1d_desc0, 0);

     dma_desc_set_next(_dma_1d_desc1, 0);
     dma_desc_set_dstate(_dma_1d_desc1, DESC_DSTATE_INCOMPLETE);
     dma_desc_set_desctype(_dma_1d_desc1, DMA_DESC_TYPE_1D);
-    dma_desc_set_order(_dma_1d_desc1, DESC_ORDER_ORDER);
+    dma_desc_set_order(_dma_1d_desc1, DESC_ORDER_NOORDER);
     dma_desc_set_bypasssrc(_dma_1d_desc1, DESC_BYPASS_ON);   // for dram
     dma_desc_set_bypassdst(_dma_1d_desc1, DESC_BYPASS_OFF);  // for vtcm
     dma_desc_set_length(_dma_1d_desc1, 0);

     dma_desc_set_next(_dma_2d_desc0, 0);
     dma_desc_set_dstate(_dma_2d_desc0, DESC_DSTATE_INCOMPLETE);
     dma_desc_set_desctype(_dma_2d_desc0, DMA_DESC_TYPE_2D);
-    dma_desc_set_order(_dma_2d_desc0, DESC_ORDER_ORDER);
+    dma_desc_set_order(_dma_2d_desc0, DESC_ORDER_NOORDER);
     dma_desc_set_bypasssrc(_dma_2d_desc0, DESC_BYPASS_ON);   // for dram
     dma_desc_set_bypassdst(_dma_2d_desc0, DESC_BYPASS_OFF);  // for vtcm
     dma_desc_set_cachealloc(_dma_2d_desc0, DESC_CACHEALLOC_NONE);
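
The hunk above applies the same change to three descriptors whose setup is otherwise nearly identical. As a rough sketch only, the shared defaults could be factored into one helper so a change like this touches a single line. Everything below is a hypothetical stand-in: `mock_dma_desc`, the enums, and `init_desc` are not SDK or project names, and the real code goes through the `dma_desc_set_*` accessors on the SDK's own descriptor type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the SDK descriptor fields touched in the hunk.
// The real descriptor layout and constants come from the SDK headers.
enum class desc_type : uint8_t { one_d, two_d };
enum class desc_order : uint8_t { no_order, order };
enum class desc_bypass : uint8_t { off, on };
enum class desc_state : uint8_t { incomplete, complete };

struct mock_dma_desc {
    mock_dma_desc * next       = nullptr;
    desc_state      state      = desc_state::complete;
    desc_type       type       = desc_type::one_d;
    desc_order      order      = desc_order::order;
    desc_bypass     bypass_src = desc_bypass::off;
    desc_bypass     bypass_dst = desc_bypass::off;
    uint32_t        length     = 0;
};

// Applies the defaults that all three descriptors share after this change.
// Type-specific fields (e.g. the 2D cache-allocation setting) are left to the caller.
inline void init_desc(mock_dma_desc & desc, desc_type type) {
    desc.next       = nullptr;
    desc.state      = desc_state::incomplete;
    desc.type       = type;
    desc.order      = desc_order::no_order;  // the value this diff switches to
    desc.bypass_src = desc_bypass::on;       // source is DRAM
    desc.bypass_dst = desc_bypass::off;      // destination is VTCM
    desc.length     = 0;
}
```

With a helper of this shape, the constructor would call `init_desc` once per descriptor and then set only the fields that differ, such as the 2D-only ones.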