【cpp wrapper】extract and cache device_info in cpp wrapper by kiddyjinjin · Pull Request #1152 · flagos-ai/FlagGems

kiddyjinjin · 2025-12-03T06:05:36Z

PR Category

Type of Change

Performance Optimization

Description

Extract and cache device_info in the cpp wrapper.
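For readers unfamiliar with the pattern, here is a minimal sketch of per-device caching behind a `get_device_info(int device_id)` accessor. Only that signature comes from the diff excerpt in this PR. The `DeviceInfo` fields, the `query_device_info_slow` stand-in, the `g_query_count` counter, and the mutex-plus-map layout are all assumptions made for illustration and may differ from what `lib/device_info.cpp` actually does.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical fields; the PR's real DeviceInfo layout is not shown in the excerpt.
struct DeviceInfo {
  int device_id;
  int sm_count;
  std::size_t shared_mem_bytes;
};

// Counts how often the slow path runs, so the effect of caching is observable.
static int g_query_count = 0;

// Stand-in for an expensive runtime query (assumption, not a real API).
// The returned values are placeholders, not real hardware properties.
inline DeviceInfo query_device_info_slow(int device_id) {
  ++g_query_count;
  return DeviceInfo{device_id, 108, 48 * 1024};
}

// Returns a reference to the cached entry, querying at most once per device.
inline const DeviceInfo &get_device_info(int device_id) {
  static std::mutex mu;
  static std::unordered_map<int, DeviceInfo> cache;
  std::lock_guard<std::mutex> lock(mu);
  auto it = cache.find(device_id);
  if (it == cache.end()) {
    it = cache.emplace(device_id, query_device_info_slow(device_id)).first;
  }
  // References to unordered_map elements stay valid when the table rehashes,
  // so handing out a const reference is safe as long as nothing is erased.
  return it->second;
}
```

In this sketch, repeated calls with the same `device_id` return the same object and skip the slow query, which is the kind of saving a "Performance Optimization" change like this one would be after.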

meinie0826 · 2025-12-03T10:00:20Z

lib/device_info.cpp

+}  // namespace
+
+const DeviceInfo &get_device_info(int device_id) {
+  {


Should device_id == 0 be treated as an exception?
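One possible way to frame the question above: on CUDA-style runtimes device ordinals start at 0, so 0 is normally a valid device and only out-of-range ids would be rejected. Whether this PR handles 0 specially is not visible in the excerpt. The helper below is a hypothetical sketch of such a range check; the name `check_device_id` and the `device_count` parameter are assumptions, not part of the PR.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: throws unless 0 <= device_id < device_count.
inline void check_device_id(int device_id, int device_count) {
  if (device_id < 0 || device_id >= device_count) {
    throw std::out_of_range("invalid device_id " + std::to_string(device_id) +
                            " (device_count = " +
                            std::to_string(device_count) + ")");
  }
}
```

With one visible device, `check_device_id(0, 1)` returns normally, while `-1` and `1` both throw `std::out_of_range`.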

meinie0826

lgtm

* optimize resolve_conj * optimize argmin * optimize mean * recover * [KUNLUNXIN] open threshold in flaggems benchmark for "index index_add instance_norm kron linspace nll_loss" (#1143) * [KUNLUNXIN] open threshold in flaggems benchmark for "index index_add instance_norm kron linspace nll_loss" * [KUNLUNXIN] open threshold in flaggems benchmark for "index index_add instance_norm kron linspace nll_loss" --------- Co-authored-by: xuerui <xuerui06@baidu.com> * [KUNLUNXIN] Fix Bug For UnContigious Copy (#1151) Co-authored-by: zhaoyin <zhaoyin@zhaoyindeMacBook-Pro.local> * [AdvancedCompiler]Add moe_sum (#688) * moe_sum * change format * delete test_moe_sum * update for build * remove json * add test * add test to test_special_ops.py * add benchmark to test_special_perf.py * add @pytest.mark.moe_sum --------- Co-authored-by: nmpress1 <1935298275@qq.com> Co-authored-by: you-and-you <1823382186@qq.com> * [AdvancedCompiler]Sort(cpp wrapper) (#822) * [Advanced Compiler]Add TE.geglu&dgeglu (#1056) * [MTHREADS] Adaptation for MUSA backend (#1150) * adaptation - mean - argmax * optimize - max - min - all - any - arange - argmin - batch_norm - celu - gather - log - prod * fix - addmm - index_put * [AdvancedCompiler]Per token group quant fp8 (#716) * implement and test per-token-group fp8 op --------- Co-authored-by: Ea760 <15236119052@163.com> * 【Triton Copilot】Enhance test coverage for aten::index operator (#1083) * Enhance test coverage for aten::index operator - Fix AttributeError in index operator for mixed basic/advanced indexing - Add comprehensive test cases for index operator - Support combining advanced and basic indexing using Triton Fixes #635 * Fix index_put logic inconsistency and precision issues - Update get_max_rank_shape() and broadcast_indices() in index_put.py to support None values (consistent with index.py) - Fix precision issue: create tensor_indices AFTER broadcast_indices to ensure using broadcasted tensors - Add gen_indices_for_index_put() function in 
test_reduction_ops.py to properly handle multi-dimensional index shapes - Update all index_put tests to use gen_indices_for_index_put() This fixes the pipeline failures and ensures consistency between index and index_put operators. * Fix code formatting issues (trailing whitespace and black formatting) * Reduce test cases to prevent timeout - Remove excessive test cases added to INDEX_ACC_SHAPE - Keep only the original 8 test cases to match the baseline - This should prevent CI timeout issues * Add test cases to improve coverage for index and index_put operators - Add test cases for None value handling in index operator - Add test cases for non-contiguous subspace (transpose logic) - Add test cases for boolean mask indexing - Add test cases for error handling paths - Add test cases for edge cases (empty tensor, all None, 1D special case) - Add error handling tests for index_put operators Total: 10 new test cases covering critical code paths to improve coverage from 70.8% to target >=90% * Fix black formatting for long lines in test cases * Fix failing test cases: remove unsupported scenarios - Remove test_index_all_none: PyTorch doesn't support all-None indices - Simplify test_index_with_none_basic_indexing: keep only working parameter combinations - Remove test_index_non_contiguous_subspace: implementation issue All remaining test cases now pass successfully (8/8 passed) * Fix formatting: remove extra blank lines (flake8 and black) * ci (#1159) * extract and cache device_info in cpp wrapper (#1152) * enable index (#1161) * Fix te ut (#1168) * 【Operator】 replace autotuner to libtuner for index_put (#1166) * enable index * replace autotuner to libtuner for index_put * fix typos (#1169) * 【Hopper】Update mm kernel and tune configs (#1104) * add mm configs * tma gemm kernel * adjust the location of mm with tma implementation * fix ci * Update imports based on Triton version Signed-off-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com> * Update import 
statement for mm with noqa comment Signed-off-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com> --------- Signed-off-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com> Co-authored-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com> * [KUNLUNXIN] Fix Full Like (#1163) * [KUNLUNXIN] skip lerp/lerp_ test on torch 2.0 with kununxin (#1170) * [KUNLUNXIN] skip lerp/lerp_ test on torch 2.0 with kununxin The half dtype is un-supported on torch < 2.5, so we have to skip that for now * [KUNLUNXIN] skip lerp/lerp_ test on torch 2.0 with kununxin The half dtype is un-supported on torch < 2.5, so we have to skip that for now * [KUNLUNXIN] Fix Softmax accuracy (#1146) * [KUNLUNXIN] Fix index_put_ (#1165) * [KUNLUNXIN] Fix Select Scatter (#1164) * [kunlunxin] update kron op input shapes from zhiyuan provided except big shapes (#1158) Co-authored-by: xuerui <xuerui06@baidu.com> * [KUNLUNXIN] use manual_seed rwkv_mm_sparsity (#1162) * [KUNLUNXIN] use manual_seed rwkv_mm_sparsity * [KUNLUNXIN] tl.load use fp32 to update precesion * Remove register ops (#1171) * [KUNLUNXIN] Turn on attention tests (#1147) * [KUNLUNXIN] Passin in_h in max_pool2d_backward_kernel Signed-off-by: wangrun06 <wangrun06@baidu.com> * [KUNLUNXIN] Turn on attention tests Signed-off-by: wangrun06 <wangrun06@baidu.com> --------- Signed-off-by: wangrun06 <wangrun06@baidu.com> Co-authored-by: wangrun06 <wangrun06@baidu.com> * Fix bool type indices (#1023) * optimize randn and randn_like * fix code format and style * remove the unnecessary change --------- Signed-off-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com> Signed-off-by: wangrun06 <wangrun06@baidu.com> Signed-off-by: bin913 <842884726@qq.com> Co-authored-by: xuexingtu <88195961+xuexingtu@users.noreply.github.com> Co-authored-by: xuerui <xuerui06@baidu.com> Co-authored-by: fantasy666 <39185229+fantasy666@users.noreply.github.com> Co-authored-by: zhaoyin <zhaoyin@zhaoyindeMacBook-Pro.local> Co-authored-by: 
AdvancedCompiler <Pikachu_Jun@outlook.com> Co-authored-by: nmpress1 <1935298275@qq.com> Co-authored-by: you-and-you <1823382186@qq.com> Co-authored-by: zhoubo567 <781266327@qq.com> Co-authored-by: Kylin1207 <13345006231@163.com> Co-authored-by: Ea760 <15236119052@163.com> Co-authored-by: Zang Peiyu <166481866+factnn@users.noreply.github.com> Co-authored-by: Bigeyes <qjk595391@gmail.com> Co-authored-by: kiddyjinjin <54064850+kiddyjinjin@users.noreply.github.com> Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com> Co-authored-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com> Co-authored-by: ylyzty <50573767+ylyzty@users.noreply.github.com> Co-authored-by: Ason93 <18817617225@163.com> Co-authored-by: zeno_dongjibin <4pyqm84rz7@privaterelay.appleid.com> Co-authored-by: purerli98 <82259540+purerli98@users.noreply.github.com> Co-authored-by: mikiya1991 <anakinlancer@gmail.com> Co-authored-by: wangrun06 <wangrun06@baidu.com>

extract and cache device_info in cpp wrapper

d220e6a

meinie0826 reviewed Dec 3, 2025


meinie0826 approved these changes Dec 3, 2025


kiddyjinjin merged commit 4402bcf into flagos-ai:master Dec 4, 2025
12 of 15 checks passed

nicelynice pushed a commit to nicelynice/FlagGems that referenced this pull request Feb 24, 2026

extract and cache device_info in cpp wrapper (flagos-ai#1152)

76fdf5c

kiddyjinjin merged 1 commit into flagos-ai:master from kiddyjinjin:master
