fix bmm transpose in cann 8.5 #316

Merged
RuixuanZhang06 merged 1 commit into sgl-project:main from randgun:main on Jan 14, 2026

Conversation

@randgun (Contributor) commented Jan 14, 2026

After upgrading CANN to version 8.5, some operations are executed as AIV operations where they were not under CANN 8.3, and this causes bugs. Forcing AIV to be disabled resolves the issue.
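
For readers who want to apply a workaround like this only on affected toolkits, here is a minimal sketch of a version gate. It is not taken from this PR's diff: the helper names `parse_version` and `needs_aiv_workaround` are hypothetical, the version string format ("8.5.0", "8.3.RC1") is an assumption, and the actual switch that disables AIV is not shown in this conversation, so it is left to the caller. The merged change may well apply the fix unconditionally.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a dotted version string.

    Assumed input format (not confirmed by the PR): "8.5.0", "8.3.RC1", ...
    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in version.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_aiv_workaround(cann_version: str) -> bool:
    """Hypothetical gate: True for CANN 8.5 and later, False for older versions."""
    return parse_version(cann_version) >= (8, 5)


if __name__ == "__main__":
    for v in ("8.3.RC1", "8.5.0"):
        # The caller would apply its own AIV-disabling logic when this is True.
        print(v, needs_aiv_workaround(v))
```

The gate compares integer tuples rather than raw strings, so a version such as "8.10.0" sorts after "8.5.0" as expected.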

@RuixuanZhang06 RuixuanZhang06 merged commit b62dc61 into sgl-project:main Jan 14, 2026
8 checks passed
Yael-X added a commit to Yael-X/sgl-kernel-npu that referenced this pull request Jan 26, 2026
* 'main' of https://github.com/sgl-project/sgl-kernel-npu: (24 commits)
  [Doc] Improved README.md content and English grammar and integrated the DeepWiki badge for Ask AI (sgl-project#345)
  (test) add solve_tril from upstream (sgl-project#339)
  Add AscendC triangular inverse (sgl-project#332)
  support the situation that topk maybe -1 on machine A3 (sgl-project#313)
  chunk_gated_delta_rule_npu output final state (sgl-project#341)
  The environment variable DEEPEP_HCCL_BUFFSIZE is added, and the priority of DEEPEP_HCCL_BUFFSIZE is higher than that of HCCL_BUFFSIZE. (sgl-project#329)
  Added the low_latency operator API documentation. (sgl-project#337)
  Added the verification of num_max_dispatch_tokens_per_rank to the decode operator adaptation layer. (sgl-project#330)
  Document get_dispatch_layout API (sgl-project#338)
  【Doc】add fused deep moe doc (sgl-project#335)
  add deepep normal api doc (sgl-project#336)
  remove the limit that A2 internode only support topk 8 (sgl-project#323)
  Optimize the performance of the Combine Ant Moving function and the use of HCCL buffer (sgl-project#314)
  deepep adapt custom cann installation path (sgl-project#327)
  [Chore] CANN version bump to 8.5.0 (sgl-project#326)
  add dfx for operator FusedDeepMoe (sgl-project#317)
  Integrate ccache for faster compilation (sgl-project#318)
  Modify contribution guide (sgl-project#315)
  fix bmm transpose in cann 8.5 (sgl-project#316)
  fix little batchsize and int8 quant on ci (sgl-project#302)
  ...
AndyKong2020 pushed a commit to AndyKong2020/sgl-kernel-npu that referenced this pull request Mar 24, 2026