Pull requests: pytorch/FBGEMM
Pull requests list
Check src & dst dtypes in allgather to prevent silent failures.
cla signed
fb-exported
#3523
opened Dec 20, 2024 by
ChenheliHua
Improve performance of prefill mode FP8 Grouped Gemm
cla signed
fb-exported
#3522
opened Dec 20, 2024 by
jwfromm
Add fused_moe kernel to ck_extension
cla signed
fb-exported
#3518
opened Dec 19, 2024 by
sijiac
Cherry-pick CK PR #1636 for fp8 GEMM rowwise for 70B Prefill
cla signed
fb-exported
#3517
opened Dec 19, 2024 by
zjing14
env variable to select rounding mode
cla signed
fb-exported
#3515
opened Dec 19, 2024 by
hhyuanf
Optimized backward pass for ROCm devices (pt 2)
ciflow/rocm
cla signed
fb-exported
module: rocm
#3511
opened Dec 18, 2024 by
q10
Back out "Manual loop unroll for rocm inference"
ciflow/rocm
cla signed
fb-exported
module: rocm
#3506
opened Dec 15, 2024 by
brad-mengchi
[fbgemm_gpu] Add support for CUDA 12.6 builds in OSS
cla signed
#3503
opened Dec 13, 2024 by
q10
migrate "jagged_flash_attention"
cla signed
fb-exported
#3490
opened Dec 10, 2024 by
brad-mengchi
Optimized backward pass for ROCm devices (#3367)
ciflow/rocm
cla signed
fb-exported
module: rocm
#3468
opened Dec 6, 2024 by
q10
Use GEMM kernel for KleidiAI to accelerate FP16Benchmark
cla signed
#3440
opened Dec 3, 2024 by
milpuz01
Make check_feature_gate_key PT2 compatible
cla signed
fb-exported
#3426
opened Nov 30, 2024 by
sryap
Make check_feature_gate_key PT2 compatible
cla signed
fb-exported
#3425
opened Nov 30, 2024 by
sryap
Add NEON and SVE implementations for Float16 conversions
cla signed
#3424
opened Nov 28, 2024 by
annop-w
Support sending using lengths to TBE instead of just offsets (#2557)
cla signed
fb-exported
#3420
opened Nov 26, 2024 by
PaulZhang12
[fbgemm_gpu] Build GenAI for ROCm in OSS [WIP]
ciflow/rocm
cla signed
module: rocm
#3415
opened Nov 25, 2024 by
q10
Patch D66310520 to make it build in OSS
cla signed
fb-exported
module: rocm
#3409
opened Nov 23, 2024 by
q10
Add check to ensure that there is enough room in permuted_indices
cla signed
fb-exported
#3403
opened Nov 22, 2024 by
zimin2000