@fabianlim (Contributor) commented May 30, 2024

Completing more items in #25.

  • decided to remove the L40 benchmarks.

Verified that we can reproduce the roughly 20% speedups using fused-ops and kernels

  • these are per-device throughputs, so for two GPUs we should multiply by 2 to get the actual throughput
    image
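The per-device scaling noted above can be sketched as follows. This is only an illustration: the function names, the tokens/s unit, and the numbers are hypothetical and are not taken from the benchmark results.

```python
def aggregate_throughput(per_device: float, num_devices: int) -> float:
    """Total throughput when every device reports the same per-device rate."""
    return per_device * num_devices


def speedup(baseline: float, optimized: float) -> float:
    """Relative throughput gain of optimized over baseline (0.20 means 20%)."""
    return optimized / baseline - 1.0


# Hypothetical per-device numbers (assumed unit: tokens/s), for illustration only.
baseline_per_device = 1000.0
fused_per_device = 1200.0

total_fused = aggregate_throughput(fused_per_device, num_devices=2)
gain = speedup(baseline_per_device, fused_per_device)
print(f"aggregate: {total_fused:.0f} tokens/s, speedup: {gain:.0%}")
```

Note that the relative speedup is the same whether it is computed per device or in aggregate, as long as both runs use the same number of GPUs.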

Verified that we can reproduce the 75% memory reduction using 4-bit base weights

  • also with FSDP when using two gpus, we see another 50% memory reduction
    image
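A back-of-the-envelope sketch of where these two figures could come from, assuming the baseline stores base weights in 16 bits and that FSDP shards them evenly across the two GPUs. Both are assumptions not stated above, and the parameter count is hypothetical. The sketch counts base weights only and ignores quantization metadata, activations, gradients, and optimizer state.

```python
def weight_bytes(num_params: int, bits_per_weight: int, num_shards: int = 1) -> float:
    """Per-device bytes for base weights, assuming an even split across shards."""
    return num_params * bits_per_weight / 8 / num_shards


num_params = 7_000_000_000  # hypothetical model size, for illustration only

full_16bit = weight_bytes(num_params, 16)                # assumed 16-bit baseline
quant_4bit = weight_bytes(num_params, 4)                 # 4-bit base weights
quant_4bit_fsdp2 = weight_bytes(num_params, 4, num_shards=2)  # sharded over 2 GPUs

reduction_4bit = 1 - quant_4bit / full_16bit        # fraction saved by 4-bit weights
reduction_fsdp = 1 - quant_4bit_fsdp2 / quant_4bit  # further per-device saving

print(f"4-bit vs 16-bit: {reduction_4bit:.0%} less weight memory")
print(f"2-way sharding: another {reduction_fsdp:.0%} less per device")
```

Measured reductions depend on everything this sketch leaves out, so it should be read as a sanity check on the order of magnitude rather than a prediction.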

fabianlim added 7 commits May 30, 2024 16:43
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
@fabianlim fabianlim requested a review from achew010 May 30, 2024 08:45
@fabianlim fabianlim self-assigned this May 30, 2024
@fabianlim fabianlim changed the title Add MLP Fused Ops and Kernels, Mixtral Add MLP Fused Ops and Kernels, Mixtral, QLoRA Kernels May 30, 2024
@fabianlim fabianlim changed the title Add MLP Fused Ops and Kernels, Mixtral, QLoRA Kernels Add MLP & QLoRA Fused Ops and Kernels, Mixtral May 30, 2024
@fabianlim fabianlim force-pushed the fix-foak-final branch 2 times, most recently from 2617d8c to fa50cf2 Compare May 30, 2024 11:50
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
@fabianlim (Contributor, Author) commented

Running a set of benchmarks now; will merge once they complete.

@fabianlim fabianlim merged commit 8103238 into foundation-model-stack:dev Jun 2, 2024
@fabianlim (Contributor, Author) commented

@achew010 please update once you have obtained the new benchmarks.
