Decrease in Performance #2355

Closed
coadmonky opened this issue Jul 24, 2023 · 10 comments

@coadmonky

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [y] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [n] I carefully followed the README.md.
  • [n] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [n] I reviewed the Discussions, and have a new bug or useful enhancement to share.
  • The following graph shows the performance of end-of-day builds over the last few weeks (focusing on where the performance decrease was identified).

[Screenshot from 2023-07-23 18-35-12: performance of end-of-day builds over the last few weeks]

Expected Behavior

The system should deliver similar performance over time when running the same models.

Current Behavior

The system shows decreased performance starting with builds from 11 July 2023.

Environment and Context

CPU only: i5-8400 with 32 GB DDR4-2666 RAM.

The operating system is Linux fedora 5.19.11-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 23 15:07:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Failure Information (for bugs)

There is no failure, only reduced performance.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Build a version from 9 July 2023 and run it with the optimal number of threads (6 on my system). Observe the performance (a sketch of the kind of command used for each run is shown after these steps).
  2. Build a version from 11 or 12 July 2023 and run it the same way.
  3. Notice that performance is worse on builds from 11 July 2023 onward.
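
For reference, a minimal sketch of the kind of run behind each data point (the model path, prompt, and token count are placeholders; the main options shown are the usual llama.cpp ones for model, threads, tokens to generate, and prompt):

    # build the checkout being tested
    make clean && make
    # run with 6 threads, a fixed prompt, and a fixed number of generated tokens
    ./main -m ./models/7B/ggml-model-q4_0.bin -t 6 -n 128 -p "Building a website can be done in 10 simple steps:"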

Failure Logs

A graph of performance of various versions over time has been attached (along with other relevant information).

@JohannesGaessler
Collaborator

If you're going to invest this much time into figuring out when the performance regression happened, I suggest you use git bisect instead. It will automatically give you new commits to test and will tell you which commit caused the problem at the end.
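
For example, a bisect over the suspected window might look roughly like this (the good commit and the benchmark command are placeholders for whatever you were already running):

    git bisect start
    git bisect bad                        # current checkout (the slow build)
    git bisect good <known-fast-commit>   # e.g. a build from 9 July
    # build and benchmark the checkout git proposes, then mark the result:
    make clean && make && ./main -m <model> -t 6 -n 128 -p "<prompt>"
    git bisect good                       # or: git bisect bad
    # repeat until git prints the first bad commit, then clean up:
    git bisect reset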

@JohannesGaessler
Collaborator

I can reproduce the issue. According to git bisect, the matrix multiplication broadcasting change by @ggerganov caused the performance regression:

975221e9548ef6d9f4af8d39cdffc4811c050beb is the first bad commit
commit 975221e9548ef6d9f4af8d39cdffc4811c050beb
Author: Georgi Gerganov <[email protected]>
Date:   Wed Jul 12 20:51:29 2023 +0300

    ggml : broadcast mul_mat + conv batch support (#2199)
    
    * ggml : broadcast mul_mat + conv batch support
    
    * ggml : apply mul_mat broadcast fix by @jploski

 ggml.c | 152 ++++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 79 insertions(+), 73 deletions(-)
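
A quick way to double-check this locally is to build the parent of the flagged commit and then the flagged commit itself, benchmarking each with the same settings (the hash is taken from the bisect output above):

    # last commit before the regression
    git checkout 975221e9548ef6d9f4af8d39cdffc4811c050beb~1 && make clean && make
    # first bad commit
    git checkout 975221e9548ef6d9f4af8d39cdffc4811c050beb && make clean && make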

slaren referenced this issue Jul 24, 2023
ggerganov mentioned this issue Jul 24, 2023
@coadmonky
Author

It still looks about the same.

COMPILER INFORMATION

I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CC: cc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
I CXX: g++ (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)

BEFORE (12 July 2023 [f7d278f])
llama_print_timings: load time = 69705.57 ms
llama_print_timings: sample time = 80.57 ms / 131 runs ( 0.62 ms per token, 1625.87 tokens per second)
llama_print_timings: prompt eval time = 4342.94 ms / 33 tokens ( 131.60 ms per token, 7.60 tokens per second)
llama_print_timings: eval time = 66328.00 ms / 130 runs ( 510.22 ms per token, 1.96 tokens per second)
llama_print_timings: total time = 70776.56 ms

AFTER (24 July 2023 [41c6741])
llama_print_timings: load time = 1277.89 ms
llama_print_timings: sample time = 54.80 ms / 126 runs ( 0.43 ms per token, 2299.14 tokens per second)
llama_print_timings: prompt eval time = 15090.92 ms / 33 tokens ( 457.30 ms per token, 2.19 tokens per second)
llama_print_timings: eval time = 61502.50 ms / 125 runs ( 492.02 ms per token, 2.03 tokens per second)
llama_print_timings: total time = 76672.43 ms
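
Reading the two logs side by side, the difference is concentrated in prompt eval: 33 tokens in 4342.94 ms ≈ 7.60 tokens per second before, versus 33 tokens in 15090.92 ms ≈ 2.19 tokens per second after, while generation (eval time) stays at roughly 2 tokens per second in both runs.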


@coadmonky
Author

I checked the code from 2023-07-25 (eb542d3) and it seems the same to me.
If there are higher-priority items, focus on those; I can use an older release until something changes.

Thanks...


@ggerganov
Owner

ggerganov commented Jul 26, 2023

Ah, I just realized you are testing the code on master.

I am actually making the changes on a branch, mul-mat-tweaks, and would like to ask you to test that branch to see if it restores the performance for you.

Here is the PR for that branch: #2372
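
For anyone else who wants to try it, checking out that branch should just be a matter of fetching it from this repository and rebuilding (the remote name origin is assumed here):

    git fetch origin mul-mat-tweaks
    git checkout mul-mat-tweaks
    make clean && make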

@coadmonky
Author

coadmonky commented Jul 26, 2023 via email

@coadmonky
Author

I finally used the right branch (mul-mat-tweaks) and it seems better. I'm going to close this issue. Thanks!


@cebtenzzre
Collaborator

Could we leave this open until the regression is fixed on master?

coadmonky reopened this Jul 27, 2023
@coadmonky
Author

Reopening to allow further testing and integration into master.

@ggerganov
Owner

Fixed via #2372
