Decrease in Performance #2355

Closed
coadmonky opened this issue Jul 24, 2023 · 10 comments

@coadmonky

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [y] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [n] I carefully followed the README.md.
  • [n] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [n] I reviewed the Discussions, and have a new bug or useful enhancement to share.
  • The following graph shows the performance of end-of-day builds over the last few weeks (focusing on where the performance decrease was identified).

[Screenshot from 2023-07-23 18-35-12: performance of end-of-day builds over the last few weeks]

Expected Behavior

The system should deliver similar performance over time when running the same models.

Current Behavior

The system shows decreased performance starting with builds from 11 July 2023.

Environment and Context

CPU only: i5-8400 with 32 GB DDR4-2666 RAM.

The operating system is Linux fedora 5.19.11-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 23 15:07:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Failure Information (for bugs)

There is no failure, only reduced performance.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Build a version from 9 July 2023 and run it with the optimal number of threads (6 on my system). Observe the performance (a sketch of the kind of command used for each run is shown after these steps).
  2. Build a version from 11 or 12 July 2023 and run it the same way.
  3. Notice that performance is worse on builds from 11 July 2023 onward.
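
For reference, a minimal sketch of the kind of run behind each data point (the model path, prompt, and token count are placeholders; the main options shown are the usual llama.cpp ones for model, threads, tokens to generate, and prompt):

    # build the checkout being tested
    make clean && make
    # run with 6 threads, a fixed prompt, and a fixed number of generated tokens
    ./main -m ./models/7B/ggml-model-q4_0.bin -t 6 -n 128 -p "Building a website can be done in 10 simple steps:"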

Failure Logs

A graph of performance of various versions over time has been attached (along with other relevant information).

@JohannesGaessler
Collaborator

If you're going to invest this much time into figuring out when the performance regression happened, I suggest you use git bisect instead. It will automatically give you new commits to test and will tell you which commit caused the problem at the end.
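
For example, a bisect over the suspected window might look roughly like this (the good commit and the benchmark command are placeholders for whatever you were already running):

    git bisect start
    git bisect bad                        # current checkout (the slow build)
    git bisect good <known-fast-commit>   # e.g. a build from 9 July
    # build and benchmark the checkout git proposes, then mark the result:
    make clean && make && ./main -m <model> -t 6 -n 128 -p "<prompt>"
    git bisect good                       # or: git bisect bad
    # repeat until git prints the first bad commit, then clean up:
    git bisect reset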

@JohannesGaessler
Collaborator

I can reproduce the issue. According to git bisect, the matrix multiplication broadcasting change by @ggerganov caused the performance regression:

975221e9548ef6d9f4af8d39cdffc4811c050beb is the first bad commit
commit 975221e9548ef6d9f4af8d39cdffc4811c050beb
Author: Georgi Gerganov <[email protected]>
Date:   Wed Jul 12 20:51:29 2023 +0300

    ggml : broadcast mul_mat + conv batch support (#2199)
    
    * ggml : broadcast mul_mat + conv batch support
    
    * ggml : apply mul_mat broadcast fix by @jploski

 ggml.c | 152 ++++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 79 insertions(+), 73 deletions(-)
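
A quick way to double-check this locally is to build the parent of the flagged commit and then the flagged commit itself, benchmarking each with the same settings (the hash is taken from the bisect output above):

    # last commit before the regression
    git checkout 975221e9548ef6d9f4af8d39cdffc4811c050beb~1 && make clean && make
    # first bad commit
    git checkout 975221e9548ef6d9f4af8d39cdffc4811c050beb && make clean && make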

slaren referenced this issue Jul 24, 2023
ggerganov mentioned this issue Jul 24, 2023
@coadmonky
Author

It still looks about the same.

COMPILER INFORMATION

I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CC: cc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
I CXX: g++ (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)

BEFORE (12 July 2023 [f7d278f])
llama_print_timings: load time = 69705.57 ms
llama_print_timings: sample time = 80.57 ms / 131 runs ( 0.62 ms per token, 1625.87 tokens per second)
llama_print_timings: prompt eval time = 4342.94 ms / 33 tokens ( 131.60 ms per token, 7.60 tokens per second)
llama_print_timings: eval time = 66328.00 ms / 130 runs ( 510.22 ms per token, 1.96 tokens per second)
llama_print_timings: total time = 70776.56 ms

AFTER (24 July 2023 [41c6741])
llama_print_timings: load time = 1277.89 ms
llama_print_timings: sample time = 54.80 ms / 126 runs ( 0.43 ms per token, 2299.14 tokens per second)
llama_print_timings: prompt eval time = 15090.92 ms / 33 tokens ( 457.30 ms per token, 2.19 tokens per second)
llama_print_timings: eval time = 61502.50 ms / 125 runs ( 492.02 ms per token, 2.03 tokens per second)
llama_print_timings: total time = 76672.43 ms
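
Reading the two logs side by side, the difference is concentrated in prompt eval: 33 tokens in 4342.94 ms ≈ 7.60 tokens per second before, versus 33 tokens in 15090.92 ms ≈ 2.19 tokens per second after, while generation (eval time) stays at roughly 2 tokens per second in both runs.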


@coadmonky
Author

I checked the code from 2023-07-25 (eb542d3) and it seems the same to me.
If there are higher-priority items, focus on those; I can use an older release until something changes.

Thanks...


@ggerganov
Owner

ggerganov commented Jul 26, 2023

Ah, I just realized you are testing the code on master.

I am actually making the changes on a branch, mul-mat-tweaks, and would like to ask you to test that branch to see if it restores the performance for you.

Here is the PR for that branch: #2372
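
For anyone else who wants to try it, checking out that branch should just be a matter of fetching it from this repository and rebuilding (the remote name origin is assumed here):

    git fetch origin mul-mat-tweaks
    git checkout mul-mat-tweaks
    make clean && make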

@coadmonky
Author

coadmonky commented Jul 26, 2023 via email

@coadmonky
Author

I finally used the right branch (mul-mat-tweaks) and it seems better. I'm going to close this issue. Thanks!


@cebtenzzre
Collaborator

Could we leave this open until the regression is fixed on master?

coadmonky reopened this Jul 27, 2023
@coadmonky
Author

Reopening to allow further testing and integration into master.

@ggerganov
Owner

Fixed via #2372
