Support tensors with 64-bit number of elements in ggml #599
Comments
It wants to request …
Unfortunately I cannot reproduce this as I'm failing on the first step:
Yes, but when running with -c 3200 I have over 60 GB left. So that won't be the issue.
I feared that would be an issue. I'm not an insane C++ programmer, but I've used it a few times. If you need me to run a debugger or something, I think I will be able to do that. Just tell me what I should do or look for.
I did also repro this
Maybe there actually isn't such a large contiguous memory block available and the allocation fails because of that? I don't know which memory reporting tool you use, but it should show the amount of contiguous, committable memory that is actually available, not something that is backed by a page file. The first step I would take would be to disable the pagefile completely, so absolutely nothing gets paged from/to disk, to rule out the possibility of it being a memory/swap issue.
Oh wait, this line: … Or maybe not after all, and it's actually trying to access 0 (null) and, the integer length being 8, it's 0x8.
If I do the things @bogdad did, I get more or less identical output (+- Linux and Apple stuff), so we most likely see the same issue. Also, I figured out the issue occurs with -c 3277 while it works with -c 3276.
Agree, I get very similar-looking output when running without lldb too. Sorry, I did not have time to look more into this; good thing it reproduces reliably every time. Maybe some overflow mistake in a buffer size calculation, as you say, but hard to tell without looking carefully.
You need to bump these buffer sizes: Lines 46 to 79 in b51c717
Currently, they are hardcoded to support a max context of ~2048.
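For context, a rough sketch of what those hardcoded per-model buffers look like in llama.cpp around that commit; the names mirror the real ones, but the values here are illustrative rather than the exact numbers from b51c717:

```cpp
#include <cstddef>
#include <map>

// Illustrative sketch only -- names mirror llama.cpp's, values are NOT the exact ones in b51c717.
enum e_model { MODEL_7B, MODEL_13B, MODEL_30B, MODEL_65B };

static const size_t MB = 1024 * 1024;

// Per-model scratch/eval buffers sized with a ~2048-token context in mind;
// running with a larger -c means bumping the corresponding entries.
static const std::map<e_model, size_t> MEM_REQ_SCRATCH0 = {
    { MODEL_7B,  512ull * MB },
    { MODEL_13B, 512ull * MB },
    { MODEL_30B, 512ull * MB },
    { MODEL_65B, 512ull * MB },
};

static const std::map<e_model, size_t> MEM_REQ_EVAL = {
    { MODEL_7B,   768ull * MB },
    { MODEL_13B, 1024ull * MB },
    { MODEL_30B, 1280ull * MB },
    { MODEL_65B, 1536ull * MB },
};
```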
Unfortunately, that does not seem to be sufficient. I increased all four values for 65B by quite a bit but still get the issue for -c 3277, which is the smallest value it wouldn't work with before.
I looked into the issue further and want to share my findings, so maybe someone who actually knows what's going on might get an idea. As I said before, increasing the MEM_REQ_... values didn't help, but what did help was just adding some space right after: Line 486 in 0ba76c1
I just added a random amount of extra space …
and most of the messages were gone. As far as I can tell, … Like I said, most messages were gone, but not all.
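A hypothetical illustration of that kind of band-aid (the exact line and value added are elided above; the variable name is approximate, not quoted from the repo):

```cpp
// Hypothetical: pad the computed context size with arbitrary extra headroom,
// right after the ctx_size accounting in the model loader. Name and value are guesses.
ctx_size += 1024ull * 1024ull * 1024ull; // + 1 GiB of slack
```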
So after loading the model into memory, it needs an amount of memory which even for my computer is a rather large number. I used my newly learned debugging skills to look at what's going on.
So something is going wrong already here: Line 242 in 0ba76c1
Maybe this already helps someone. Unfortunately, I have to do some work now, but I'll try to dig further later on.
n_elements being int seems to be a hard thing to change, as it is sometimes assigned to tensor->ne[..], which is also an int. That might be the natural limit. But I have almost zero experience in llama.cpp.
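A quick back-of-the-envelope check lines up suspiciously well with that 3276/3277 boundary. This assumes the standard LLaMA 65B dimensions (n_embd = 8192, n_layer = 80), which are not stated in this thread, and assumes the k/v cache element count is the first product to cross INT_MAX:

```cpp
#include <cstdint>
#include <cstdio>
#include <climits>

int main() {
    // Assumption: 65B model dimensions (not taken from this thread).
    const int64_t n_embd  = 8192;
    const int64_t n_layer = 80;

    for (int64_t n_ctx = 3276; n_ctx <= 3277; ++n_ctx) {
        // Elements in one k (or v) cache tensor: n_embd * n_layer * n_ctx.
        const int64_t n_elem = n_embd * n_layer * n_ctx;
        std::printf("n_ctx=%lld -> %lld elements (INT_MAX=%d): %s\n",
                    (long long) n_ctx, (long long) n_elem, INT_MAX,
                    n_elem > INT_MAX ? "overflows int" : "fits in int");
    }
    return 0;
}
// n_ctx=3276 gives 2,146,959,360 elements (fits in int);
// n_ctx=3277 gives 2,147,614,720 elements (> INT_MAX = 2,147,483,647),
// which matches the observed threshold exactly.
```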
Yeah, it goes deep into ggml.c. One would probably have to rewrite the whole indexing of ggml.c and llama.cpp to a different data type, which would probably be a horrific task. As I'm not too familiar with C/C++, I wonder if it would even work. One would probably have to use something like …
I would say there is no limitation on array size or vector size up to size_t, with some caveats. I would also think that n_elements (or tensor->ne) is not part of the AVX optimizations; it more controls how many times we do the AVX thing, but I know nothing, it's just a guess. int might be fastest to iterate the loop on, but maybe that would not be that important. I would also say, if I had a concrete practical thing that depended on n_elements being larger than int (for example, uint might be enough), I would totally give changing to uint a try in a fork. Getting it upstream might be a separate concern though : ) It might be that it won't work for some other reasons, please don't blame me if that's the case!
Thanks a lot @bogdad, that would be great. I read into the topic and it seems like …
(sorry, i was not very clear - currently i don't have the real practical need, i am not working on it) |
So in summary - it is an integer overflow issue. The proposed solution in #626 does not seem the right way. What needs to be done, I think, is to carefully go through the places where we multiply the tensor dimensions (ne) and make sure the products do not overflow int. In contrast, for memory offsets we always use size_t, so those should be fine.
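A minimal sketch of that direction, assuming nothing about the actual patch in #626: do the dimension products in 64-bit integers and keep byte sizes/offsets in size_t.

```cpp
#include <cstdint>
#include <cstddef>

// Sketch only -- not the real ggml code. Element counts are widened to int64_t so the
// product of the four dimensions can exceed INT_MAX; byte sizes stay in size_t.
struct tensor_dims {
    int64_t ne[4];     // number of elements per dimension (was int in older ggml)
    size_t  type_size; // bytes per element
};

static int64_t sketch_nelements(const tensor_dims & t) {
    // All factors are int64_t, so the whole product is computed in 64 bits.
    return t.ne[0] * t.ne[1] * t.ne[2] * t.ne[3];
}

static size_t sketch_nbytes(const tensor_dims & t) {
    // Memory sizes/offsets stay in size_t, as the summary above notes.
    return (size_t) sketch_nelements(t) * t.type_size;
}
```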
This should be resolved by #626.
any plans to fix this? |
It should be fixed via #626 |
I'm trying to load a custom 50B model via ggml (not this repo). I believe a similar patch has been applied there too, but I can still repro. Sorry, didn't realize I was on a different repo before commenting (a Google search brought me here).
Just checked: the ggml project has the same patch applied. Any advice on how to debug this issue further?
Expected Behavior
When setting '-c' to a large number, llama.cpp should run as long as there is sufficient RAM. I'm aware that it warns me that context sizes larger than 2048 might produce poor results, but the results are actually fine when it doesn't crash.
Current Behavior
When setting '-c' to a large number, llama.cpp crashes with the error message
(repeated many times with slightly varying numbers, see below).
To be precise, I'm using the command
which crashes while the same command with '-c 3200' works.
I still have plenty of free RAM (~60 GB), so that shouldn't be an issue.
This might be related to #52, but it seems to die later in the process, so it's probably something different.
Environment and Context
I'm using Debian 11 and compiled using clang(++)-13 (but same result with g++) with OpenBLAS:
Failure Information (for bugs)
Hopefully everything stated above. Feel free to ask for details.
Steps to Reproduce
Failure Logs