Conversation

@ggerganov (Member)

fix #8528

@ggerganov ggerganov force-pushed the gg/bump-max-layers branch from fac0355 to b268edf Compare July 17, 2024 07:04
@slaren (Member) commented Jul 17, 2024

I would suggest moving the assert so that it applies only when the key is actually an array. That way it wouldn't affect other models. The assert should also be replaced with an exception, to avoid crashing on invalid models. Ideally it would be a dynamic container instead of a fixed array, but that requires more changes.

@ggerganov (Member, Author)

I would suggest moving the assert to the case where the key is actually an array only. That way it wouldn't affect other models.

Wouldn't this loop potentially overflow if we check just array keys?

https://github.com/ggerganov/llama.cpp/blob/b268edf87c3bb996feb88e2d8b2c85b38bff6f36/src/llama.cpp#L4061-L4063

@slaren (Member) commented Jul 17, 2024

Yes, I assumed that in that case it would only set the first element. From what I can tell, only OpenELM ever uses more than the first element.

@ggerganov ggerganov requested a review from slaren July 17, 2024 07:50
@ggerganov ggerganov merged commit d197545 into master Jul 19, 2024
@ggerganov ggerganov deleted the gg/bump-max-layers branch July 19, 2024 13:50
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
* llama : bump max layers from 256 to 512

* llama : replace asserts with exceptions

Successfully merging this pull request may close these issues.

Bug: Can't quantize 405B Mega merge