
Update sampling API for llama.cpp #1742

Merged: 18 commits into main on Sep 19, 2024

Conversation

abetlen (Owner) commented on Sep 15, 2024

Updates llama-cpp-python to use the new llama.cpp sampler API.

Known Issues:

  • The grammar parser seems to fail silently when there is a syntax error in the GBNF grammar
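For context, here is a minimal sketch of what the new chain-based sampling API looks like when called through the low-level ctypes bindings. This is an illustration, not code from this PR; it assumes the Python bindings mirror the upstream llama.h names (llama_sampler_chain_init, llama_sampler_chain_add, llama_sampler_init_top_k, llama_sampler_init_dist, llama_sampler_sample) and that a model and context (ctx) have already been created and a prompt decoded:

import llama_cpp

# Build a sampler chain (replaces the old per-call sampling functions).
chain_params = llama_cpp.llama_sampler_chain_default_params()
smpl = llama_cpp.llama_sampler_chain_init(chain_params)

# Samplers are created individually and appended to the chain, which then owns them.
llama_cpp.llama_sampler_chain_add(smpl, llama_cpp.llama_sampler_init_top_k(40))
llama_cpp.llama_sampler_chain_add(smpl, llama_cpp.llama_sampler_init_top_p(0.95, 1))
llama_cpp.llama_sampler_chain_add(smpl, llama_cpp.llama_sampler_init_temp(0.80))
llama_cpp.llama_sampler_chain_add(smpl, llama_cpp.llama_sampler_init_dist(1234))  # RNG seed now lives on the sampler

# Sample from the logits of the last evaluated token (index -1) in context `ctx`.
token = llama_cpp.llama_sampler_sample(smpl, ctx, -1)

llama_cpp.llama_sampler_free(smpl)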

e-c-d (Contributor) commented on Sep 16, 2024

You probably know this already, but the llama_context_params.seed field should be removed everywhere:

diff --git a/llama_cpp/llama_cpp.py b/llama_cpp/llama_cpp.py
index efec065..fa53135 100644
--- a/llama_cpp/llama_cpp.py
+++ b/llama_cpp/llama_cpp.py
@@ -850,11 +850,10 @@ class llama_model_params(ctypes.Structure):
 # };
 class llama_context_params(ctypes.Structure):
     """Parameters for llama_context
 
     Attributes:
-        seed (int): RNG seed, -1 for random
         n_ctx (int): text context, 0 = from model
         n_batch (int): logical maximum batch size that can be submitted to llama_decode
         n_ubatch (int): physical maximum batch size
         n_seq_max (int): max number of sequences (i.e. distinct states for recurrent models)
         n_threads (int): number of threads to use for generation
@@ -881,11 +880,10 @@ class llama_context_params(ctypes.Structure):
         abort_callback (ggml_abort_callback): abort callback if it returns true, execution of llama_decode() will be aborted
         abort_callback_data (ctypes.ctypes.c_void_p): data for abort_callback
     """
 
     if TYPE_CHECKING:
-        seed: int
         n_ctx: int
         n_batch: int
         n_ubatch: int
         n_seq_max: int
         n_threads: int
@@ -911,11 +909,10 @@ class llama_context_params(ctypes.Structure):
         flash_attn: bool
         abort_callback: Callable[[ctypes.c_void_p], bool]
         abort_callback_data: ctypes.c_void_p
 
     _fields_ = [
-        ("seed", ctypes.c_uint32),
         ("n_ctx", ctypes.c_uint32),
         ("n_batch", ctypes.c_uint32),
         ("n_ubatch", ctypes.c_uint32),
         ("n_seq_max", ctypes.c_uint32),
         ("n_threads", ctypes.c_int32),

abetlen (Owner, Author) commented on Sep 17, 2024

@e-c-d yes, thank you, I forgot to push this!

abetlen marked this pull request as ready for review on September 18, 2024 at 23:55
abetlen merged commit f8fcb3e into main on Sep 19, 2024
1 check passed