
[Feature request] low level cpp & python API for generation #51

Open
UranusSeven opened this issue Jul 13, 2023 · 3 comments

Comments

@UranusSeven

Currently, chatglm.cpp provides a Pipeline class for users. Pipeline provides a method called chat, which handles the system prompt, chat history, output formatting, and more.

This is awesome. But for better flexibility, a low-level generate API is also needed to integrate chatglm.cpp with other systems such as text-generation-webui, LangChain, and Xinference.

Here are some detailed needs for the method generate (in Python):

from typing import List, Optional, TypedDict, Union


class Completion(TypedDict):
    text: str
    index: int
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    finish_reason: Optional[str]  # "stop" when eos is met, "length" when max_tokens is reached


def generate(
    self,
    prompt: str,  # the full prompt passed directly to the model
    stop: Optional[Union[str, List[str]]] = None,  # stop words
    max_tokens: int = 128,
    temperature: float = 0.8,
    top_p: float = 0.95,
    top_k: int = 40,
    **other_generate_kwargs,
) -> Completion:
    pass
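
For illustration, a hypothetical call against this proposed API could look like the sketch below. The generate method does not exist in chatglm.cpp yet, the model path is a placeholder, and the prompt format is only illustrative:

import chatglm_cpp

# Hypothetical usage of the proposed low-level API; "chatglm-ggml.bin" is a placeholder path.
pipeline = chatglm_cpp.Pipeline("chatglm-ggml.bin")
completion = pipeline.generate(
    prompt="[Round 0]\n问：你好\n答：",
    stop=["[Round"],
    max_tokens=64,
    temperature=0.8,
)
print(completion["text"])
print(completion["finish_reason"])  # "stop" if eos was met, "length" if max_tokens was reached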
@UranusSeven
Author

@li-plus Hi, do you have any comments on this issue? I'd be glad to implement it if you don't mind :)

@li-plus
Owner

li-plus commented Jul 17, 2023

@UranusSeven Hi, sorry for the late reply. Integration into popular frontends is indeed one of the TODOs. I just read the code of llama-cpp-python: https://github.com/abetlen/llama-cpp-python/blob/6d8892fe64ca7eadd503ae01f93fbcd9ff3806dd/llama_cpp/llama.py#L1229-L1251. How about we call it create_completion instead of generate to align with them? It'll be awesome if you could help implement it. If you don't have time, I'll handle it. Anyway, thanks for your advice!
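
Whichever name is chosen, an implementation will need the stop-word and finish_reason handling described in the proposal above. A minimal, self-contained sketch of that piece; the helper name and its structure are assumptions, not part of chatglm.cpp or llama-cpp-python:

from typing import List, Optional, Tuple, Union


def resolve_finish(
    text: str,
    completion_tokens: int,
    max_tokens: int,
    stop: Optional[Union[str, List[str]]] = None,
) -> Tuple[str, str]:
    """Trim the output at the earliest stop word and report why generation ended."""
    stop_words = [stop] if isinstance(stop, str) else list(stop or [])
    hits = [pos for pos in (text.find(word) for word in stop_words) if pos != -1]
    if hits:
        return text[: min(hits)], "stop"  # a stop word ended generation
    if completion_tokens >= max_tokens:
        return text, "length"  # token budget exhausted
    return text, "stop"  # the model emitted its eos token

For example, resolve_finish("你好！[Round 1]", 12, 128, stop=["[Round"]) would return ("你好！", "stop").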

@UranusSeven
Author

@li-plus Totally agree! Having a similar interface to llama-cpp-python is awesome!
I'll try to resolve this issue this weekend :)
