
llama : refactor sampling v2 #9294

Merged · 47 commits · Sep 7, 2024
Commits
ab545c8
llama : add llama_sampling API + move grammar in libllama
ggerganov Aug 5, 2024
86b07cc
llama : sketching new sampling API
ggerganov Sep 3, 2024
5116b36
cont : add llama_constraint_i [no ci]
ggerganov Sep 3, 2024
cf4dd10
cont : initial implementation sketch [no ci]
ggerganov Sep 3, 2024
1b07dc5
cont : fixes, naming [no ci]
ggerganov Sep 3, 2024
71293a6
cont : add rest of the existing samplers [no ci]
ggerganov Sep 3, 2024
0daebc6
cont : fix [no ci]
ggerganov Sep 3, 2024
a2ce91c
cont : add penalties and logit-bias constraints [no ci]
ggerganov Sep 3, 2024
09ceb68
cont : add comments [no ci]
ggerganov Sep 3, 2024
1e8e26c
cont : leaner constraint initialization [no ci]
ggerganov Sep 4, 2024
91cbb40
cont : common/sampling use the new API [no ci]
ggerganov Sep 4, 2024
437376e
cont : add n_prev to llama_sampler_params
ggerganov Sep 4, 2024
a0b9121
cont : use new API in examples
ggerganov Sep 4, 2024
ad436e9
examples : fix build
ggerganov Sep 4, 2024
fdb52aa
common : fix gpt_sampler_cp
ggerganov Sep 4, 2024
ca5d21c
grammar : fix reset call
ggerganov Sep 4, 2024
c024fe4
constraint : clean-up and simplify
ggerganov Sep 4, 2024
1a0de0b
constraint : add name API
ggerganov Sep 4, 2024
0e1378c
sampling : convert mirostat samplers to constraints
ggerganov Sep 4, 2024
784a644
sampler : API to iterate constraints
ggerganov Sep 4, 2024
e7a11ca
sampling : simplify new llama_sampler calls
ggerganov Sep 4, 2024
8e80a1c
sampling : simplify sample API
ggerganov Sep 4, 2024
9b95067
sampling : fix grammar apply
ggerganov Sep 4, 2024
b2b36e9
example : fix build + fix speculative
ggerganov Sep 4, 2024
69551ff
sampling : remove top-k min_keep, fix mirostat init and state
ggerganov Sep 5, 2024
ebeb651
sampling : change _cp/copy to clone
ggerganov Sep 5, 2024
5957114
sampling : add name API + option to disable timings
ggerganov Sep 5, 2024
a2d8b27
llama : restore comments in llama.h
ggerganov Sep 5, 2024
0b6dfce
llama : remove llama_constraint
ggerganov Sep 5, 2024
34f4bd0
sampling : fix cloning of samplers with null ctx
ggerganov Sep 5, 2024
bd88352
ios : try to fix build
ggerganov Sep 5, 2024
82a89df
sampling : improve mirostat implementation
ggerganov Sep 5, 2024
5b01cc8
swift : fix example
ggerganov Sep 5, 2024
8c972b6
grammar : restore llama_grammar_accept signature
ggerganov Sep 6, 2024
809bdcf
sampling : allow passing m to mirostat sampler
ggerganov Sep 6, 2024
b448c75
sampling : remove redundant indirection calls
ggerganov Sep 6, 2024
5ab52c1
sampling : remove _context suffix [no ci]
ggerganov Sep 6, 2024
757a9bf
llama : add new llama_perf API
ggerganov Sep 6, 2024
befcfe7
common : simplify gpt_sampler
ggerganov Sep 6, 2024
9ce9210
batched.swift : fix build [no ci]
ggerganov Sep 6, 2024
4a4530b
examples : add missing samplers
ggerganov Sep 7, 2024
4b27235
style : rearrange code + add comments and TODOs
ggerganov Sep 7, 2024
19c3696
batched.swift : fix build
ggerganov Sep 7, 2024
0e6d170
sampling : avoid llama_model in few samplers
ggerganov Sep 7, 2024
8a82f38
sampling : fix state cloning
ggerganov Sep 7, 2024
2387dbe
sampling : fix repeat penalty out-of-bounds access
ggerganov Sep 7, 2024
4ac186a
llama : update doc [no ci]
ggerganov Sep 7, 2024
6 changes: 0 additions & 6 deletions Makefile
@@ -927,7 +927,6 @@ OBJ_COMMON = \
 	common/ngram-cache.o \
 	common/sampling.o \
 	common/train.o \
-	common/grammar-parser.o \
 	common/build-info.o \
 	common/json-schema-to-grammar.o

@@ -1167,11 +1166,6 @@ common/console.o: \
 	common/console.h
 	$(CXX) $(CXXFLAGS) -c $< -o $@

-common/grammar-parser.o: \
-	common/grammar-parser.cpp \
-	common/grammar-parser.h
-	$(CXX) $(CXXFLAGS) -c $< -o $@
-
 common/json-schema-to-grammar.o: \
 	common/json-schema-to-grammar.cpp \
 	common/json-schema-to-grammar.h
2 changes: 0 additions & 2 deletions common/CMakeLists.txt
@@ -58,8 +58,6 @@ add_library(${TARGET} STATIC
     sampling.cpp
     console.h
     console.cpp
-    grammar-parser.h
-    grammar-parser.cpp
     json.hpp
     json-schema-to-grammar.cpp
     train.h
109 changes: 37 additions & 72 deletions common/common.cpp

Large diffs are not rendered by default.

6 changes: 1 addition & 5 deletions common/common.h
@@ -77,8 +77,6 @@ struct cpu_params {
 };

 struct gpt_params {
-    uint32_t seed = LLAMA_DEFAULT_SEED; // RNG seed
-
     int32_t n_predict = -1; // new tokens to predict
     int32_t n_ctx = 0; // context size
     int32_t n_batch = 2048; // logical batch size for prompt processing (must be >=32 to use BLAS)

@@ -120,8 +118,7 @@ struct gpt_params {
     enum llama_pooling_type pooling_type = LLAMA_POOLING_TYPE_UNSPECIFIED; // pooling type for embeddings
     enum llama_attention_type attention_type = LLAMA_ATTENTION_TYPE_UNSPECIFIED; // attention type for embeddings

-    // // sampling parameters
-    struct llama_sampling_params sparams;
+    struct gpt_sampler_params sparams;

     std::string model = ""; // model path
     std::string model_draft = ""; // draft model for speculative decoding

@@ -185,7 +182,6 @@ struct gpt_params {
     bool flash_attn = false; // flash attention

     bool input_prefix_bos = false; // prefix BOS to user inputs, preceding input_prefix
-    bool ignore_eos = false; // ignore generated EOS tokens
     bool logits_all = false; // return logits for all tokens in the batch
     bool use_mmap = true; // use mmap for faster loads
     bool use_mlock = false; // use mlock to keep model in memory