Changes from all commits (423 commits)
3791ad2
SimpleChat v3.1: Boolean chat request options in Settings UI, cache_p…
hanishkvc Jun 25, 2024
48e6b92
Add chat template support for llama-cli (#8068)
ngxson Jun 25, 2024
49c03c7
cvector: better prompt handling, add "mean vector" method (#8069)
ngxson Jun 25, 2024
c8ad359
Gguf dump start data offset via --data-offset and some extra refactor…
mofosyne Jun 25, 2024
925c309
Add healthchecks to llama-server containers (#8081)
codearranger Jun 25, 2024
dd047b4
disable docker CI on pull requests (#8110)
slaren Jun 25, 2024
84631fe
`json`: support integer minimum, maximum, exclusiveMinimum, exclusive…
ochafik Jun 25, 2024
e6bf007
llama : return nullptr from llama_grammar_init (#8093)
danbev Jun 25, 2024
6fcbf68
llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model fa…
fairydreaming Jun 25, 2024
163d50a
fixes #7999 (adds control vectors to all `build_XXX()` functions in `…
jukofyork Jun 25, 2024
6777c54
`json`: fix additionalProperties, allow space after enum/const (#7840)
ochafik Jun 26, 2024
9b2f16f
`json`: better support for "type" unions (e.g. nullable arrays w/ typ…
ochafik Jun 26, 2024
494165f
llama : extend llm_build_ffn() to support _scale tensors (#8103)
Eddie-Wang1120 Jun 26, 2024
c8771ab
CUDA: fix misaligned shared memory read (#8123)
JohannesGaessler Jun 26, 2024
8854044
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (#8115)
isaac-mcfadyen Jun 26, 2024
f3f6542
llama : reorganize source code + improve CMake (#8006)
ggerganov Jun 26, 2024
a95631e
readme : update API notes
ggerganov Jun 26, 2024
0e814df
devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139)
ggerganov Jun 26, 2024
4713bf3
authors : regen
ggerganov Jun 26, 2024
f2d48ff
sync : ggml
ggerganov Jun 26, 2024
c7ab7b6
make : fix missing -O3 (#8143)
slaren Jun 26, 2024
31ec399
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLA…
slaren Jun 26, 2024
ae5d0f4
ci : publish new docker images only when the files change (#8142)
slaren Jun 26, 2024
c70d117
scripts : fix filename sync
ggerganov Jun 26, 2024
9b31a40
clip : suppress unused variable warnings (#8105)
danbev Jun 26, 2024
ac14662
Fix llama-android.cpp for error - "common/common.h not found" (#8145)
criminact Jun 27, 2024
911e35b
llama : fix CodeLlama FIM token checks (#8144)
CISC Jun 27, 2024
f675b20
Added support for Viking pre-tokenizer (#8135)
kustaaya Jun 27, 2024
85a267d
CUDA: fix MMQ stream-k for --split-mode row (#8167)
JohannesGaessler Jun 27, 2024
6030c61
Add Qwen2MoE 57B-A14B model identifier (#8158)
CISC Jun 27, 2024
3879526
Delete examples/llama.android/llama/CMakeLists.txt (#8165)
criminact Jun 27, 2024
97877eb
Control vector loading fixes (#8137)
jukofyork Jun 27, 2024
ab36791
flake.lock: Update (#8071)
ggerganov Jun 27, 2024
16791b8
Add chatml fallback for cpp `llama_chat_apply_template` (#8160)
ngxson Jun 27, 2024
8172ee9
cmake : fix deprecated option names not working (#8171)
slaren Jun 27, 2024
558f44b
CI: fix release build (Ubuntu+Mac) (#8170)
loonerin Jun 27, 2024
cb0b06a
`json`: update grammars/README w/ examples & note about additionalPro…
ochafik Jun 27, 2024
a27aa50
Add missing items in makefile (#8177)
ngxson Jun 28, 2024
e57dc62
llama: Add support for Gemma2ForCausalLM (#8156)
pculliton Jun 28, 2024
139cc62
`json`: restore default additionalProperties to false, fix some patte…
ochafik Jun 28, 2024
b851b3f
cmake : allow user to override default options (#8178)
slaren Jun 28, 2024
38373cf
Add SPM infill support (#8016)
CISC Jun 28, 2024
26a39bb
Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_t…
ngxson Jun 28, 2024
8748d8a
json: attempt to skip slow tests when running under emulator (#8189)
ochafik Jun 28, 2024
72272b8
fix code typo in llama-cli (#8198)
ngxson Jun 28, 2024
1c5eba6
llama: Add attention and final logit soft-capping, update scaling fac…
abetlen Jun 30, 2024
9ef0780
Fix new line issue with chat template, disable template when in-prefi…
ngxson Jun 30, 2024
d0a7145
flake.lock: Update (#8218)
ggerganov Jun 30, 2024
197fe6c
[SYCL] Update SYCL-Rope op and Refactor (#8157)
zhentaoyu Jul 1, 2024
694c59c
Document BERT support. (#8205)
iacore Jul 1, 2024
257f8e4
nix : remove OpenCL remnants (#8235)
ggerganov Jul 1, 2024
3840b6f
nix : enable curl (#8043)
edude03 Jul 1, 2024
0ddeff1
readme : update tool list (#8209)
crashr Jul 1, 2024
49122a8
gemma2: add sliding window mask (#8227)
ngxson Jul 1, 2024
dae57a1
readme: add Paddler to the list of projects (#8239)
mcharytoniuk Jul 1, 2024
cb5fad4
CUDA: refactor and optimize IQ MMVQ (#8215)
JohannesGaessler Jul 1, 2024
5fac350
Fix gemma2 tokenizer convert (#8244)
ngxson Jul 1, 2024
d08c20e
[SYCL] Fix the sub group size of Intel (#8106)
luoyu-intel Jul 2, 2024
a9f3b10
[SYCL] Fix win build conflict of math library (#8230)
luoyu-intel Jul 2, 2024
0e0590a
cuda : update supports_op for matrix multiplication (#8245)
slaren Jul 2, 2024
023b880
convert-hf : print output file name when completed (#8181)
danbev Jul 2, 2024
9689673
Add `JAIS` model(s) (#8118)
fmz Jul 2, 2024
07a3fc0
Removes multiple newlines at the end of files that is breaking the ed…
HanClinto Jul 2, 2024
3e2618b
Adding step to `clean` target to remove legacy binary names to reduce…
HanClinto Jul 2, 2024
a27152b
fix: add missing short command line argument -mli for multiline-input…
MistApproach Jul 2, 2024
fadde67
Dequant improvements rebase (#8255)
Jul 3, 2024
f8d6a23
fix typo (#8267)
foldl Jul 3, 2024
916248a
fix phi 3 conversion (#8262)
ngxson Jul 3, 2024
5f2d4e6
ppl : fix n_seq_max for perplexity (#8277)
slaren Jul 3, 2024
d23287f
Define and optimize RDNA1 (#8085)
daniandtheweb Jul 3, 2024
f619024
[SYCL] Remove unneeded semicolons (#8280)
Jul 4, 2024
20fc380
convert : fix gemma v1 tokenizer convert (#8248)
ggerganov Jul 4, 2024
402d6fe
llama : suppress unref var in Windows MSVC (#8150)
danbev Jul 4, 2024
f8c4c07
tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231)
danbev Jul 4, 2024
807b0c4
Inference support for T5 and FLAN-T5 model families (#5763)
fairydreaming Jul 4, 2024
b0a4699
build(python): Package scripts with pip-0517 compliance
ditsuke Feb 27, 2024
b1c3f26
fix: Actually include scripts in build
ditsuke Feb 28, 2024
8219229
fix: Update script paths in CI scripts
ditsuke Mar 10, 2024
de14e2e
chore: ignore all __pychache__
ditsuke Jul 2, 2024
07786a6
chore: Fixup requirements and build
ditsuke Jul 2, 2024
01a5f06
chore: Remove rebase artifacts
ditsuke Jul 2, 2024
1e92001
doc: Add context for why we add an explicit pytorch source
ditsuke Jul 2, 2024
51d2eba
build: Export hf-to-gguf as snakecase
ditsuke Jul 4, 2024
6f63d64
tokenize : add --show-count (token) option (#8299)
danbev Jul 4, 2024
d7fd29f
llama : add OpenELM support (#7359)
icecream95 Jul 4, 2024
a38b884
cli: add EOT when user hit Ctrl+C (#8296)
ngxson Jul 4, 2024
f09b7cb
rm get_work_group_size() by local cache for performance (#8286)
NeoZhangJianyu Jul 5, 2024
e235b26
py : switch to snake_case (#8305)
ggerganov Jul 5, 2024
a9554e2
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
luoyu-intel Jul 5, 2024
6c05752
contributing : update guidelines (#8316)
ggerganov Jul 5, 2024
aa5898d
llama : prefer n_ over num_ prefix (#8308)
ggerganov Jul 5, 2024
61ecafa
passkey : add short intro to README.md [no-ci] (#8317)
danbev Jul 5, 2024
5a7447c
readme : fix minor typos [no ci] (#8314)
pouwerkerk Jul 5, 2024
bcefa03
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311)
JohannesGaessler Jul 5, 2024
d12f781
llama : streamline embeddings from "non-embedding" models (#8087)
iamlemec Jul 5, 2024
0a42380
CUDA: revert part of the RDNA1 optimizations (#8309)
daniandtheweb Jul 5, 2024
8e55830
CUDA: MMQ support for iq4_nl, iq4_xs (#8278)
JohannesGaessler Jul 5, 2024
2cccbaa
llama : minor indentation during tensor loading (#8304)
ggerganov Jul 5, 2024
148ec97
convert : remove AWQ remnants (#8320)
ggerganov Jul 5, 2024
1f3e1b6
Enabled more data types for oneMKL gemm_batch (#8236)
OuadiElfarouki Jul 5, 2024
1d894a7
cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281)
akemimadoka Jul 5, 2024
7ed03b8
llama : fix compile warning (#8304)
ggerganov Jul 5, 2024
be20e7f
Reorganize documentation pages (#8325)
ngxson Jul 5, 2024
213701b
Detokenizer fixes (#8039)
jaime-m-p Jul 5, 2024
87e25a1
llama : add early return for empty range (#8327)
danbev Jul 6, 2024
60d83a0
update main readme (#8333)
ngxson Jul 6, 2024
86e7299
added support for Authorization Bearer tokens when downloading model …
dwoolworth Jul 6, 2024
cb4d86c
server: Retrieve prompt template in /props (#8337)
bviksoe Jul 7, 2024
210eb9e
finetune: Rename an old command name in finetune.sh (#8344)
standby24x7 Jul 7, 2024
b81ba1f
finetune: Rename command name in README.md (#8343)
standby24x7 Jul 7, 2024
d39130a
py : use cpu-only torch in requirements.txt (#8335)
compilade Jul 7, 2024
b504008
llama : fix n_rot default (#8348)
ggerganov Jul 7, 2024
905942a
llama : support glm3 and glm4 (#8031)
youth123 Jul 7, 2024
f7cab35
gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#…
mofosyne Jul 7, 2024
f1948f1
readme : update bindings list (#8222)
andy-tai Jul 7, 2024
4090ea5
ci : add checks for cmake,make and ctest in ci/run.sh (#8200)
AlexsCode Jul 7, 2024
a8db2a9
Update llama-cli documentation (#8315)
dspasyuk Jul 7, 2024
3fd62a6
py : type-check all Python scripts with Pyright (#8341)
compilade Jul 7, 2024
04ce3a8
readme : add supported glm models (#8360)
youth123 Jul 8, 2024
ffd0079
common : avoid unnecessary logits fetch (#8358)
kevmo314 Jul 8, 2024
6f0dbf6
infill : assert prefix/suffix tokens + remove old space logic (#8351)
ggerganov Jul 8, 2024
470939d
common : preallocate sampling token data vector (#8363)
kevmo314 Jul 8, 2024
fde13b3
feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854)
balisujohn Jul 2, 2024
6847d54
tests : fix whitespace (#0)
ggerganov Jul 8, 2024
2ee44c9
sync : ggml
ggerganov Jul 8, 2024
3f2d538
scripts : fix sync for sycl
ggerganov Jul 8, 2024
175391d
merge with master
JoanFM Jul 8, 2024
0699a4c
Merge branch 'feat-jina-embeddings-v2-zh' of https://github.com/JoanF…
JoanFM Jul 8, 2024
2ec846d
sycl : fix powf call in device code (#8368)
Alcpz Jul 8, 2024
afd76e6
fix: handle default
JoanFM Jul 8, 2024
c4dd11d
readme : fix web link error [no ci] (#8347)
b4b4o Jul 8, 2024
a130ecc
labeler : updated sycl to match docs and code refactor (#8373)
Alcpz Jul 8, 2024
7fdb6f7
flake.lock: Update (#8342)
ggerganov Jul 8, 2024
7d0e23d
gguf-py : do not use internal numpy types (#7472)
compilade Jul 9, 2024
9beb2dd
readme : fix typo [no ci] (#8389)
daghanerdonmez Jul 9, 2024
9925ca4
cmake : allow external ggml (#8370)
iboB Jul 9, 2024
5b0b8d8
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)
Alcpz Jul 9, 2024
a03e8dd
make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)
JohannesGaessler Jul 9, 2024
e500d61
Deprecation warning to assist with migration to new binary names (#8283)
HanClinto Jul 9, 2024
fd560fe
Update README.md to fix broken link to docs (#8399)
andysalerno Jul 9, 2024
a59f8fd
Server: Enable setting default sampling parameters via command-line (…
HanClinto Jul 9, 2024
8f0fad4
py : fix extra space in convert_hf_to_gguf.py (#8407)
laik Jul 10, 2024
e4dd31f
py : fix converter for internlm2 (#8321)
RunningLeon Jul 10, 2024
a8be1e6
llama : add assert about missing llama_encode() call (#8400)
fairydreaming Jul 10, 2024
7a80710
msvc : silence codecvt c++17 deprecation warnings (#8395)
iboB Jul 10, 2024
cc61948
llama : C++20 compatibility for u8 strings (#8408)
iboB Jul 10, 2024
83321c6
gguf-py rel pipeline (#8410)
monatis Jul 10, 2024
0f1a39f
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)
Dibakar Jul 10, 2024
6b2a849
ggml : move sgemm sources to llamafile subfolder (#8394)
ggerganov Jul 10, 2024
f4444d9
[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)
Jul 10, 2024
dd07a12
Name Migration: Build the deprecation-warning 'main' binary every tim…
HanClinto Jul 10, 2024
278d0e1
Initialize default slot sampling parameters from the global context. …
HanClinto Jul 11, 2024
7a221b6
llama : use F32 precision in Qwen2 attention and no FA (#8412)
ggerganov Jul 11, 2024
9a55ffe
tokenize : add --no-parse-special option (#8423)
compilade Jul 11, 2024
a977c11
gitignore : deprecated binaries
ggerganov Jul 11, 2024
808aba3
CUDA: optimize and refactor MMQ (#8416)
JohannesGaessler Jul 11, 2024
b078c61
cuda : suppress 'noreturn' warn in no_device_code (#8414)
danbev Jul 11, 2024
3686456
ggml : add NVPL BLAS support (#8329) (#8425)
nicholaiTukanov Jul 11, 2024
b549a1b
[SYCL] fix the mul_mat_id ut issues (#8427)
ClarkChin08 Jul 12, 2024
370b1f7
ggml : minor naming changes (#8433)
ggerganov Jul 12, 2024
71c1121
examples : sprintf -> snprintf (#8434)
ggerganov Jul 12, 2024
5aefbce
convert : remove fsep token from GPTRefactForCausalLM (#8237)
jpodivin Jul 12, 2024
8a4441e
docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441)
kriation Jul 12, 2024
c3ebcfa
server : ensure batches are either all embed or all completion (#8420)
iamlemec Jul 12, 2024
f532262
llama : suppress unary minus operator warning (#8448)
danbev Jul 12, 2024
6af51c0
main : print error on empty input (#8456)
ggerganov Jul 12, 2024
4e24cff
server : handle content array in chat API (#8449)
ggerganov Jul 12, 2024
c917b67
metal : template-ify some of the kernels (#8447)
ggerganov Jul 13, 2024
17eb6aa
vulkan : cmake integration (#8119)
bandoti Jul 13, 2024
fa79495
llama : fix pre-tokenization of non-special added tokens (#8228)
compilade Jul 14, 2024
e236528
gguf_hash.py: Add sha256 (#8470)
mofosyne Jul 14, 2024
73cf442
llama : fix Gemma-2 Query scaling factors (#8473)
ggerganov Jul 14, 2024
aaab241
flake.lock: Update (#8475)
ggerganov Jul 14, 2024
090fca7
pydantic : replace uses of __annotations__ with get_type_hints (#8474)
compilade Jul 14, 2024
bda62d7
Vulkan MMQ Fix (#8479)
0cc4m Jul 15, 2024
3dfda05
llama : de-duplicate deepseek2 norm
ggerganov Jul 15, 2024
16bdfa4
[SYCL] add concat through dim 1/2 (#8483)
airMeng Jul 15, 2024
fc690b0
docs: fix links in development docs [no ci] (#8481)
NikolaiLyssogor Jul 15, 2024
9104bc2
common : add --no-cont-batching arg (#6358)
ggerganov Jul 15, 2024
f17f39f
server: update README.md with llama-server --help output [no ci] (#8472)
maruel Jul 15, 2024
8fac431
ggml : suppress unknown pragma 'GCC' on windows (#8460)
danbev Jul 15, 2024
4db8f60
fix ci (#8494)
ngxson Jul 15, 2024
97bdd26
Refactor lora adapter support (#8332)
ngxson Jul 15, 2024
7acfd4e
convert_hf : faster lazy safetensors (#8482)
compilade Jul 16, 2024
0efec57
llama : valign + remove unused ftype (#8502)
ggerganov Jul 16, 2024
37b12f9
export-lora : handle help argument (#8497)
sbonds Jul 16, 2024
1666f92
gguf-hash : update clib.json to point to original xxhash repo (#8491)
mofosyne Jul 16, 2024
5e116e8
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515)
JohannesGaessler Jul 16, 2024
d65a836
llama : disable context-shift for DeepSeek v2 (#8501)
ggerganov Jul 17, 2024
da3913d
batched: fix n_predict parameter (#8527)
msy-kato Jul 17, 2024
1bdd8ae
[CANN] Add Ascend NPU backend (#6035)
hipudding Jul 17, 2024
30f80ca
CONTRIBUTING.md : remove mention of noci (#8541)
mofosyne Jul 17, 2024
b328344
build : Fix docker build warnings (#8535) (#8537)
amochkin Jul 17, 2024
e02b597
lookup: fibonacci hashing, fix crashes (#8548)
JohannesGaessler Jul 17, 2024
3807c3d
server : respect `--special` cli arg (#8553)
RunningLeon Jul 18, 2024
672a6f1
convert-*.py: GGUF Naming Convention Refactor and Metadata Override R…
mofosyne Jul 18, 2024
0d2c732
server: use relative routes for static files in new UI (#8552)
EZForever Jul 18, 2024
705b7ec
cmake : install all ggml public headers (#8480)
65a Jul 18, 2024
a15ef8f
CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)
JohannesGaessler Jul 18, 2024
3d0e436
convert-*.py: add general.name kv override (#8571)
mofosyne Jul 19, 2024
f299aa9
fix: typo of chatglm4 chat tmpl (#8586)
thxCode Jul 19, 2024
b57eb9c
ggml : add friendlier error message to fopen errors (#8575)
HanClinto Jul 19, 2024
be0cfb4
readme : fix server badge
ggerganov Jul 19, 2024
d197545
llama : bump max layers from 256 to 512 (#8530)
ggerganov Jul 19, 2024
57b1d4f
convert-*.py: remove add_name from ChatGLMModel class (#8590)
mofosyne Jul 19, 2024
87e397d
ggml : fix quant dot product with odd number of blocks (#8549)
slaren Jul 19, 2024
c3776ca
gguf_dump.py: fix markddown kv array print (#8588)
mofosyne Jul 20, 2024
69b9945
llama.swiftui: fix end of generation bug (#8268)
ho2103 Jul 20, 2024
9403622
llama : add support for Tekken pre-tokenizer (#8579)
m18coppola Jul 20, 2024
07283b1
gguf : handle null name during init (#8587)
ggerganov Jul 20, 2024
69c487f
CUDA: MMQ code deduplication + iquant support (#8495)
JohannesGaessler Jul 20, 2024
c69c630
convert_hf : fix Gemma v1 conversion (#8597)
compilade Jul 21, 2024
328884f
gguf-py : fix some metadata name extraction edge cases (#8591)
compilade Jul 21, 2024
22f281a
examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)
maruel Jul 21, 2024
45f2c19
flake.lock: Update (#8610)
ggerganov Jul 21, 2024
b7c11d3
examples: fix android example cannot be generated continuously (#8621)
devojony Jul 22, 2024
04bab6b
ggml: fix compile error for RISC-V (#8623)
zqb-all Jul 22, 2024
6281544
server : update doc to clarify n_keep when there is bos token (#8619)
kaetemi Jul 22, 2024
50e0535
llama : add Mistral Nemo inference support (#8604)
iamlemec Jul 22, 2024
e093dd2
tests : re-enable tokenizer tests (#8611)
ggerganov Jul 22, 2024
6f11a83
llama : allow overrides for tokenizer flags (#8614)
ggerganov Jul 22, 2024
566daa5
*.py: Stylistic adjustments for python (#8233)
jpodivin Jul 22, 2024
d94c6e0
llama : add support for SmolLm pre-tokenizer (#8609)
Stillerman Jul 22, 2024
081fe43
llama : fix codeshell support (#8599)
hankeke303 Jul 22, 2024
063d99a
[SYCL] fix scratch size of softmax (#8642)
luoyu-intel Jul 23, 2024
e7e6487
contrib : clarify PR squashing + module names (#8630)
ggerganov Jul 23, 2024
46e4741
Allow all RDNA2 archs to use sdot4 intrinsic (#8629)
jeroen-mostert Jul 23, 2024
751fcfc
Vulkan IQ4_NL Support (#8613)
0cc4m Jul 23, 2024
938943c
llama : move vocab, grammar and sampling into separate files (#8508)
ggerganov Jul 23, 2024
64cf50a
sycl : Add support for non-release DPC++ & oneMKL (#8644)
joeatodd Jul 23, 2024
b841d07
server : fix URL.parse in the UI (#8646)
0x4139 Jul 23, 2024
de28008
examples : Fix `llama-export-lora` example (#8607)
ngxson Jul 23, 2024
b115105
add llama_lora_adapter_clear (#8653)
ngxson Jul 24, 2024
79167d9
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667)
joeatodd Jul 24, 2024
96952e7
llama : fix `llama_chat_format_single` for mistral (#8657)
ngxson Jul 24, 2024
3a7ac53
readme : update UI list [no ci] (#8505)
SommerEngineering Jul 24, 2024
f19bf99
Build Llama SYCL Intel with static libs (#8668)
joeatodd Jul 24, 2024
68504f0
readme : update games list (#8673)
MorganRO8 Jul 24, 2024
8a4bad5
llama: use sliding window for phi3 (#8627)
FanShupei Jul 25, 2024
4b0eff3
docs : Quantum -> Quantized (#8666)
Ujjawal-K-Panchal Jul 25, 2024
be6d7c0
examples : remove `finetune` and `train-text-from-scratch` (#8669)
ngxson Jul 25, 2024
eddcb52
ggml : add and use ggml_cpu_has_llamafile() (#8664)
ggerganov Jul 25, 2024
ed67bcb
[SYCL] fix multi-gpu issue on sycl (#8554)
ClarkChin08 Jul 25, 2024
88954f7
tests : fix printfs (#8068)
ggerganov Jul 25, 2024
bf5a81d
ggml : fix build on Windows with Snapdragon X (#8531)
AndreasKunar Jul 25, 2024
4226a8d
llama : fix build + fix fabs compile warnings (#8683)
ggerganov Jul 25, 2024
49ce0ab
ggml: handle ggml_init failure to fix NULL pointer deref (#8692)
DavidKorczynski Jul 25, 2024
41cd47c
examples : export-lora : fix issue with quantized base models (#8687)
ngxson Jul 25, 2024
01aec4a
server : add Speech Recognition & Synthesis to UI (#8679)
ElYaiko Jul 25, 2024
201559d
Merge branch 'master' of https://github.com/JoanFM/llama.cpp into fea…
JoanFM Jul 26, 2024
2 changes: 1 addition & 1 deletion .devops/cloud-v-pipeline
@@ -15,7 +15,7 @@ node('x86_runner1'){ // Running on x86 runner containing latest vecto
stage('Running llama.cpp'){
sh'''#!/bin/bash
module load gnu-bin2/0.1 # loading latest versions of vector qemu and vector gcc
qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./main -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./llama-cli -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
cat llama_log.txt # Printing results
'''
}
6 changes: 3 additions & 3 deletions .devops/full-cuda.Dockerfile
@@ -6,13 +6,13 @@ ARG CUDA_VERSION=11.7.1
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build
FROM ${BASE_CUDA_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -27,7 +27,7 @@ COPY . .
# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable CUDA
ENV LLAMA_CUDA=1
ENV GGML_CUDA=1
# Enable cURL
ENV LLAMA_CURL=1

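Note: a minimal sketch of building the updated full CUDA image; the image tag below is an arbitrary example rather than anything defined in this PR, and CUDA_DOCKER_ARCH can still be narrowed through the build arg declared above.

# build from the repository root (tag name is an assumption)
docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .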
4 changes: 2 additions & 2 deletions .devops/full-rocm.Dockerfile
@@ -6,7 +6,7 @@ ARG ROCM_VERSION=5.6
# Target the CUDA build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

FROM ${BASE_ROCM_DEV_CONTAINER} as build
FROM ${BASE_ROCM_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
@@ -36,7 +36,7 @@ COPY . .
# Set nvcc architecture
ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV LLAMA_HIPBLAS=1
ENV GGML_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

4 changes: 2 additions & 2 deletions .devops/full.Dockerfile
@@ -1,9 +1,9 @@
ARG UBUNTU_VERSION=22.04

FROM ubuntu:$UBUNTU_VERSION as build
FROM ubuntu:$UBUNTU_VERSION AS build

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -6,7 +6,7 @@ ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VER
# Target the CUDA runtime image
ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build
FROM ${BASE_CUDA_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all
@@ -21,12 +21,15 @@ COPY . .
# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable CUDA
ENV LLAMA_CUDA=1
ENV GGML_CUDA=1

RUN make -j$(nproc)
RUN make -j$(nproc) llama-cli

FROM ${BASE_CUDA_RUN_CONTAINER} as runtime
FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime

COPY --from=build /app/main /main
RUN apt-get update && \
apt-get install -y libgomp1

COPY --from=build /app/llama-cli /llama-cli

ENTRYPOINT [ "/main" ]
ENTRYPOINT [ "/llama-cli" ]
28 changes: 28 additions & 0 deletions .devops/llama-cli-intel.Dockerfile
@@ -0,0 +1,28 @@
ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04

FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build

ARG GGML_SYCL_F16=OFF
RUN apt-get update && \
apt-get install -y git

WORKDIR /app

COPY . .

RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
echo "GGML_SYCL_F16 is set" && \
export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
fi && \
echo "Building with static libs" && \
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
${OPT_SYCL_F16} -DBUILD_SHARED_LIBS=OFF && \
cmake --build build --config Release --target llama-cli

FROM intel/oneapi-basekit:$ONEAPI_VERSION AS runtime

COPY --from=build /app/build/bin/llama-cli /llama-cli

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/llama-cli" ]
@@ -6,7 +6,7 @@ ARG ROCM_VERSION=5.6
# Target the CUDA build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

FROM ${BASE_ROCM_DEV_CONTAINER} as build
FROM ${BASE_ROCM_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
@@ -36,10 +36,10 @@ COPY . .
# Set nvcc architecture
ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV LLAMA_HIPBLAS=1
ENV GGML_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

RUN make -j$(nproc)
RUN make -j$(nproc) llama-cli

ENTRYPOINT [ "/app/main" ]
ENTRYPOINT [ "/app/llama-cli" ]
@@ -1,9 +1,9 @@
ARG UBUNTU_VERSION=jammy

FROM ubuntu:$UBUNTU_VERSION as build
FROM ubuntu:$UBUNTU_VERSION AS build

# Install build tools
RUN apt update && apt install -y git build-essential cmake wget
RUN apt update && apt install -y git build-essential cmake wget libgomp1

# Install Vulkan SDK
RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
@@ -14,14 +14,14 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
# Build it
WORKDIR /app
COPY . .
RUN cmake -B build -DLLAMA_VULKAN=1 && \
cmake --build build --config Release --target main
RUN cmake -B build -DGGML_VULKAN=1 && \
cmake --build build --config Release --target llama-cli

# Clean up
WORKDIR /
RUN cp /app/build/bin/main /main && \
RUN cp /app/build/bin/llama-cli /llama-cli && \
rm -rf /app

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/main" ]
ENTRYPOINT [ "/llama-cli" ]
23 changes: 23 additions & 0 deletions .devops/llama-cli.Dockerfile
@@ -0,0 +1,23 @@
ARG UBUNTU_VERSION=22.04

FROM ubuntu:$UBUNTU_VERSION AS build

RUN apt-get update && \
apt-get install -y build-essential git

WORKDIR /app

COPY . .

RUN make -j$(nproc) llama-cli

FROM ubuntu:$UBUNTU_VERSION AS runtime

RUN apt-get update && \
apt-get install -y libgomp1

COPY --from=build /app/llama-cli /llama-cli

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/llama-cli" ]
84 changes: 0 additions & 84 deletions .devops/llama-cpp-clblast.srpm.spec

This file was deleted.

16 changes: 8 additions & 8 deletions .devops/llama-cpp-cuda.srpm.spec
@@ -32,13 +32,13 @@ CPU inference for Meta's Lllama2 models using default options.
%setup -n llama.cpp-master

%build
make -j LLAMA_CUDA=1
make -j GGML_CUDA=1

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llamacppcuda
cp -p server %{buildroot}%{_bindir}/llamacppcudaserver
cp -p simple %{buildroot}%{_bindir}/llamacppcudasimple
cp -p llama-cli %{buildroot}%{_bindir}/llama-cuda-cli
cp -p llama-server %{buildroot}%{_bindir}/llama-cuda-server
cp -p llama-simple %{buildroot}%{_bindir}/llama-cuda-simple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacuda.service
@@ -49,7 +49,7 @@ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.t
[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamacppcudaserver $LLAMA_ARGS
ExecStart=/usr/bin/llama-cuda-server $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=never

@@ -67,9 +67,9 @@ rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llamacppcuda
%{_bindir}/llamacppcudaserver
%{_bindir}/llamacppcudasimple
%{_bindir}/llama-cuda-cli
%{_bindir}/llama-cuda-server
%{_bindir}/llama-cuda-simple
/usr/lib/systemd/system/llamacuda.service
%config /etc/sysconfig/llama

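Note: once the RPM is installed, the renamed binaries are driven by the same EnvironmentFile as before; a sketch follows, where the LLAMA_ARGS value is only an illustrative example.

# /etc/sysconfig/llama is the EnvironmentFile referenced by llamacuda.service
echo 'LLAMA_ARGS="-m /var/lib/llama/model.gguf --port 8080"' | sudo tee /etc/sysconfig/llama
sudo systemctl daemon-reload
sudo systemctl start llamacuda.service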
14 changes: 7 additions & 7 deletions .devops/llama-cpp.srpm.spec
@@ -38,9 +38,9 @@ make -j

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llama
cp -p server %{buildroot}%{_bindir}/llamaserver
cp -p simple %{buildroot}%{_bindir}/llamasimple
cp -p llama-cli %{buildroot}%{_bindir}/llama-cli
cp -p llama-server %{buildroot}%{_bindir}/llama-server
cp -p llama-simple %{buildroot}%{_bindir}/llama-simple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llama.service
@@ -51,7 +51,7 @@ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.t
[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamaserver $LLAMA_ARGS
ExecStart=/usr/bin/llama-server $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=never

@@ -69,9 +69,9 @@ rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llama
%{_bindir}/llamaserver
%{_bindir}/llamasimple
%{_bindir}/llama-cli
%{_bindir}/llama-server
%{_bindir}/llama-simple
/usr/lib/systemd/system/llama.service
%config /etc/sysconfig/llama

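Note: the packaging flow is otherwise unchanged apart from the binary names; a hedged sketch of building the RPM locally, assuming the llama.cpp-master source tarball expected by %setup is already in the rpmbuild SOURCES directory.

rpmbuild -ba .devops/llama-cpp.srpm.spec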
@@ -6,7 +6,7 @@ ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VER
# Target the CUDA runtime image
ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build
FROM ${BASE_CUDA_DEV_CONTAINER} AS build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all
@@ -21,17 +21,19 @@ COPY . .
# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable CUDA
ENV LLAMA_CUDA=1
ENV GGML_CUDA=1
# Enable cURL
ENV LLAMA_CURL=1

RUN make -j$(nproc)
RUN make -j$(nproc) llama-server

FROM ${BASE_CUDA_RUN_CONTAINER} as runtime
FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev
apt-get install -y libcurl4-openssl-dev libgomp1 curl

COPY --from=build /app/server /server
COPY --from=build /app/llama-server /llama-server

ENTRYPOINT [ "/server" ]
HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/llama-server" ]