Update llama.cpp to latest #6
Commits on Feb 4, 2024
- `9392ebd` Flake lock file updates:
  • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/07f6395285469419cf9d078f59b5b49993198c00' (2024-01-11) → 'github:hercules-ci/flake-parts/b253292d9c0a5ead9bc98c4e9a26c6312e27d69f' (2024-02-01)
  • Updated input 'flake-parts/nixpkgs-lib': 'github:NixOS/nixpkgs/b0d36bd0a420ecee3bc916c91886caca87c894e9?dir=lib' (2023-12-30) → 'github:NixOS/nixpkgs/97b17f32362e475016f942bbdfda4a4a72a8a652?dir=lib' (2024-01-29)
  • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ae5c332cbb5827f6b1f02572496b141021de335f' (2024-01-25) → 'github:NixOS/nixpkgs/b8b232ae7b8b144397fdb12d20f592e5e7c1a64d' (2024-01-31)
Commits on Feb 5, 2024
- `4833ac2` [SYCL] Fix cpy with dims of 3 (ggerganov#5289)
  Fix cpy with dims of 3; remove asserts. Co-authored-by: Abhilash Majumder
- `5d55b0c`
- `4be04c8` scripts : add non-interactive server-llm.sh (ggerganov#5303)
  Adds a --non-interactive flag that allows running the script without asking for permission. Co-authored-by: Georgi Gerganov
- `30679d4`
- `e6f8177` common : add dynamic temperature parameters to main example CLI (ggerganov#5295)
  Added dynamic temperature params in main; added help text.
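For context on the commit above: dynamic temperature adjusts the sampling temperature based on the entropy of the token distribution, interpolating between a minimum and maximum temperature. A minimal sketch of that idea, assuming an entropy-normalized interpolation raised to an exponent (the function and parameter names here are illustrative, not the exact llama.cpp API):

```cpp
#include <cmath>
#include <vector>

// Sketch of entropy-based dynamic temperature: confident (low-entropy)
// distributions get a low temperature, uncertain ones a high temperature.
float dynamic_temp(const std::vector<float> & probs,
                   float min_temp, float max_temp, float exponent) {
    // Shannon entropy of the candidate distribution
    double entropy = 0.0;
    for (float p : probs) {
        if (p > 0.0f) entropy -= p * std::log(p);
    }
    // Normalize by the maximum possible entropy (uniform distribution)
    const double max_entropy = std::log((double) probs.size());
    const double norm = max_entropy > 0.0 ? entropy / max_entropy : 0.0;

    return min_temp + (max_temp - min_temp) * (float) std::pow(norm, exponent);
}
```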
- `a2d60c9`
- `6fdfa2e` iq2_xxs: tune quantization (ggerganov#5320)
  We get slightly better PPL, and we cut quantization time nearly in half. The trick is to first quantize without forcing points onto the E8-lattice; we can then use a narrower search range around the block scale obtained that way. Co-authored-by: Iwan Kawrakow
- `7e1ae37` py : fix internlm2-hf convert to gguf (ggerganov#5305)
- `89503dc` iq3_xxs: guards for the no-imatrix situation (ggerganov#5334)
  Co-authored-by: Iwan Kawrakow
- `abb6194` ggml : avoid duplicating function calls using MIN/MAX macros (ggerganov#5325)
  Since these macros copy "a" and "b", they ask the compiler to evaluate one of them twice. The compiler has no problem removing the duplication in something like MAX(0, x + 2), but when the arguments are function calls, those calls can actually happen twice. By evaluating the expression explicitly first, we get smaller and faster code without duplicate calls; behavior is exactly the same. See ggml_rope_yarn_corr_dims in Compiler Explorer: https://godbolt.org/z/Ee4KMrvKh. Co-authored-by: Georgi Gerganov
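To illustrate the double-evaluation pitfall the commit above avoids: a textbook MAX macro expands each argument twice, so an argument that is a function call runs twice. A minimal, self-contained sketch (the macro shown is the common definition, not necessarily ggml's exact one):

```cpp
#include <cstdio>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int counter = 0;

int expensive(void) {
    counter++;          // side effect makes double evaluation visible
    return 42;
}

int main() {
    // Expands to ((expensive()) > (0) ? (expensive()) : (0)):
    // expensive() is called twice.
    int x = MAX(expensive(), 0);

    // Hoisting the call into a variable evaluates it exactly once.
    int v = expensive();
    int y = MAX(v, 0);

    printf("x=%d y=%d calls=%d\n", x, y, counter); // calls=3, not 2
    return 0;
}
```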
- `c6b3955` ggml : make use of ggml-quants.h possible in C++ code (ggerganov#5338)
  One cannot be defining static_assert in a C++ compilation. Co-authored-by: Iwan Kawrakow
- `78b00dd` README: updated introduction (ggerganov#5343)
  Co-authored-by: Georgi Gerganov
- `098f6d7` make: Use ccache for faster compilation (ggerganov#5318)
Commits on Feb 6, 2024
- `906cff5` py : handle byte tokens in `get_token_type` (ggerganov#5341)
  Handle byte tokens in `get_token_type`; fix the empty bytes arg.
- `4ffc7a1` server : various fixes for the prompt field in /completion (ggerganov#5300)
  Fix a deadlock when the prompt array contains strings and numbers; remove an unnecessary generation when generating multi-prompts; remove an unnecessary assert.
- `31e7903` server : add `dynatemp_range` and `dynatemp_exponent` (ggerganov#5352)
  Co-authored-by: Michael Coppola
- `8a79c59`
- `2c51661`
- `2e9c0bd`
- `f57fadc` Slight quantization improvement for Q4_K and Q5_K (ggerganov#5361)
  Co-authored-by: Iwan Kawrakow
- `b08f22c` Update README.md (ggerganov#5366)
  Add some links to quantization-related PRs.
- `17c97fb`
- `213d143`
Commits on Feb 7, 2024
- `f68664a` convert : fix TypeError on GPT-2 vocab.json (ggerganov#5288)
  Authored by Sang-Kil Park.
- `f3e2b4f` server : update `/props` with "total_slots" value (ggerganov#5373)
  Include the total "num_slots" in default_generation_settings_for_props, then clean it up into a total_slots return value in the /props endpoint; update the /props endpoint docs accordingly.
- `316c7fa` llama : add MiniCPM support (ggerganov#5346)
  Support the MiniCPM arch: convert MiniCPM models via convert-hf-gguf.py (removing convert-minicpm.py), make the tokenizer work, fix quantization, correct the model type (size), and expand the constants for MiniCPM.
- `9a697d8`
- `ed0bf32` readme : modernize (ggerganov#5379)
  First cleanup: update everything to Llama 2 and remove outdated content; delete SHA256SUMS; make the build instructions generic; recommend the Q4_K_M quantization method.
- `ee1628b` Basic Vulkan Multi-GPU implementation (ggerganov#5321)
  Move most global variables into a backend context; add names to backend device functions; reduce code duplication in tensor-split layer assignment; generalize LLAMA_SPLIT_LAYER for all backends and do not expose device count and memory in llama.h; only print device info at startup and initialize one backend for CPU assist; rework backend memory management so devices and buffers get properly allocated and freed. Co-authored-by: slaren
- `0ef46da` llava-cli : always tokenize special tokens (ggerganov#5382)
  Tokenize special tokens in the prompt; use the escape CLI argument and remove the incomplete separate escaping process.
- `10afa6f`
- `aa7ab99`
- `b906596`
- `8c933b7` fix typo in readme (ggerganov#5399)
  Co-authored-by: Ebey Abraham
- `c4fbb67` CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (ggerganov#5393)
  Co-authored-by: Jared Van Bortel
Commits on Feb 8, 2024
- `8504d2d`
- `26d4efd` sampling: fix top_k <= 0 (ggerganov#5388)
  Co-authored-by: Georgi Gerganov
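A top_k value of zero or below conventionally means "consider the whole vocabulary" rather than "keep nothing". A sketch of the guard such a fix typically needs (illustrative names, not the exact llama.cpp code):

```cpp
#include <algorithm>

// Clamp a user-supplied top_k into a usable range: k <= 0 is treated
// as "no truncation", i.e. keep all n_vocab candidates.
int effective_top_k(int top_k, int n_vocab) {
    if (top_k <= 0) {
        return n_vocab;
    }
    return std::min(top_k, n_vocab);
}
```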
- `a6e514a` llava: fix typo/formatting in README.md (ggerganov#5405)
  Fixes a typo in the README.md file for the llava example which was causing the formatting to look a little off: clone `llava-v15-7b` and `clip-vit-large-patch14-336` locally. Signed-off-by: Daniel Bevenius
- `4aa43fa` llama : fix MiniCPM (ggerganov#5392)
  Fix a bug where norm_rms_eps was missing; align the tensor write order with convert.py; undo the HF models' permute tensor.
- `b7b74ce`
- `ff4ff05` llava : add missing .py, and fix paths in README.md (ggerganov#5414)
  Adds the missing .py extension to the convert-image-encoder-to-gguf script and fixes the paths for the `model` and `mmproj` options in the example llava-cli command. Signed-off-by: Daniel Bevenius
- `6e99f2a` Fix f16_sycl cpy call from Arc (ggerganov#5411)
  Fix the f16_sycl cpy call and remove the old logic; add an fp16 build to CI; use a macro; fix formatting.
- `41f308f`
- `8e6a9d2`
Commits on Feb 9, 2024
- `44fbe34` Fix Vulkan crash on APUs with very little device memory (ggerganov#5424)
  Also fixes debug output function names.
- `b2f87cb`
- `e4124c2`
- `e5ca393` llama : do not cap thread count when MoE on CPU (ggerganov#5419)
  Do not cap the thread count when MoE inference is running on the CPU.
- `7c777fc`
- `e00d2a6` llava : add requirements.txt and update README.md (ggerganov#5428)
  Adds a `requirements.txt` file to the `examples/llava` directory listing the Python packages required to run the scripts there, so users don't run into missing-package issues; also fixes a typo in the llava-surgery.py output. Signed-off-by: Daniel Bevenius
- `4b7b38b` vulkan: Set limit for task concurrency (ggerganov#5427)
  A common default for the maximum number of open files is 256, which can lead to `asyncio.gather(*tasks)` in ggml_vk_generate_shaders.py failing with `OSError: [Errno 24] Too many open files`. This change sets a reasonable concurrency limit for tasks (and therefore open files) without significant impact on run time.
Commits on Feb 10, 2024
- `4633d93` ggml : add abort_callback for cpu backend (ggml/725)
  A way to use abort_callback with the cpu backend; whisper update.
- `43b65f5`
- `cd9aea6`
- `f026f81` metal : use autoreleasepool to avoid memory leaks (ggerganov#5437)
  There appears to be a known memory leak when using `MTLCommandBuffer`; it is suggested to use `@autoreleasepool` [1, 2]. This change wraps `ggml_metal_graph_compute` in an `@autoreleasepool`. Addresses ggerganov#5436.
  [1] https://developer.apple.com/forums/thread/662721
  [2] https://forums.developer.apple.com/forums/thread/120931
Commits on Feb 11, 2024
- `907e08c` server : add llama2 chat template (ggerganov#5425)
  Add a mistral chat template, then rename it from mistral to llama2; remove BOS from format_llama2; validate the "--chat-template" argument; clean up the using_chatml variable. Co-authored-by: Jared Van Bortel
- `e4640d8`
- `a07d0fe` ggml : add mmla kernels for quantized GEMM (ggerganov#4966)
  Implements aarch64 smmla kernels for q8_0_q8_0, q4_0_q8_0 and q4_1_q8_1 quantized GEMM. armv8.2-a and above support MMLA instructions that have higher throughput than DOT; the feature is enabled if the platform defines "__ARM_FEATURE_MATMUL_INT8". On AWS Graviton3 processors these kernels yield up to 1.5x higher prompt-evaluation throughput compared to the default sdot kernels. Also updates the unit tests for the new vec_dot interface and adds a MATMUL_INT8 capability to system_info.
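A sketch of the compile-time gate this kind of kernel sits behind: the MMLA path is compiled only when the toolchain advertises the int8 matmul extension, with a portable fallback otherwise (the structure is illustrative, not the exact ggml code):

```cpp
#include <cstdint>

// Scalar reference: dot product of two int8 vectors accumulated in
// int32 -- this is the operation the MMLA kernels accelerate.
int32_t dot_q8(const int8_t * x, const int8_t * y, int n) {
#if defined(__ARM_FEATURE_MATMUL_INT8)
    // __ARM_FEATURE_MATMUL_INT8 is defined by the compiler when
    // targeting a CPU with the i8mm extension; a real kernel would load
    // 16-byte tiles and use the vmmlaq_s32() NEON intrinsic to compute
    // a 2x2 block of such dot products per instruction.
#endif
    int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += (int32_t) x[i] * (int32_t) y[i];
    }
    return sum;
}
```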
- `0f2411f`
- `139b62a`
- `85910c5`
- `6847801` server : allow to specify tokens as strings in logit_bias (ggerganov#5003)
  Co-authored-by: Georgi Gerganov
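The idea in the commit above is that a client can bias a piece of text rather than a raw token id: the server tokenizes the string and applies the bias to every resulting token. A hedged sketch of that resolution step (the helper names are hypothetical; the real server calls into llama.cpp's tokenizer):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using llama_token = int32_t;

// Stub standing in for llama_tokenize(); a real implementation would
// call into the model's tokenizer.
std::vector<llama_token> tokenize(const std::string & text) {
    std::vector<llama_token> out;
    for (char c : text) out.push_back((llama_token) c); // placeholder
    return out;
}

// Resolve a string entry like {"Hello": -10.0} into per-token biases,
// so the sampler can add `bias` to the logit of each resulting token.
void add_string_bias(std::map<llama_token, float> & logit_bias,
                     const std::string & text, float bias) {
    for (llama_token tok : tokenize(text)) {
        logit_bias[tok] = bias;
    }
}
```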
- `a803333` common : use enums for sampler types (ggerganov#5418)
  Co-authored-by: Georgi Gerganov
- `c88c74f` vulkan: only use M-sized matmul on Apple GPUs (ggerganov#5412)
  Refactor ggml_vk_guess_matmul_pipeline to simplify adding per-vendor conditionals; L-sized and S-sized matmuls are broken on Apple GPUs, so force the M size with this vendor. Signed-off-by: Sergio Lopez
- `97a3365` Flake lock file updates:
  • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/b8b232ae7b8b144397fdb12d20f592e5e7c1a64d' (2024-01-31) → 'github:NixOS/nixpkgs/f8e2ebd66d097614d51a56a755450d4ae1632df1' (2024-02-07)
- `2891c8a` Add support for BERT embedding models (ggerganov#5423)
  BERT model graph construction (build_bert); WordPiece tokenizer (llm_tokenize_wpm); a flag for non-causal attention models; allow models that only output embeddings; support conversion of BERT models to GGUF. Based on prior work by @xyzhang626 and @skeskinen. Co-authored-by: Jared Van Bortel, Georgi Gerganov
- `3bdc4cd` CUDA: mul_mat_vec_q tiling, refactor mul mat logic (ggerganov#5434)
  Co-authored-by: slaren
Commits on Feb 12, 2024
- `3b16944` ggml-alloc : v3 (ggml/727)
  Check for backend buffer allocation failures in whisper and avoid leaks when initialization fails; update llama.cpp, clip.cpp, export-lora.cpp, finetune.cpp and train-text-from-scratch.cpp; reduce the ggml-backend alignment to 32 to match gguf and fix mmap. Co-authored-by: slaren
- `4a46d2b` llava : remove prog parameter from ArgumentParser (ggerganov#5457)
  Removes the `prog` parameter from `ArgumentParser` so that it uses the default value, which is the name of the script: the usage output previously showed `usage: convert_hf_to_gguf.py [-h] ...` and now shows `usage: convert-image-encoder-to-gguf.py [-h] ...`. Also adds W503 ("line break before binary operator") to the flake8 ignore list. Signed-off-by: Daniel Bevenius
- `43fe07c` ggml-sycl: Replace 3d ops with macro (ggerganov#5458)
- `dbd8828` py : fix persimmon `n_rot` conversion (ggerganov#5460)
  Fix the Persimmon official weight conversion to write the correct n_rot. Co-authored-by: Georgi Gerganov
- `df334a1` swift : package no longer uses ggml dependency (ggerganov#5465)
  Reverts "swift : update Package.swift to use ggml as dependency (ggerganov#4691)" (commit ece9a45) and adds the ggml headers to the SPM package.
- `099afc6`
Commits on Feb 13, 2024
- `895407f` ggml-quants : fix compiler warnings (shadow variable) (ggerganov#5472)
  Co-authored-by: Iwan Kawrakow
- `99b8b43`
- `49cc1f7` bert : add tests + fix quantization (ggerganov#5475)
  Do not quantize the position-embedding and token-type tensors; add BERT tests to CI (skipped on low-perf nodes).
- `ad014bb` make: add error message for bad CUDA version (ggerganov#5444)
  Co-authored-by: Jared Van Bortel
- `03bf161` llama : support batched embeddings (ggerganov#5466)
  Batched embedding: pool outputs by sequence id; bring back non-causal attention; update the embedding example. Co-authored-by: Georgi Gerganov
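"Pooling by sequence id" means reducing the per-token embeddings that belong to the same sequence in a batch down to one vector each. A minimal sketch of mean pooling under that assumption (the shapes and names are illustrative, not llama.cpp's actual data layout):

```cpp
#include <cstddef>
#include <vector>

// Mean-pool per-token embeddings into one vector per sequence.
// embd:    n_tokens x n_embd token embeddings, flattened row-major
// seq_ids: sequence id of each token in the batch (0 .. n_seq-1)
std::vector<std::vector<float>> pool_by_seq(
        const std::vector<float> & embd, const std::vector<int> & seq_ids,
        int n_seq, int n_embd) {
    std::vector<std::vector<float>> out(n_seq, std::vector<float>(n_embd, 0.0f));
    std::vector<int> counts(n_seq, 0);

    for (size_t t = 0; t < seq_ids.size(); ++t) {
        const int s = seq_ids[t];
        counts[s]++;
        for (int i = 0; i < n_embd; ++i) {
            out[s][i] += embd[t * n_embd + i];
        }
    }
    for (int s = 0; s < n_seq; ++s) {       // divide sums by token counts
        if (counts[s] > 0) {
            for (int i = 0; i < n_embd; ++i) out[s][i] /= counts[s];
        }
    }
    return out;
}
```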
- `cf45252` tests : multi-thread the tokenizer tests (ggerganov#5474)
  Also fixes a data race for unidentified codepoints in the unicode code, plus minor style fixes.
- `2639789` finetune : rename feed-forward tensors (w1/w2/w3) (ggerganov#4839)
  Renames the feed-forward tensors w1, w2 and w3 to ffn_gate, ffn_down and ffn_up, respectively, in both finetune and train-text-from-scratch. The motivation is to make the purpose of the tensors easier to understand and in line with the names used in the llama_layer struct in llama.cpp. Signed-off-by: Daniel Bevenius
- `037259b` llama : make load error reporting more granular (ggerganov#5477)
  Makes it easier to pinpoint where e.g. `unordered_map::at: key not found` comes from.
- `c4e6dd5` llama : allow raw byte in SPM vocabs; don't crash on nl 404 (ggerganov#5478)
  Don't crash if the newline token is not found; allow llama_byte_to_token to fall back to finding just the token byte in SPM vocabs.
- `ea9c8e1`
- `6c00a06` gguf : add python reader example (ggerganov#5216)
- `f5ca054` Early return for zero size calls to get_tensor (ggerganov#5482)
  Add an early return (after the assertions) to the get/set tensor functions when the size is zero; since the generic backend now does the early return, there is no reason to also do it in ggml-kompute. Signed-off-by: Adam Treat; Co-authored-by: Georgi Gerganov
Commits on Feb 14, 2024
- `aa23412` llava : support v1.6 (ggerganov#5267)
  Adds llava-surgery-v2.py and updates convert-image-encoder-to-gguf.py, which now searches for the projector. Clip: bugfix for normalization (it did not load the 3 std and mean values); bicubic resize function; save-to-bmp/PIL for debugging and conversion from/to 32/8-bit images; normalization with FP16 precision simulation (image tensors match the HF implementation, can be switched off, only used for llava-1.6); newline tensor, mergetype kv, image-grid kv, and a new resize-pad function with resolution from gridpoints; clip_image_preprocess now returns a float* vector instead of float, so both llava 1.5 and 1.6 are supported. llava: added a ggml CPU graph for embedding patching and preliminary spatial_unpad support; tensors are now properly permuted and split into the 24x24 patches as in the reference (before, the embeddings were inserted 1:1); added verbose_prompt support and llava-1.6 stopwords to the CLI; moved llava functions to llava.cpp and made clip.h a C-compatible API. convert: skip unknown tensors (needed for LLaVA) via a new --skip-unknown CLI arg; server: remove clip structs. Works with llava-1.5 as well. Co-authored-by: John, Georgi Gerganov
- `8084d55`
- `ccbb277` llava : update README.md (ggerganov#5489)
  Co-authored-by: Georgi Gerganov
- `594fca3`
Commits on Feb 15, 2024
- `704359e`
- `7930a8a` llava : hotfix for llava-1.6 image number (ggerganov#5495)
  Co-authored-by: John
- `0d41771` llava : fix memory management bug (ggerganov#5491)
  Fixes a `munmap_chunk(): invalid pointer` abort in the llava and server code when a slot encodes an image; made cleaner by checking the size in the batch free wrapper.
- `7312247` fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (ggerganov#5487)
  Also adds the missing cls and mask token ids to the gguf metadata.
- `9350a1c` scripts : add hf.sh helper script (ggerganov#5501)
  Adds error logs and support for --repo and --file.
- `9060a1e` cuda : print message when initialization fails (ggerganov#5512)
  Use CUDA_NAME both times.
- `c06e45d`
- `4524290` Use correct type of pooling for embedding models (ggerganov#5500)
Commits on Feb 16, 2024
- `594845a`
- `60ed04c` llava : fix clip-model-is-vision flag in README.md (ggerganov#5509)
  Fixes the flag `--clip_model_is_vision` in README.md, which did not match the actual flag `--clip-model-is-vision` ("The clip model is a pure vision model (ShareGPT4V vision extract for example)"); also updates the link to the ViT config in README.md. Signed-off-by: Daniel Bevenius
- `f486f6e` ggml : add numa options (ggerganov#5377)
  Adds NUMA options to allow finer-grained control, plus plumbing for a new mirror mode that will require numa.h (the MIRROR_MODE code itself is removed from this PR and noted as not yet implemented). Splits NUMA init out from llama_backend_init into a new llama_numa_init and updates all code paths and samples; renames the INTERLEAVE strategy to DISTRIBUTE everywhere and implements the improved distribution strategy from @rankaiyx; moves ggml_get_numa_affinity into ggml.c and removes sched.h from ggml.h; adds #ifdefs for non-Linux OSes that don't have the cpu_set_t datatype; updates the READMEs with info about the numa flags. Co-authored-by: Jared Van Bortel, Georgi Gerganov
- `5f5808c`
- `6dcc02d`
- `65085c7`
- `4cb0727`
- `d2819d5` scripts : add helper script for bench comparing commits (ggerganov#5521)
  Detect CUDA; set flags after checking the command line; fix the make flags. Co-authored-by: slaren
- `5bf2b94` cmake : fix VULKAN and ROCm builds (ggerganov#5525)
  Also fixes Vulkan compile warnings.
Commits on Feb 17, 2024
- `d250c9d`
- `6e4e973` ci : add an option to fail on compile warning (ggerganov#3952)
  Fixes compile warnings and unreachable-code warnings in ggml, plus an strncpy warning; disables fatal warnings for the Windows, iOS, tvOS and MPI builds; adds fatal warnings to ggml-ci. Co-authored-by: Georgi Gerganov
- `8f1be0d` ggml : add ALiBi support for ggml_soft_max_ext (ggerganov#5488)
  Avoid recomputing ALiBi slopes (CPU) and reuse hparams.f_max_alibi_bias in all cases; support an ALiBi bias in ggml_soft_max_ext on CPU, Metal and CUDA, computing the slopes in the kernel rather than via an extra tensor; support multi-sequence ALiBi (Metal); deprecate ggml_alibi; fix performance (pow -> powf) and precompute ALiBi constants on CUDA; pre-compute slopes on Metal; init kq_pos only if needed; replace the test-backend-ops soft_max tests. Co-authored-by: slaren
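For reference: ALiBi adds a per-head linear bias to attention scores, with head-specific slopes forming a geometric sequence. A small sketch of the standard slope computation for a power-of-two head count (following the ALiBi paper's formula, not necessarily ggml's exact kernel code):

```cpp
#include <cmath>
#include <vector>

// Standard ALiBi slopes for n_head heads (n_head a power of two):
// slope_h = 2^(-8 * (h + 1) / n_head), so later heads decay faster.
std::vector<float> alibi_slopes(int n_head) {
    std::vector<float> slopes(n_head);
    for (int h = 0; h < n_head; ++h) {
        slopes[h] = std::pow(2.0f, -8.0f * (h + 1) / n_head);
    }
    return slopes;
}

// The bias added to the score of query position i attending to key
// position j is slope_h * -(i - j); fusing it into soft_max (as this
// commit does) avoids materializing that bias as an extra tensor.
```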
Commits on Feb 18, 2024
- `c8e0d7e` Flake lock file updates:
  • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/f8e2ebd66d097614d51a56a755450d4ae1632df1' (2024-02-07) → 'github:NixOS/nixpkgs/5863c27340ba4de8f83e7e3c023b9599c3cb3c80' (2024-02-16)
- `bd2d4e3` 1.5 bit quantization (ggerganov#5453)
  iq1_s: scalar CPU, AVX2, ARM_NEON, CUDA and Metal implementations ("as usual, Apple Silicon does not like the code I write" — Metal works but is quite slow); a better grid; uses IQ2_XXS for attn_output, which at a cost of 0.04 extra bpw gives a big improvement in PPL; tests; a slightly faster dot product. Co-authored-by: Iwan Kawrakow
- `fc0c8d2` llava : update surgery script to not remove tensors (ggerganov#5536)
  Updates the surgery script to not remove the tensors from the model file; for this to work, the `--skip-unknown` flag is added as an argument to the convert.py script in README.md. The motivation is that the surgery script previously removed the projector tensors from the model file, so a model checked out from a repository would have to be checked out again to reset this effect. This change was not made for the BakLLaVA models.
- `5d3de51`
- `1dcc3fd`
- `66c1968` server : graceful server shutdown (ggerganov#5244)
  Updates the server queue to support graceful shutdown of the server on signals.
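The usual pattern behind such a change is a signal handler that flips an atomic flag, with the serving loop draining in-flight work before exiting. A generic, self-contained sketch of that pattern (not the server's actual code):

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <cstdio>
#include <thread>

static std::atomic<bool> g_running{true};

// Signal handlers may only touch async-signal-safe state; an atomic
// flag that the main loop polls is the standard-compliant choice.
static void on_signal(int) {
    g_running.store(false);
}

int main() {
    std::signal(SIGINT,  on_signal);
    std::signal(SIGTERM, on_signal);

    while (g_running.load()) {
        // ... accept and process queued tasks ...
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    // Flag cleared: finish in-flight requests, free resources, then exit.
    printf("shutting down cleanly\n");
    return 0;
}
```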
- `36376ab` server : --n-predict option document and cap to max value (ggerganov#5549)
  Document --n-predict; ensure a client request cannot override n_predict if it is set; fix the usage print for the new option.
- `e75c627` server : enhanced health endpoint (ggerganov#5548)
  Enrich the health endpoint with the available slots and return 503 if no slots are available; document the new status in the README.md.
- `f3f28c5`
- `689a091`
- `c145f8a`
- `5ee99c3` common, server : surface min_keep as its own parameter (ggerganov#5567)
  Updated the README with the min_keep param.
- `7ad554f`
- `b1de968`
- `14278f5`
- `a0c2dad` build : pass all warning flags to nvcc via -Xcompiler (ggerganov#5570)
  Also fixes an apparent mis-merge from ggerganov#3952 and an incorrect GF_CC_VER for the CUDA host compiler.
Commits on Feb 19, 2024
- `f0d1faf` ggml : android and old glibc NUMA incompatibility bugfixes (ggerganov#5557)
  #ifdef out some NUMA code blocks for Android due to lack of support; force glibc prior to 2.29 to use a syscall for getcpu instead of the wrapper; gate NUMA platform-specific code on __gnu_linux__ to skip any platforms without glibc, since that's the only model being followed anyway.
- `769a716`
- `3a9cb4c` cuda, metal : fix nans in soft_max (ggerganov#5574)
  Co-authored-by: Georgi Gerganov
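Softmax NaNs typically come from exponentiating large logits; the standard remedy is to subtract the row maximum before exponentiating, which is mathematically a no-op but keeps exp() in range. A scalar sketch of the stable form (the actual fix lives in the CUDA and Metal kernels):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax: exp(x - max) / sum(exp(x - max)).
// Without the max subtraction, exp() can overflow to inf and the
// normalization then yields inf/inf = NaN.
std::vector<float> softmax(const std::vector<float> & x) {
    if (x.empty()) return {};

    const float max_val = *std::max_element(x.begin(), x.end());

    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_val);   // always in (0, 1]
        sum += y[i];
    }
    for (float & v : y) v /= sum;
    return y;
}
```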
- `11b12de` llama : add llama_chat_apply_template() (ggerganov#5538)
  Adds llama_chat_apply_template with clarifying docs, a zephyr template, and a test-chat-template (removing a redundant vector); the template buffer does not use std::string, the term "chat" is used everywhere, and the variable name is "tmpl".
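A hedged usage sketch of the new API: pass an array of role/content messages plus an output buffer, and the return value is the length the formatted chat needs, so a too-small buffer can be resized and the call retried. The signature shown matches this PR as described; later llama.h versions may differ:

```cpp
#include <string>
#include <vector>
#include "llama.h"

// Format a two-message chat with the model's built-in template.
std::string format_chat(const llama_model * model) {
    std::vector<llama_chat_message> chat = {
        {"system", "You are a helpful assistant."},
        {"user",   "Hello!"},
    };

    std::string buf(1024, '\0');
    // tmpl == nullptr: use the chat template from the model's metadata.
    // add_ass == true: append the assistant prompt prefix at the end.
    int32_t n = llama_chat_apply_template(model, nullptr,
                                          chat.data(), chat.size(),
                                          true, buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {      // buffer too small: resize and retry
        buf.resize(n);
        n = llama_chat_apply_template(model, nullptr,
                                      chat.data(), chat.size(),
                                      true, buf.data(), (int32_t) buf.size());
    }
    buf.resize(n > 0 ? n : 0);
    return buf;
}
```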
- `4480542` baby-llama : allocate graphs in ggml_context (ggerganov#5573)
  Fixes the baby-llama issue (see issue ggerganov#4830). Co-authored-by: Georgi Gerganov
- `7084755` llava : avoid changing the original BakLLaVA model (ggerganov#5577)
  A follow-up of commit fc0c8d2 ("llava : update surgery script to not remove tensors"), this time for the BakLLaVA-specific part of the surgery script; tested with SkunkworksAI/BakLLaVA-1 following the instructions in README.md. Signed-off-by: Daniel Bevenius
- `f53119c`
- `13e2c77` cmake : remove obsolete sycl compile flags (ggerganov#5581)
- `70d45af`
- `68a6b98`
- `d0e3ce5` ci : enable -Werror for CUDA builds (ggerganov#5579)
  cmake passes -Werror through -Xcompiler; make and cmake enable CUDA errors on warnings.
- `890559a` metal : option to embed MSL source into compiled binary (whisper/1842)
  Embed the Metal library source (ggml-metal.metal) into the binary, enabled via a build option; the Metal library embedding assembly is generated on the fly during the build process.
- `a3145bd`
- `337c9cb`
- `6fd4137`
- `1387cf6`
- `9d679f0` examples : support minItems/maxItems in JSON grammar converter (ggerganov#5039)
  Support minLength and maxLength in the JSON schema grammar converter. Co-authored-by: Georgi Gerganov
- `f24ed14`
- `40c3a6c` cuda : ignore peer access already enabled errors (ggerganov#5597)
  Also fixes HIP.
- `5dde540`
- `42f664a`
- `d8c0545`
- `f50db6a`
- `bb9dcd5` Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init()
- `22f83f0`
- `633782b` nix: now that we can do so, allow MacOS to build Vulkan binaries
  Authored by Philip Taron, Tue Feb 13 20:28:02 2024 +0000.
Commits on Feb 20, 2024
Update ggml_sycl_op_mul_mat_vec_q (ggerganov#5502)
* Update ggml_sycl_op_mul_mat_vec_q
* Apply suggestions from code review
* revert suggestion on macro
* fix bug
* Add quant type GGML_TYPE_IQ1_S to unsupported
* fix format
Co-authored-by: Abhilash Majumder <[email protected]>
Commit: b9111bd
Commit: c0a8c6d
metal : add build system support for embedded metal library (ggerganov#5604)
* add build support for embedded metal library
* Update Makefile
Co-authored-by: Haoxiang Fei <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: 8dbbd75
readme : update UI list (ggerganov#5605)
* Add maid to ui list
* Specify licence
Commit: 5207b3f
Server: use llama_chat_apply_template (ggerganov#5593)
* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message
* server: fix formatted_chat
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: 9c405c9
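For reference, a hedged usage sketch of llama_chat_apply_template as declared in llama.h around this change: call once to learn the required buffer size, then retry. Verify against the current header before copying, as the signature has evolved since.

```cpp
#include "llama.h"
#include <string>
#include <vector>

// Render a chat with the model's built-in template (tmpl == nullptr).
static std::string render_chat(const llama_model * model) {
    std::vector<llama_chat_message> chat = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" },
    };
    std::vector<char> buf(1024);
    int32_t n = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                          /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {  // buffer too small: retry with the reported size
        buf.resize(n);
        n = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                      true, buf.data(), (int32_t) buf.size());
    }
    return n > 0 ? std::string(buf.data(), n) : std::string();
}
```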
llava : add explicit instructions for llava-1.6 (ggerganov#5611)
This commit contains a suggestion for the README.md in the llava example. The suggestion adds explicit instructions for how to convert a llava-1.6 model and run it using llava-cli. The motivation for this is that having explicit instructions, similar to the 1.5 instructions, will make it easier for users to try this out.
Signed-off-by: Daniel Bevenius <[email protected]>
Commit: 4ed8e4f
Commit: 06bf2cf
server : support llava 1.6 (ggerganov#5553)
* server: init working 1.6
* move clip_image to header
* remove commented code
* remove c++ style from header
* remove todo
* expose llava_image_embed_make_with_clip_img
* fix zig build
Commit: 6560bed
Commits on Feb 21, 2024
IQ4_NL: 4-bit non-linear quants with blocks of 32 (ggerganov#5590)
* iq4_nl: squash commits for easier rebase
* Basics (quantize, dequantize)
* CUDA dequantize and dot product
* Slightly faster CUDA dot product (120 t/s)
* Switch to 6-bit scales
* Scalar dot product
* AVX2 dot product
* ARM_NEON dot product
* Works on metal, but still slow
* Slightly better Metal dot product
* Another small Metal improvement
* Metal dot product is getting there
* Faster CUDA dot product
* Add 1/8 ffn_down layers as Q5_K when no imatrix has been provided
* Report the actual bpw
* Add _xs mix that is 4.05 bpw for non-MoE models
* Remove IQ4_XS for now, slightly adjust kvalues_iq4nl
* AVX2 dot product uses Q8_0 instead of Q8_K
* Add to test-backend-ops
* Minor fix
* Also use Q5_K for attn_output in MoE models
* Fixes after merging latest master
* Switching to blocks of 32
* AVX2 for blocks of 32
* Scalar dot product for blocks of 32
* ARM_NEON dot product for blocks of 32
* Metal kernels for blocks of 32
* Slightly faster Metal kernels
* iq4_nl: Fix after merging with master
* iq4_nl: another fix after merging with master
* Use IQ4_NL instead of Q4_K when using k-quants is not possible
* Fix typo that makes several tests fail
* It was the ggml_vdotq thing missed inside the brackets
Co-authored-by: Iwan Kawrakow <[email protected]>
Commit: a14679c
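The core idea of a non-linear quant is easy to sketch: each 4-bit index selects a value from a small non-uniform codebook, and a per-block scale stretches it. The codebook values and block layout below are illustrative stand-ins, not the exact kvalues_iq4nl table or block struct used upstream:

```cpp
#include <cstdint>

// Illustrative non-uniform 16-entry codebook (denser near zero, like iq4_nl).
static const int8_t kvalues_demo[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113,
};

// Dequantize one block of 32 weights: 16 bytes of packed 4-bit indices plus
// one float scale `d` (the upstream block stores its scale as fp16).
static void dequant_block_32(const uint8_t * qs, float d, float * y) {
    for (int i = 0; i < 16; i++) {
        y[i +  0] = d * kvalues_demo[qs[i] & 0x0f]; // low nibbles first
        y[i + 16] = d * kvalues_demo[qs[i] >> 4];   // then high nibbles
    }
}
```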
[SYCL] context add name (ggerganov#5624)
* [SYCL] context add name
* name should start with SYCL*
Commit: 88c46cb
llama : add `gemma` model (ggerganov#5631)
There are a couple of things in this architecture:
1. Shared input and output embedding parameters.
2. Key length and value length are not derived from `n_embd`.
More information about the models can be found at https://ai.google.dev/gemma. GGUFs can be downloaded from https://huggingface.co/google.
Commit: 580111d
llava : add --skip-unknown to 1.6 convert.py (ggerganov#5632)
This commit adds the `--skip-unknown` option to the convert.py script and removes the saving of the updated checkpoints, to avoid updating possibly checked-out files. The motivation for this change is that this was done for 1.5 in commit fc0c8d2 ("llava : update surgery script to not remove tensors") and makes the examples more consistent.
Signed-off-by: Daniel Bevenius <[email protected]>
Commit: cc6cac0
Commit: c14f72d
sync : ggml
* ggml : fix conv_2d batch mode (ggml/737)
* ggml : compute forward no longer pass src tensors (ggml/729)
Co-authored-by: bssrdf <[email protected]>
Commit: eccd7a2
Commit: a00a35c
server: health: fix race condition on slots data using tasks queue (ggerganov#5634)
* server: health: fix race condition on slots data using tasks queue
* server: health: include_slots only if slots_endpoint
* fix compile warning: task.target_id not initialized
Commit: 1ecea25
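The shape of the fix: instead of the HTTP thread reading slot state directly (racy), it posts a task that the single inference thread executes, since that thread owns the slots. A minimal sketch with hypothetical types, not the server's actual queue class:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

struct task_queue {
    std::mutex                        m;
    std::condition_variable           cv;
    std::queue<std::function<void()>> tasks;

    void post(std::function<void()> fn) {   // called from HTTP threads
        { std::lock_guard<std::mutex> lk(m); tasks.push(std::move(fn)); }
        cv.notify_one();
    }

    void run_one() {                        // called by the inference thread
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !tasks.empty(); });
        std::function<void()> fn = std::move(tasks.front());
        tasks.pop();
        lk.unlock();
        fn(); // safe: runs on the thread that owns the slots data
    }
};
```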
Commit: 5022cf2
Commit: 89febfe
Commit: ba2135c
Commit: 7fe4678
Add docs for llama_chat_apply_template (ggerganov#5645)
* add docs for llama_chat_apply_template
* fix typo
Commit: 7c8bcc1
Commit: 973053d
Commits on Feb 22, 2024
mpt : add optional bias tensors (ggerganov#5638)
Update for MPT with optional bias parameters, to work with PhoGPT and SEA-LION models that were pre-trained with 'bias'.
Commit: 4ef245a
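The general pattern for optional tensors, sketched with hypothetical weight names: leave the bias pointer null when the checkpoint lacks it and branch at graph-build time, so biased (PhoGPT, SEA-LION) and bias-free MPT variants share one code path.

```cpp
#include "ggml.h"

struct mpt_layer {
    ggml_tensor * wqkv = nullptr; // always present
    ggml_tensor * bqkv = nullptr; // optional: stays null for bias-free models
};

static ggml_tensor * qkv_proj(ggml_context * ctx, const mpt_layer & l, ggml_tensor * cur) {
    cur = ggml_mul_mat(ctx, l.wqkv, cur);
    if (l.bqkv) {                 // only checkpoints trained with 'bias' take this branch
        cur = ggml_add(ctx, cur, l.bqkv);
    }
    return cur;
}
```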
Commit: c5688c6
server : fallback to chatml, add AlphaMonarch chat template (ggerganov#5628)
* server: fallback to chatml
* add new chat template
* server: add AlphaMonarch to test chat template
* server: only check model template if there is no custom tmpl
* remove TODO
Commit: a46f507
Commit: 56d03d9
Commit: 3a03541
workflows: nix: hardcode cachix ids, build unconditionally (ggerganov#5663)
The fact that GitHub does not expose environment and repository variables to PRs coming from forks means that we've been disabling the Nix CI actions for most PRs. The `if:` also didn't make much sense, because we can always pull from cachix, and there's no point (albeit no risk either) in pushing cache for untrusted code.
Commit: 4cb4d8b
Add Gemma chat template (ggerganov#5665)
* add gemma chat template
* gemma: only apply system_prompt on non-model message
Commit: 373ee3f
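For orientation, a sketch of the Gemma turn format as commonly documented (`<start_of_turn>role ... <end_of_turn>`, with "model" in place of "assistant"). Gemma defines no system role, so the sketch folds a system prompt into the first non-model turn, mirroring the note above; exact whitespace may differ from the shipped template.

```cpp
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

static std::string gemma_format(const std::vector<chat_msg> & chat) {
    std::string out, system;
    for (const chat_msg & m : chat) {
        if (m.role == "system") { system += m.content; continue; } // no system role
        const std::string role = (m.role == "assistant") ? "model" : m.role;
        out += "<start_of_turn>" + role + "\n";
        if (!system.empty() && role != "model") {   // prepend system text once
            out += system + "\n\n";
            system.clear();
        }
        out += m.content + "<end_of_turn>\n";
    }
    out += "<start_of_turn>model\n"; // cue the model to answer
    return out;
}
```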
Commit: 5a9e2f6
nix: init singularity and docker images (ggerganov#5056)
Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/)/[apptainer](https://apptainer.org/) and Docker images re-using llama.cpp's Nix expression. Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}` and it's fast and effective.
Commit: 201294a
ggml : 32-bit arm compat (whisper/1891)
* ggml : 32-bit arm compat
* ggml : add ggml_vqtbl1q_s8 impl
* ggml : cont
Commit: efd56b1
Commit: 334f76f
ggml : always define ggml_fp16_t as uint16_t (ggerganov#5666)
* ggml : always define ggml_fp16_t as uint16_t
* ggml : cont (follow-up fixes)
* cuda : no longer include ggml headers last
* ggml : fix q6_K FP16 -> FP32 conversion
* ggml : more FP16 -> FP32 conversion fixes
Commit: 7e4f339
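With ggml_fp16_t pinned to uint16_t, conversions must manipulate the bit pattern explicitly instead of relying on a compiler half type. A minimal, not performance-tuned FP16 to FP32 expansion; subnormal halves are flushed to zero here for brevity, which a production version would handle:

```cpp
#include <cstdint>
#include <cstring>

typedef uint16_t ggml_fp16_t;

static float fp16_to_fp32(ggml_fp16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    const uint32_t em   = h & 0x7fff;                 // exponent + mantissa
    uint32_t bits;
    if (em < 0x0400) {
        bits = sign;                                  // zero (subnormals flushed here)
    } else if (em >= 0x7c00) {
        bits = sign | 0x7f800000u | ((uint32_t)(em & 0x03ff) << 13); // inf / NaN
    } else {
        bits = sign | ((em + ((127u - 15u) << 10)) << 13);           // rebias exponent
    }
    float f;
    memcpy(&f, &bits, sizeof f);                      // bit-exact reinterpretation
    return f;
}
```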
py : add Gemma conversion from HF models (ggerganov#5647)
* py : add gemma conversion from HF models
* Update convert-hf-to-gguf.py (review suggestions)
Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Jared Van Bortel <[email protected]>
Commit: 847eedb
gemma : use more bits for the token_embd.weight tensor (ggerganov#5650)
* gemma : use Q8_0 for the token_embd.weight tensor
* llama : quantize token_embd.weight using output type
Commit: 96633ee
Commit: 15499eb
Commits on Feb 23, 2024
Commit: 54fbcd2
Commit: fd43d66
Commits on Feb 24, 2024
server: init functional tests (ggerganov#5566)
* server: tests: init scenarios
  - health and slots endpoints
  - completion endpoint
  - OAI compatible chat completion requests w/ and without streaming
  - completion multi users scenario
  - multi users scenario on OAI compatible endpoint with streaming
  - multi users with total number of tokens to predict exceeding the KV cache size
  - server wrong usage scenario, like in "Infinite loop of context shift" ggerganov#3969
  - slots shifting
  - continuous batching
  - embeddings endpoint
  - multi users embedding endpoint ("Segmentation fault" ggerganov#5655)
  - OpenAI-compatible embeddings API
  - tokenize endpoint
  - CORS and api key scenario
* server: CI GitHub workflow
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: 525213d
IQ3_S: a much better alternative to Q3_K (ggerganov#5676)
* iq4_nl: squash commits for easier rebase
* Basics (quantize, dequantize)
* CUDA dequantize and dot product
* Slightly faster CUDA dot product (120 t/s)
* Switch to 6-bit scales
* Scalar dot product
* AVX2 dot product
* ARM_NEON dot product
* Works on metal, but still slow
* Slightly better Metal dot product
* Another small Metal improvement
* Metal dot product is getting there
* Faster CUDA dot product
* Add 1/8 ffn_down layers as Q5_K when no imatrix has been provided
* Report the actual bpw
* Add _xs mix that is 4.05 bpw for non-MoE models
* Remove IQ4_XS for now, slightly adjust kvalues_iq4nl
* AVX2 dot product uses Q8_0 instead of Q8_K
* Add to test-backend-ops
* Minor fix
* Also use Q5_K for attn_output in MoE models
* Fixes after merging latest master
* Switching to blocks of 32
* AVX2 for blocks of 32
* Scalar dot product for blocks of 32
* ARM_NEON dot product for blocks of 32
* Metal kernels for blocks of 32
* Slightly faster Metal kernels
* Resurrecting iq3_xs. After all the experimentation, nothing was better than this.
* Minor PPL improvement via a block scale fudge factor
* Minor improvement via 3 neighbours
* iq3_xs: working scalar and AVX2 dot products
* iq3_xs: ARM_NEON dot product - works but extremely slow (10 t/s)
* iq3_xs: working Metal implementation
* Adding IQ3_M - IQ3_XS mix with mostly Q4_K
* iq3_xs: a 3.4375 bpw variant
* iq3_xs: make CUDA work for new version
* iq3_xs: make scalar and AVX2 work for new version
* iq3_s: make ARM_NEON work with new version
* iq3_xs: make new version work on metal. Performance is very similar to Q3_K_S.
* iq3_xs: tiny Metal speed improvement
* iq3_xs: tiny Metal speed improvement
* Fix stupid warning
* Q3_K_XS now uses a mix of IQ3_XS and IQ3_XXS
* iq3_xs: rename to iq3_s
* iq3_s: make tests pass
* Move Q3_K_XS mix to 3.25 bpw
* Attempt to fix failing tests
* Another attempt to fix the Windows builds
* Attempt to fix ROCm
* ROCm again
* iq3_s: partial fix for QK_K = 64
* iq3_s: make it work on metal for QK_K = 64. Pleasant surprise: the coding was super-block size independent, so all it took was to delete some QK_K == 256 guards.
* Will this fix ROCm?
Co-authored-by: Iwan Kawrakow <[email protected]>
Commit: 4c4cb30
server: continue to update other slots on embedding concurrent request (ggerganov#5699)
* server: ggerganov#5655 - continue to update other slots on embedding concurrent request
* server: tests: add multi users embeddings as fixed
* server: tests: adding OAI compatible embedding concurrent endpoint
* server: tests: adding OAI compatible embedding with multiple inputs
Commit: 9e359a4
Commits on Feb 25, 2024
py : fix StableLM conversion after config.json changes (ggerganov#5703)
* Fix issues during StableLM models conversion
* Fix hard-coded layer_norm_eps
* Support layer_norm_eps for LlavaStableLM
* Add missing parenthesis
* Support rotary_factor for LlavaStableLM
* fix typo
* Add StableLMEpochForCausalLM for safety
* Add StableLMEpochForCausalLM for safety 2
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: compilade <[email protected]>
Commit: 69917df
code : normalize enum names (ggerganov#5697)
* code : normalize enum names
* code : cont
Commit: ab336a9
Commit: 1289408
server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (ggerganov#5708)
* server: monitoring - add /metrics prometheus compatible endpoint
* server: concurrency issue: when 2 tasks are waiting for results, only one calling thread is notified
* server: metrics - move to a dedicated struct
Commit: d52d781
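The endpoint speaks the Prometheus text exposition format. A sketch of what such a payload looks like; the metric names below are placeholders, not necessarily the exact ones the server exports:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Build a Prometheus text-format payload; serve it with
// Content-Type: text/plain; version=0.0.4
static std::string metrics_text(uint64_t prompt_tokens, uint64_t predicted_tokens) {
    std::ostringstream os;
    os << "# HELP llamacpp:prompt_tokens_total Number of prompt tokens processed.\n"
          "# TYPE llamacpp:prompt_tokens_total counter\n"
       << "llamacpp:prompt_tokens_total " << prompt_tokens << "\n"
       << "# HELP llamacpp:tokens_predicted_total Number of generated tokens.\n"
          "# TYPE llamacpp:tokens_predicted_total counter\n"
       << "llamacpp:tokens_predicted_total " << predicted_tokens << "\n";
    return os.str();
}
```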
server: logs - unified format and --log-format option (ggerganov#5700)
* server: logs - always use JSON logger, add thread_id in message, log task_id and slot_id
* server : skip GH copilot requests from logging
* server : change message format of server_log()
* server : no need to repeat log in comment
* server : log style consistency
* server : fix compile warning
* server : fix tests regex patterns on M2 Ultra
* server: logs: PR feedback on log level
* server: logs: allow choosing log format, json or plain text
* server: tests: output server logs in text
* server: logs: switch init logs to server logs macro
* server: logs: ensure json value does not raise an error
* server: logs: reduce level VERBOSE to VERB, max 4 chars
* server: logs: lower case as other log messages
* server: logs: avoid static in general
* server: logs: PR feedback: change text log format to: LEVEL [function_name] message | additional=data
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: 930b178
Commit: 7d548a1
make : fix nvcc version is empty (ggerganov#5713)
Commit: f1a98c5
ggml-quants : provide ggml_vqtbl1q_u8 for 64-bit compatibility (ggerganov#5711)
* [ggml-quants] Provide ggml_vqtbl1q_u8 for 64-bit compatibility (vqtbl1q_u8 is not part of the ARMv7 NEON library)
* [android-example] Remove abi filter after arm v7a fix
* [github-workflows] Do not skip Android armeabi-v7a build
Commit: abbabc5
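vqtbl1q_u8 is an AArch64-only intrinsic; on 32-bit ARM the same 16-byte table lookup can be assembled from the ARMv7 vtbl2_u8 primitive. A sketch close in spirit to, though not necessarily identical with, the ggml helper:

```cpp
#include <arm_neon.h>

// 16-lane table lookup: out[i] = a[b[i]] for b[i] < 16, else 0 --
// the same semantics vqtbl1q_u8 provides on AArch64.
static inline uint8x16_t vqtbl1q_u8_compat(uint8x16_t a, uint8x16_t b) {
    uint8x8x2_t tab = { { vget_low_u8(a), vget_high_u8(a) } }; // 16-entry table
    return vcombine_u8(vtbl2_u8(tab, vget_low_u8(b)),          // lower 8 lookups
                       vtbl2_u8(tab, vget_high_u8(b)));        // upper 8 lookups
}
```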
server : fix crash when system prompt is bigger than batch size (ggerganov#5714)
The system prompt is now decoded in batches.
* server : fix off-by-one n_past when start of prompt matches whole cache. The tokens right after the matching part would otherwise skip a pos value.
Commit: f762501
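The essence of the fix: never hand llama_decode more than n_batch tokens at once; walk the prompt in chunks, carrying the position forward. A sketch against the llama.h API of this era (llama_batch_get_one's signature has since changed, so double-check before reusing):

```cpp
#include "llama.h"
#include <algorithm>
#include <vector>

static bool decode_prompt(llama_context * ctx, std::vector<llama_token> & tokens, int n_batch) {
    const int n_tokens = (int) tokens.size();
    for (int i = 0; i < n_tokens; i += n_batch) {
        const int n_eval = std::min(n_batch, n_tokens - i);
        // pos = i: each chunk continues exactly where the previous one stopped
        if (llama_decode(ctx, llama_batch_get_one(tokens.data() + i, n_eval, i, 0)) != 0) {
            return false; // decode failed (e.g. no free KV slot)
        }
    }
    return true;
}
```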
llama : refactor k-shift implementation + KV defragmentation (ggerganov#5691)
* llama : refactor k-shift implementation
* llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add
* llama : cont k-shift refactoring + normalize type names
* minor : fix MPI builds
* llama : reuse n_rot from the build context
* llama : revert enum name changes from this PR
* llama : update llama_rope_type
* llama : add comment about rope values
* llama : fix build
* passkey : apply kv cache updates explicitly
* llama : change name to llama_kv_cache_update()
* llama : add llama_kv_cache_seq_pos_max()
* passkey : fix llama_kv_cache_seq_pos_max() usage
* llama : some llama_kv_cell simplifications
* llama : add llama_kv_cache_compress (EXPERIMENTAL)
* llama : add alternative KV cache merging (EXPERIMENTAL)
* llama : add llama_kv_cache_defrag
* llama : comments
* llama : remove llama_kv_cache_compress (will add in a separate PR)
* llama : defragment via non-overlapping moves
* llama : ggml_graph based defrag implementation
* llama : switch the loop order in build_defrag
* llama : add comments
Commit: bf08e00
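Conceptually, the defragmentation compacts used KV cells toward the front with non-overlapping moves: every source index is strictly greater than its destination, so the copies can be applied in order (upstream expresses them as a ggml graph). A simplified sketch of the bookkeeping, not the real llama_kv_cell structure:

```cpp
#include <utility>
#include <vector>

struct kv_cell { int pos = -1; };         // pos < 0 marks a free cell

// Collect (src, dst) moves that pack all used cells to the front.
static int plan_defrag(std::vector<kv_cell> & cells,
                       std::vector<std::pair<int, int>> & moves) {
    int dst = 0;
    for (int src = 0; src < (int) cells.size(); ++src) {
        if (cells[src].pos < 0) continue;  // skip holes
        if (src != dst) {
            moves.push_back({src, dst});   // src > dst, so moves never overlap
            cells[dst] = cells[src];
            cells[src] = kv_cell{};
        }
        ++dst;
    }
    return dst;                            // number of used cells after packing
}
```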
server: docs - refresh and tease a little bit more the http server (ggerganov#5718)
* server: docs - refresh and tease a little bit more the http server
* Rephrase README.md server doc
* Update examples/server/README.md
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: 8b35035
server: tests - slow inference causes timeout on the CI (ggerganov#5715)
* server: tests - longer inference timeout for CI
Commit: e3965cf
Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/5863c27340ba4de8f83e7e3c023b9599c3cb3c80' (2024-02-16) → 'github:NixOS/nixpkgs/cbc4211f0afffe6dfd2478a62615dd5175a13f9a' (2024-02-23)
Commit: c393733
Commits on Feb 26, 2024
Commit: 269de86
Commit: 8a533f0
Commit: 4804215
Commit: 67fd331
[SYCL] Add support for soft_max ALiBi (ggerganov#5639)
* Add support for bias
* Update pre-processor
* rm commented code
* fix format
* fix CI
Co-authored-by: Abhilash Majumder <[email protected]>
Commit: e849078
readme : update ui list (ggerganov#5731)
* Add LLMFarm (ui for iOS) to list
Commit: c4d7f81
Commit: 47bb7b4
Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (ggerganov#5721)
* Adding IQ2_S and IQ2_M as a single cumulative commit
* Update examples/quantize/quantize.cpp
Co-authored-by: Iwan Kawrakow <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: a33e6a0
Commit: b11a93d
Commits on Feb 27, 2024
Makefile: use variables for cublas (ggerganov#5689)
* make: use arch variable for cublas
* fix UNAME_M
* check opt first
Co-authored-by: lindeer <[email protected]>
Commit: cbbd1ef
llama : fix defrag bugs + add parameter (ggerganov#5735)
* llama : fix defrag bugs + enable by default
* llama : add defrag_thold parameter
* llama : cont
* llama : disable log message
* llama : fix graph size check during defrag
Commit: 9d533a7
Commit: 1f30b7a
Commit: c24a2a6
IQ4_XS: a 4.25 bpw quantization (ggerganov#5747)
* Try IQ4_NL with blocks of 64 - does not look good
* iq4_xs: go to super-blocks of 256 and 6-bit scales for blocks of 32
* iq4_xs: CUDA works - 133.2 t/s
* iq4_xs: AVX2 dot product
* iq4_xs: ARM_NEON dot product
* iq4_nl: Metal implementation. As usual, Metal / Apple Silicon don't like my quants.
* iq3_xs: minor fix
* iq4_xs: shrink by using IQ3_S for attn_k and attn_q
* iq4_xs: revert using IQ3_S for attn_k and attn_v. PPL vs size is good, but CPU performance suffers: on M2 Max TG-128 drops to 21.7 t/s from 28.8, and on a Ryzen-7950X to 14.5 t/s from 15.8 t/s. On CUDA we have 135 t/s when using IQ3_S vs 133 t/s with pure IQ4_XS.
* Fix CI
* iq4_xs: Added forgotten check for 256 divisibility
Co-authored-by: Iwan Kawrakow <[email protected]>
Commit: 0becb22
Attempt to fix android build (ggerganov#5752)
Co-authored-by: Iwan Kawrakow <[email protected]>
Commit: cb49e0f
Commits on Feb 28, 2024
ggml : make i-quants work with super-blocks of 64 (CPU, Metal) (ggerganov#5760)
* WIP: make i-quants work for QK_K = 64
* iq2_xs: attempt to fix AVX dot product for QK_K = 64. Tests pass, but I get gibberish.
* QK_K = 64 tests pass on ARM_NEON and Metal. Sadly, that does not mean it actually works.
* Make CUDA compile with QK_K = 64. Tests don't pass, plus we get misaligned access.
* Q2_K: fixed bug in imatrix quantization for QK_K = 64
* iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work)
Co-authored-by: Iwan Kawrakow <[email protected]>
Commit: 7c4263d
server : add "/chat/completions" alias for "/v1/chat/completions" (ggerganov#5722)
* Add "/chat/completions" as alias for "/v1/chat/completions"
* merge to upstream master
* minor : fix trailing whitespace
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: efc7225
readme : add link to LLaVA 1.6 models (ggerganov#5758)
Signed-off-by: Daniel Bevenius <[email protected]>
Commit: 6c44168
llama : improve BERT tokenization (ggerganov#5740)
* implement nfd for stripping accents in wpm tokenizer
* sort nfd map; reuse iterator
* use builtin tolower
* add locale include
* Simplify to_lower cases
Co-authored-by: Jared Van Bortel <[email protected]>
Commit: 177628b
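Accent stripping via NFD in a WordPiece tokenizer boils down to: decompose each codepoint with a precomputed NFD table, keep the base character, drop the combining marks. A toy sketch; the two-entry table stands in for the real Unicode data, which also emits the combining marks to be filtered:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Tiny stand-in for the full NFD data: composed codepoint -> base letter.
static const std::map<uint32_t, uint32_t> nfd_base = {
    { 0x00E9, 0x0065 }, // 'é' -> 'e' (combining U+0301 dropped)
    { 0x00FC, 0x0075 }, // 'ü' -> 'u' (combining U+0308 dropped)
};

static std::vector<uint32_t> strip_accents(const std::vector<uint32_t> & cps) {
    std::vector<uint32_t> out;
    out.reserve(cps.size());
    for (uint32_t cp : cps) {
        auto it = nfd_base.find(cp);
        out.push_back(it != nfd_base.end() ? it->second : cp); // keep base char
    }
    return out;
}
```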
llama : fix non-quantization of expert gating tensors (ggerganov#5754)
This reverts a single line from ggerganov#5475.
Commit: adcb12a
server : hit Ctrl+C twice to exit (ggerganov#5734)
* server: twice ctrl+C to exit
* std::atomic_flag
* sigint: message
* sigint: stderr
Co-authored-by: Jared Van Bortel <[email protected]>
Commit: a693bea
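The two-stage SIGINT pattern in sketch form (not the server's exact handler): the first ^C flips an std::atomic_flag and asks the main loop to shut down; the second ^C, seeing the flag already set, exits immediately. Only async-signal-safe calls are used in the handler.

```cpp
#include <atomic>
#include <csignal>
#include <cstdlib>
#include <unistd.h>

static std::atomic_flag  sigint_seen = ATOMIC_FLAG_INIT;
static std::atomic<bool> shutdown_requested{false};

static void sigint_handler(int) {
    if (sigint_seen.test_and_set()) {
        _Exit(130);                                   // second ^C: leave now
    }
    shutdown_requested.store(true);                   // first ^C: graceful stop
    const char msg[] = "\npress Ctrl+C again to force exit\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1); // signal-safe output
    (void) n;
}

int main() {
    std::signal(SIGINT, sigint_handler);
    while (!shutdown_requested.load()) {
        // ... serve requests ...
    }
}
```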
Introduce backend GUIDs (ggml/743)
* Introduce backend GUIDs (discussed in ggerganov/ggml#741); hardcode CPU backend GUID for now; change ggml_backend_is_cpu logic to use the GUID
* Remove redundant functions ggml_backend_i::get_name and ggml_backend_guid, which are not desired for future expansion
* Add spaces to match style
* Fix brace style to match
* Add void to () in function signature
* Add back ggml_backend_guid and make CPU_GUID a local static in ggml_backend_cpu_guid
* add guids to all backends
Co-authored-by: slaren <[email protected]>
Commit: 5f70671
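The mechanism, with types that only approximate ggml's: each backend carries a 16-byte GUID, so identity checks like ggml_backend_is_cpu become a byte comparison rather than a name-string compare. The bytes below are arbitrary demo values.

```cpp
#include <cstdint>
#include <cstring>

typedef uint8_t backend_guid[16];

static const backend_guid CPU_GUID = {
    0xaa, 0x67, 0xc7, 0x43, 0x96, 0xe6, 0xa3, 0x8a,
    0xe3, 0xaf, 0xea, 0x92, 0x36, 0x01, 0x38, 0x52, // arbitrary demo bytes
};

static bool guid_matches(const uint8_t * a, const uint8_t * b) {
    return memcmp(a, b, sizeof(backend_guid)) == 0;
}

struct backend {
    const uint8_t * guid; // points at the backend's static GUID
    // ... vtable, context ...
};

static bool backend_is_cpu(const backend * b) {
    return guid_matches(b->guid, CPU_GUID);
}
```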
add google magika inference example (ggml/748)
* add magika inference example
* ggml : fix unaligned accesses in custom ops
* ggml : fix FP32 GELU for values that exceed the FP16 range
* use ggml_pool_1d
* add README
* pad inputs if the files are too small
* cleanup
Commit: 2774b0c
Commit: 8c0e8f4
Commit: 78aacf3
Commit: 08c5ee8
Commit: 317709b
Commit: 87c91c0
Commits on Feb 29, 2024
Commit: d5ab297
Server: normalize naming (ggerganov#5779)
* server: normalize naming
* fix spacing
Commit: 052051d
Commit: e841b7a
Commits on Mar 1, 2024
[SYCL] Use batched mul_mat pathway (ggerganov#5591)
* Use batched mul_mat pathway
* rm extra line
* Explicitly state scaled data type
Co-authored-by: Abhilash Majumder <[email protected]>
Commit: 38d1521
Commit: f105471
Commit: 6ea0f01
Commit: 5cb02b4
unicode : switch to multimap based nfd_map (ggerganov#5799)
* switch to multimap based nfd_map due to compile time issues
* simplify multimap keys
* don't construct a new locale every time
Commit: 9600d59
llama : cleanup unused mmq flags (ggerganov#5772)
* cleanup unused --no-mul-mat-q, -nommq, -mmq, --mul-mat-q, mul_mat_q
* remove mul_mat_q in compare-llama-bench and usage
* update llama-bench
Co-authored-by: slaren <[email protected]>
Commit: 3ab8b3a
Commit: f49a535
Commit: e743386
Commit: c2224f0
Commit: 38d16b1
llama : add StarCoder2 support (ggerganov#5795)
* Add support for starcoder2
* handle rope type
* skip rope freq and rotary embeddings from being serialized
* resolve comments
* Update llama.cpp
* remove redundant changes
* handle `rope-theta`
* llama : change starcoder2 rope type
* address comment
Co-authored-by: Georgi Gerganov <[email protected]>
Commit: c29af7e
Commit: c504a54