(COMPATIBILITY) [v1.54 Smooth Sampling] - unknown model architecture: 'orion' #638
Comments
I don't think the "Orion" architecture is supported; I don't see any references to it. Where and how did you get this model?
I have this problem too. I converted this model manually following this guide.
Ah, that makes sense. It relies on pull request ggerganov#5118, which has not yet been merged, so it won't work until that happens.
You are right. I applied the PR before converting the model; the conversion fails without it. The converted model does run once the PR is applied.
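If you want to confirm which architecture a converted file actually declares before loading it, a minimal sketch like the one below can help. It is not from this thread; it only assumes the documented GGUF v2/v3 header layout (magic, version, tensor count, KV count, then key-value pairs) and reads the `general.architecture` string using the Python standard library.

```python
# Minimal sketch: print the `general.architecture` key of a GGUF file.
# Assumes GGUF v2/v3 header layout; pure stdlib, no llama.cpp needed.
import struct
import sys

GGUF_MAGIC = b"GGUF"

# GGUF scalar metadata value types -> (struct format, byte size).
SCALARS = {
    0: ("<B", 1),   # UINT8
    1: ("<b", 1),   # INT8
    2: ("<H", 2),   # UINT16
    3: ("<h", 2),   # INT16
    4: ("<I", 4),   # UINT32
    5: ("<i", 4),   # INT32
    6: ("<f", 4),   # FLOAT32
    7: ("<B", 1),   # BOOL
    10: ("<Q", 8),  # UINT64
    11: ("<q", 8),  # INT64
    12: ("<d", 8),  # FLOAT64
}

def read_string(f):
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")

def read_value(f, vtype):
    if vtype in SCALARS:
        fmt, size = SCALARS[vtype]
        (val,) = struct.unpack(fmt, f.read(size))
        return val
    if vtype == 8:  # STRING: uint64 length + UTF-8 bytes
        return read_string(f)
    if vtype == 9:  # ARRAY: element type, count, then elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [read_value(f, etype) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def gguf_architecture(path):
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key = read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            value = read_value(f, vtype)
            if key == "general.architecture":
                return value
    return None

if __name__ == "__main__":
    print(gguf_architecture(sys.argv[1]))
```

Running this against the converted file should print `orion`, which is exactly the architecture string that the bundled llama.cpp in v1.54 does not recognize.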
Here is the GGUF for Orion Longchat 14b: https://huggingface.co/demonsu/orion-14b-longchat-gguf/tree/main
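For reference, a quantized file from that repository can also be fetched programmatically with `huggingface_hub`; the filename below is a placeholder, so check the repository's file listing for the real one.

```python
# Sketch only: download one GGUF file from the linked repository.
# The filename is hypothetical; substitute the actual name from the repo.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="demonsu/orion-14b-longchat-gguf",
    filename="orion-14b-longchat-q6_k.gguf",  # placeholder filename
)
print(path)  # local cache path of the downloaded model
```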
Should be fixed now in v1.57, can you check?
Yes, it seems to work fine. Thanks for your work!
I was trying to use 14b Orion LongChat, but it threw an error. Presumably it is simply a new architecture. Here you go:
Welcome to KoboldCpp - Version 1.54
For command line arguments, please refer to --help
Attempting to use CuBLAS library for faster prompt ingestion. A compatible CuBLAS will be required.
Initializing dynamic library: koboldcpp_cublas.dll
Namespace(model=None, model_param='C:/KoboldCPP/Models/14b Orion LongChat - q6k.gguf', port=5001, port_param=5001, host='', launch=True, lora=None, config=None, threads=31, blasthreads=31, highpriority=False, contextsize=32768, blasbatchsize=512, ropeconfig=[0.0, 10000.0], smartcontext=False, noshift=False, bantokens=None, forceversion=0, nommap=False, usemlock=True, noavx2=False, debugmode=0, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=['normal', '0', 'mmq'], gpulayers=99, tensor_split=None, onready='', multiuser=1, remotetunnel=False, foreground=False, preloadstory=None, quiet=False, ssl=None)
Loading model: C:\KoboldCPP\Models\14b Orion LongChat - q6k.gguf
[Threads: 31, BlasThreads: 31, SmartContext: False, ContextShift: True]
The reported GGUF Arch is: orion
Identified as LLAMA model: (ver 6)
Attempting to Load...
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
llama_model_loader: loaded meta data with 21 key-value pairs and 444 tensors from C:\KoboldCPP\Models\14b Orion LongChat - q6k.gguf
error loading model: unknown model architecture: 'orion'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "koboldcpp.py", line 2519, in
File "koboldcpp.py", line 2366, in main
File "koboldcpp.py", line 310, in load_model
OSError: exception: access violation reading 0x0000000000000064
[28744] Failed to execute script 'koboldcpp' due to unhandled exception!
[process exited with code 1 (0x00000001)]