
main : don't print special tokens with --grammar #6923

Merged (15 commits) on May 25, 2024

Conversation

@jart (Contributor) commented Apr 26, 2024

The CLI interface was recently changed to print special control tokens like the `</s>` stop message one. This token shouldn't be printed if the grammar flag was passed, unless the grammar specifies it, because that breaks shell-scriptability.
@mofosyne added the `bugfix` (fixes an issue or bug) and `Review Complexity : Low` (trivial changes that most beginner devs can tackle, e.g. UI fix) labels on May 9, 2024
@HanClinto (Collaborator) commented:
> The CLI interface was recently changed to print special control tokens like the `</s>` stop message one.

Looks like this was #6807 ?

> This token shouldn't be printed if the grammar flag was passed

Looks like your PR accomplishes this part...

> unless the grammar specifies it,

... but is this condition included in your PR? It looks like the PR checks whether any grammar is present at all, and doesn't check to see if the grammar specifies special control tokens (?).

That said, I'm not entirely sure how a grammar would specify a special control token (I probably need to learn the tokenizer better...). If there's a mechanism or escape sequence to tag part of a grammar as a control token (rather than regular text), then I'm not aware of it. But now that you mention it, that sounds like the sort of thing we may want to add support for later (?).

> because that breaks shell-scriptability.

Do you have an example of the sort of script that fails without this PR?

Apologies if any of my questions seem ignorant or dense -- mainly trying to get up to speed with what you're talking about. I trust that what you wrote is important, and I would love it if you could help me understand it better.

Thank you very much!

@jart (Contributor, Author) commented May 17, 2024

See:

I like to ask LLMs yes/no questions in shell scripts. I use the grammar flag to force it to only print yes or no. If it instead prints "no<s>" or "no<|end-of-turn|>", then that breaks my shell script if statements.
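A minimal sketch of the failure mode (using hypothetical hard-coded strings rather than invoking llama.cpp itself): a trailing special token makes an otherwise grammar-constrained answer fail a plain string comparison.

```shell
#!/bin/sh
# Simulated model outputs: one clean, one with a trailing stop token appended
clean='no'
dirty='no</s>'

# A typical yes/no gate in a shell script
for answer in "$clean" "$dirty"; do
  if [ "$answer" = "no" ]; then
    echo "matched: [$answer]"
  else
    echo "broken:  [$answer]"
  fi
done
```

The second iteration prints the "broken" branch, because `no</s>` is not byte-identical to `no`.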

@HanClinto (Collaborator) commented:
That makes sense.

I don't use shell scripts with grammars, but I wonder if this functionality would be better added behind a command-line option to specifically render or hide special tokens? If I'm trying to debug a grammar-constrained generation, I would tend to want to display the special tokens rather than hide them.

How do you feel about separating this flag out into its own option?

@mofosyne (Collaborator) left a review comment:

Logic checks out. Intent is sensible

What's your thought about @HanClinto's idea of separating it out into its own flag? Regardless, it should be safe to make that a separate PR and merge this in anyway, as the default behavior of omitting special tokens with a grammar makes sense.


FYI: Android CI is a bit broken in the main branch, but I see a PR coming in soon to fix it. This PR doesn't touch Android anyway.

@mofosyne added the `merge ready` label (indicates that this may be ready to merge soon and is just holding out in case of objections) on May 18, 2024
@ggerganov (Owner) commented:

> I don't use shell scripts with grammars, but I wonder if this functionality would be better added behind a command-line option to specifically render or hide special tokens? If I'm trying to debug a grammar-constrained generation, I would tend to want to display the special tokens rather than hide them.
>
> How do you feel about separating this flag out into its own option?

I agree it's better to have this as a separate flag. It should be consolidated with the existing conversation flag: rename it and reuse it.

@mofosyne (Collaborator) commented:

A possible approach that @jart could use is in jart#3, addressing @HanClinto's idea of separating out the flag. I investigated @ggerganov's idea of consolidating with the existing conversation flag, but there is a significant enough difference in semantics that I could not merge them.

Feel free to adjust as needed, or ignore it if it adds too much complexity to this PR.

@jart (Contributor, Author) commented May 18, 2024

One thing you could do is print the special tokens out of band to file descriptor 3. Then if a shell script doesn't want them, it could either pass a flag to disable them, or simply say 3>/dev/null.
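A sketch of that idea in plain shell. The `emit` function below is a hypothetical stand-in for the CLI, not llama.cpp's actual behavior; the point is the file-descriptor plumbing on the caller's side.

```shell
#!/bin/sh
# Stand-in for the CLI: answer text goes to stdout, special tokens to fd 3
emit() {
  printf 'no'         # grammar-constrained answer -> stdout
  printf '</s>' >&3   # special token -> fd 3, out of band
}

# A shell script that only wants the answer discards fd 3
answer=$(emit 3>/dev/null)
echo "$answer"        # prints: no

# A debugging session can merge fd 3 back into stdout to see everything
emit 3>&1             # prints: no</s>
echo
```

With this layout, scripts get clean output by default redirection, and nothing is lost for anyone who wants to inspect the special tokens.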

@teleprint-me (Contributor) commented:

I haven't had a chance to look at the grammar yet, but I'm wondering what the logic is behind including the special tokens in the output?

@mofosyne (Collaborator) commented May 21, 2024

To test this, you can try the following:

```shell
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build

echo "== Expect Control Token To Shared Console 3>&1 =="
./build/bin/main --hf-repo TheBloke/phi-2-GGUF --hf-file phi-2.Q6_K.gguf --grammar 'root ::= "yes" | "no"' --temp 0 -c 0 --no-display-prompt --log-disable -p "<|user|>
Say yes
<|assistant|>" 2>/dev/null 3>&1
echo
echo "== Expect No Control Token To Console because 3>/dev/null =="
./build/bin/main --hf-repo TheBloke/phi-2-GGUF --hf-file phi-2.Q6_K.gguf --grammar 'root ::= "yes" | "no"' --temp 0 -c 0 --no-display-prompt --log-disable -p "<|user|>
Say yes
<|assistant|>" 2>/dev/null 3>/dev/null
echo
echo "== Expect No Control Token To Console because 3>&- =="
./build/bin/main --hf-repo TheBloke/phi-2-GGUF --hf-file phi-2.Q6_K.gguf --grammar 'root ::= "yes" | "no"' --temp 0 -c 0 --no-display-prompt --log-disable -p "<|user|>
Say yes
<|assistant|>" 2>/dev/null 3>&-
echo
echo "== Expect No Control Token To Console as we are still in grammar mode =="
./build/bin/main --hf-repo TheBloke/phi-2-GGUF --hf-file phi-2.Q6_K.gguf --grammar 'root ::= "yes" | "no"' --temp 0 -c 0 --no-display-prompt --log-disable -p "<|user|>
Say yes
<|assistant|>" 2>/dev/null
echo
echo "== Expect Control Token To Console as we are in normal completion mode =="
./build/bin/main --hf-repo TheBloke/phi-2-GGUF --hf-file phi-2.Q6_K.gguf --temp 0 -c 0 --no-display-prompt --log-disable -p "<|user|>
Hi
<|assistant|>" 2>/dev/null 3>&1
echo
```

@mofosyne (Collaborator) commented May 21, 2024

> I haven't had a chance to look at the grammar yet, but I'm wondering what the logic is behind including the special tokens in the output?

Might be handy for debugging at least.

Also, it has tokens showing the split between the user, the assistant, and end of text. That might be handy for integration if not using the APIs or libraries for some reason.

It does get me thinking whether it's possible to also separate the input special tokens so that the special tokens are fully out of band. That might help make it a little more secure.


Anyway, is everyone happy enough with the new changes?

Review comment on llama.h (outdated, resolved)
github-actions bot commented May 21, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 535 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8728.07ms p(95)=22198.78ms fails=, finish reason: stop=481 truncated=54
  • Prompt processing (pp): avg=102.85tk/s p(95)=415.52tk/s
  • Token generation (tg): avg=32.3tk/s p(95)=46.58tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=grammar-token commit=e75c5ca4512cef4bdd7470e4e756bf3d0af60ff3

(Benchmark charts for prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing; raw chart data omitted.)

Review comment on examples/main/main.cpp (outdated, resolved)
@mofosyne mofosyne requested review from ggerganov, phymbert and ngxson and removed request for ggerganov May 25, 2024 07:05
@mofosyne (Collaborator) commented:

@ngxson thanks. Confident that full consensus has been reached now. Merging.

@mofosyne mofosyne merged commit 00c6390 into ggerganov:master May 25, 2024
71 checks passed
Labels: `bugfix`, `examples`, `merge ready`, `Review Complexity : Low`
7 participants