
Adds OpenAI compatible endpoint option #16

Open · wants to merge 7 commits into master

Conversation

@ohmeow commented Jan 27, 2025

This PR enables llama-vscode to talk to an OpenAI-compatible endpoint instead of a locally running llama.cpp server.

Tested against OpenAI's hosted endpoint as well as a local vLLM server endpoint.

I'm not sure I translated all of the llama.cpp arguments correctly for the OpenAI API, so I imagine we'll need a few iterations to get the hyperparameters and prompt right. I'm relatively new to this extension and to llama.cpp, so I'm happy to spend as much time as needed to get this right.

Thanks much - wg
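
For readers following along, here is a minimal TypeScript sketch of the kind of argument translation being discussed, assuming the FIM-style prompt template proposed later in this diff. The interface, field names, and sampling defaults are illustrative assumptions, not the PR's actual code.

```ts
// Illustrative only: how llama.cpp-style infill inputs might be folded into
// an OpenAI-style /v1/completions request body. The FIM tokens below match
// the template proposed later in this PR, but the right tokens depend on
// the model being served.

interface InfillInputs {
    inputPrefix: string; // text before the cursor (up to n_prefix lines)
    inputSuffix: string; // text after the cursor (up to n_suffix lines)
    prompt: string;      // the current line up to the cursor
    nPredict: number;    // llama.cpp n_predict -> OpenAI max_tokens
}

function toOpenAiCompletionBody(inputs: InfillInputs, model: string) {
    // Most OpenAI-compatible servers only take a flat prompt string, so the
    // prefix/suffix context is encoded with fill-in-the-middle tokens.
    const prompt =
        `<|fim_prefix|>${inputs.inputPrefix}${inputs.prompt}` +
        `<|fim_suffix|>${inputs.inputSuffix}<|fim_middle|>`;

    return {
        model,
        prompt,
        max_tokens: inputs.nPredict,
        temperature: 0.1, // low temperature for code completion (assumed default)
        stream: false,
    };
}
```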

@shishkin

Would love to use it with Ollama server as well.

@ggerganov (Member) left a comment

Let's merge after resolving the conflicts.

Comment on lines 4 to 34
export class Configuration {
-    // extension configs
-    enabled = true
-    endpoint = "http://127.0.0.1:8012"
-    auto = true
-    api_key = ""
-    n_prefix = 256
-    n_suffix = 64
-    n_predict = 128
-    t_max_prompt_ms = 500
-    t_max_predict_ms = 2500
-    show_info = true
-    max_line_suffix = 8
-    max_cache_keys = 250
-    ring_n_chunks = 16
-    ring_chunk_size = 64
-    ring_scope = 1024
-    ring_update_ms = 1000
-    language = "en"
-    // additional configs
-    axiosRequestConfig = {}
-    disabledLanguages: string[] = []
-    RING_UPDATE_MIN_TIME_LAST_COMPL = 3000
-    MIN_TIME_BETWEEN_COMPL = 600
-    MAX_LAST_PICK_LINE_DISTANCE = 32
-    MAX_QUEUED_CHUNKS = 16
-    DELAY_BEFORE_COMPL_REQUEST = 150
+    // extension configs
+    enabled = true;
+    endpoint = "http://127.0.0.1:8012";
+    is_openai_compatible = false;
+    openAiClient: OpenAI | null = null;
+    openAiClientModel: string | null = null;
+    opeanAiPromptTemplate: string = "<|fim_prefix|>{inputPrefix}{prompt}<|fim_suffix|>{inputSuffix}<|fim_middle|>";
+    auto = true;
+    api_key = "";
+    n_prefix = 256;
+    n_suffix = 64;
+    n_predict = 128;
+    t_max_prompt_ms = 500;
+    t_max_predict_ms = 2500;
+    show_info = true;
+    max_line_suffix = 8;
+    max_cache_keys = 250;
+    ring_n_chunks = 16;
+    ring_chunk_size = 64;
+    ring_scope = 1024;
+    ring_update_ms = 1000;
+    language = "en";
+    // additional configs
+    axiosRequestConfig = {};
+    disabledLanguages: string[] = [];
+    RING_UPDATE_MIN_TIME_LAST_COMPL = 3000;
+    MIN_TIME_BETWEEN_COMPL = 600;
+    MAX_LAST_PICK_LINE_DISTANCE = 32;
+    MAX_QUEUED_CHUNKS = 16;
+    DELAY_BEFORE_COMPL_REQUEST = 150;
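
As a rough illustration of how the new fields might be wired up, here is a sketch assuming the official openai npm package; the helper name, the config slice, and the key fallback are hypothetical and may differ from the PR's actual code.

```ts
import OpenAI from "openai";

// Minimal slice of the Configuration class above, limited to the fields this
// sketch needs (hypothetical wiring, not taken from the PR).
interface OpenAiConfig {
    is_openai_compatible: boolean;
    api_key: string;
    endpoint: string;
    openAiClient: OpenAI | null;
}

function initOpenAiClient(config: OpenAiConfig): void {
    if (!config.is_openai_compatible) return;

    config.openAiClient = new OpenAI({
        // Local OpenAI-compatible servers (e.g. vLLM) typically accept any key.
        apiKey: config.api_key || "not-needed",
        // For OpenAI itself the default baseURL works; for a local server it
        // would point at something like http://localhost:8000/v1.
        baseURL: config.endpoint,
    });
}
```
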
@ggerganov (Member)

Note for future PRs: we should normalize the naming here. We have 3 different styles now. We should choose one and follow it.

@ohmeow (Author)

Fixed formatting by adding a .prettierrc to the project for folks (like me) who use Prettier to format JS/TS.

Updated configuration names to be consistent.

Also, can you take a look at my OpenAI-compatible implementation in llama-server.ts? I know there isn't a one-to-one correspondence between what llama.cpp accepts and what OpenAI-compatible endpoints accept, but I'd love to make sure it is as consistent as possible.

Thanks - wg
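
For reference while reviewing that mapping, here is a minimal sketch of what the OpenAI-compatible completion call in llama-server.ts could look like, using the prompt template from the configuration above. The function name, signature, and template placeholders are assumptions for discussion, not the PR's implementation.

```ts
import OpenAI from "openai";

// Assumed shape, for discussion only: fill the FIM template from the config
// and request a (legacy) text completion from the OpenAI-compatible server.
async function getOpenAiFimCompletion(
    client: OpenAI,
    model: string,
    template: string, // e.g. "<|fim_prefix|>{inputPrefix}{prompt}<|fim_suffix|>{inputSuffix}<|fim_middle|>"
    inputPrefix: string,
    inputSuffix: string,
    prompt: string,
    nPredict: number,
): Promise<string> {
    const filled = template
        .replace("{inputPrefix}", inputPrefix)
        .replace("{prompt}", prompt)
        .replace("{inputSuffix}", inputSuffix);

    const response = await client.completions.create({
        model,
        prompt: filled,
        max_tokens: nPredict,
    });

    return response.choices[0]?.text ?? "";
}
```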

@ggerganov (Member)

> Would love to use it with Ollama server as well.

Note that using anything other than the llama.cpp server will be significantly less efficient, because only the llama.cpp server has the optimizations this extension relies on.

@shishkin

> Note that using anything other than the llama.cpp server will be significantly less efficient, because only the llama.cpp server has the optimizations this extension relies on.

Apologies for my ignorance, but what are those optimizations, and what would be the way forward to reuse them across llama.cpp-based model servers? If they live at the API layer, maybe Ollama should expose the llama.cpp server's API directly?
