sgl-project · merrymercy · Jan 19, 2024
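
The custom chat template introduced in this change is a small JSON file of role prefixes, a separator, and stop strings. As a rough illustration of what those fields mean, the sketch below writes that file and renders a conversation in the usual ChatML layout. `render_chatml` is a hypothetical helper written for this note, not part of sglang, and the server's actual prompt assembly may differ in whitespace and other details.

```python
import json

# Template fields copied from the JSON example added to the README in this change.
template = {
    "name": "my_model",
    "system": "<|im_start|>system",
    "user": "<|im_start|>user",
    "assistant": "<|im_start|>assistant",
    "sep_style": "CHATML",
    "sep": "<|im_end|>",
    "stop_str": ["<|im_end|>", "<|im_start|>"],
}


def render_chatml(template, messages):
    """Hypothetical helper: approximate ChatML rendering, not sglang's own code."""
    parts = []
    for message in messages:
        # Look up the prefix for this role ("system", "user", or "assistant").
        role_prefix = template[message["role"]]
        parts.append(f"{role_prefix}\n{message['content']}{template['sep']}\n")
    # Leave the assistant turn open so the model generates the reply.
    parts.append(f"{template['assistant']}\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "List 3 countries and their capitals."},
]
prompt = render_chatml(template, messages)
print(prompt)

# Write the file that --chat-template points to in the README example.
with open("my_model_template.json", "w") as f:
    json.dump(template, f, indent=2)
```

Generation would be expected to stop once the model emits one of the `stop_str` values, which is why both the separator and the next turn's opening marker are listed there.
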
diff --git a/README.md b/README.md
@@ -248,13 +248,55 @@ In addition, the server supports an experimental OpenAI-compatible API.
 import openai
 client = openai.Client(
     base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
+
+# Text completion
 response = client.completions.create(
 	model="default",
 	prompt="The capital of France is",
 	temperature=0,
 	max_tokens=32,
 )
 print(response)
+
+# Chat completion
+response = client.chat.completions.create(
+    model="default",
+    messages=[
+        {"role": "system", "content": "You are a helpful AI assistant"},
+        {"role": "user", "content": "List 3 countries and their capitals."},
+    ],
+    temperature=0,
+    max_tokens=64,
+)
+print(response)
+```
+
+In the above example, the server uses the chat template specified in the model tokenizer.
+You can override the chat template if needed when launching the server:
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 \
+  --chat-template llama-2
+```
+
+If the chat template you are looking for is missing, you are welcome to contribute it.
+Meanwhile, you can temporarily register your own chat template by describing it in a JSON file and passing the file path to `--chat-template`:
+
+```json
+{
+  "name": "my_model",
+  "system": "<|im_start|>system",
+  "user": "<|im_start|>user",
+  "assistant": "<|im_start|>assistant",
+  "sep_style": "CHATML",
+  "sep": "<|im_end|>",
+  "stop_str": ["<|im_end|>", "<|im_start|>"]
+}
+```
+
+```
+python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 \
+  --chat-template ./my_model_template.json
 ```
 
 ### Additional Arguments