Function server #1641

pseudotensor · 2024-05-23T03:16:19Z

Parallel and Isolated OpenAI Proxy Servers

python generate.py --openai_server=True --openai_workers=2 ...

will launch 2 OpenAI proxy servers using FastAPI's workers, so each is a separate fork, independent of any other process.

This speeds up concurrent calls to the OpenAI server, letting FastAPI handle concurrency while the OS load-balances between the workers, which all share the same IP/port.
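As a rough illustration of what sharing one IP/port between workers means, the sketch below uses only the Python standard library, not FastAPI or h2oGPT code. A parent binds one listening socket and forks workers that all call accept() on it, so the kernel decides which worker receives each connection. The names worker_loop, start_workers, ask, and stop_workers are hypothetical, and os.fork limits the sketch to Unix-like systems.

```python
import os
import signal
import socket


def worker_loop(listener: socket.socket) -> None:
    """Accept connections forever and reply with this worker's PID."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.recv(1024)  # read (and ignore) the request
            conn.sendall(str(os.getpid()).encode())


def start_workers(n_workers: int):
    """Bind one socket in the parent, then fork n_workers that share it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:  # child process: serve until killed
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)
    listener.close()  # the parent's copy is not needed; workers keep theirs
    return port, pids


def ask(port: int) -> int:
    """Send one request and return the PID of the worker that answered."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(b"ping")
        data = b""
        while chunk := conn.recv(64):  # read until the worker closes
            data += chunk
    return int(data.decode())


def stop_workers(pids) -> None:
    """Terminate the forked workers and reap them."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

Calling start_workers(2) and then ask(port) several times returns PIDs drawn from the two workers, never the parent's. Which worker answers a given request is up to the OS, so the distribution is not guaranteed to be even.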

Parallel and Isolated Ingestion Servers

python generate.py --function_server=True --function_server_workers=2 ...

will launch 2 ingestion proxy servers using FastAPI's workers, so each is a separate fork, independent of any other process. If ASR, DocTR, captions, etc. are enabled, these will run on the same GPUs in separate processes.

This helps keep the main UI server isolated from ingestion tasks that can consume a lot of CPU or hang the Gradio server.
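The isolation idea can be sketched in a few lines, assuming nothing about how the function server is actually implemented: a task that might hang runs in a child process, and the caller waits with a timeout so it stays responsive either way. The names run_isolated, count_words, and hang_forever are hypothetical stand-ins for ingestion work, and the "fork" start method limits the sketch to Unix-like systems.

```python
import multiprocessing as mp
import queue
import time


def _call_and_report(func, args, results):
    """Runs in the child process: execute func and send back the outcome."""
    try:
        results.put(("ok", func(*args)))
    except Exception as exc:
        results.put(("error", repr(exc)))


def run_isolated(func, args=(), timeout=5.0):
    """Run func(*args) in a separate process; give up after timeout seconds."""
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    proc = ctx.Process(target=_call_and_report, args=(func, args, results))
    proc.start()
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        return ("timeout", None)
    finally:
        if proc.is_alive():
            proc.terminate()  # a hung task is killed, not waited on
        proc.join()


def count_words(text):
    """Stand-in for a well-behaved ingestion task."""
    return len(text.split())


def hang_forever():
    """Stand-in for an ingestion task that hangs."""
    time.sleep(60)
```

Here run_isolated(count_words, ("a b c",)) returns ("ok", 3), while run_isolated(hang_forever, timeout=0.5) returns ("timeout", None) after about half a second and the caller carries on. A long-lived server process, as in this PR, avoids paying the process start-up cost on every call.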

pseudotensor · 2024-05-24T00:43:30Z

pseudotensor force-pushed the function_server branch 3 times, most recently from cda8709 to 8a76cbf on May 23, 2024 04:02

Function server

96fb03f

pseudotensor force-pushed the function_server branch from 8a76cbf to 96fb03f on May 23, 2024 04:11

pseudotensor added 12 commits May 22, 2024 23:13

make split faster when chunking by faking tokenizer, too slow otherwise in some extreme limits

9e20ca4

Update FAQ

6b101a1

Handle path_to_docs via FastAPI

42dddbc

Ensure gen_kwargs filled for function server, fix various things.

d1a2854

Ensure return values

8f69d4c

Avoid duplicating some things in function server mode

96664d4

Fix condition if model_lock=[]

2c5adc9

Verbose

94e5d6d

Gradio_pdf

0cb9caf

Fix whitespace

2da730c

Adjust import

05b627b

Check auth

65eec19

pseudotensor force-pushed the function_server branch from 3086d5c to 65eec19 on May 23, 2024 11:17

pseudotensor added 9 commits May 23, 2024 04:33

Deal with auth

ab2e927

FAQ for parallel openai or function servers

15dfc22

Fix idefics2 context length

3999f4a

Minor

db2fbf1

Not explained, but seems idefics2 works ok in longer context out to about 8192

5246de7

Not explained, but seems idefics2 works ok in longer context out to about 4096

1aab8a5

idefics2 starts to mess up after 4k tokens input

1cdaa56

hide debug

730c0a4

Default of 4096

83c647a

pseudotensor added 2 commits May 23, 2024 17:43

Support sglang via text and sglang via its own langchain llm class

ac7086a

Deal with sglang/llava stop tokens

e9ca8f0

pseudotensor added 7 commits May 23, 2024 19:15

Support multiple images for sglang

8c137c8

More docs and add /v1 if not present for sglang OpenAI chat

2604138

Handle our own system prompt

a744cd1

Update test

b7ead1e

Update test

0117122

guided_json test code split gradio and openai

674193e

Update test

373ccdd

pseudotensor marked this pull request as ready for review May 25, 2024 05:48

pseudotensor merged commit 538564d into main May 25, 2024
2 checks passed

pseudotensor deleted the function_server branch May 25, 2024 05:48
