
Update device_capabilities.py: Add GTX 1070, 1080; main.py: timeout 90->900 #393

Merged
merged 4 commits into exo-explore:main on Nov 23, 2024

Conversation

FFAMax
Contributor

@FFAMax FFAMax commented Oct 28, 2024

  1. Added a few GPUs.
  2. Tuned the timeout. On slow setups (~1 token per second), an average response of ~600-1000 tokens will in most cases exceed the timeout and surface as a network error, even though nothing is actually wrong. Raising it reduces these spurious exceptions. Anyone looking for better performance who knows what they are doing can adjust it, with an understanding of the impact. The new default should work for most cases.
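The arithmetic behind the timeout change can be sketched as follows (a minimal illustration of the reasoning above, not exo's actual code; the function name is mine):

```python
# Sketch: why 90 s times out on slow clusters and 900 s mostly does not.
# Assumes ~1 token/s generation and ~600-1000 token responses, as stated
# in the PR description.

def seconds_needed(tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock time to stream a full response at a given generation rate."""
    return tokens / tokens_per_sec

OLD_TIMEOUT = 90   # seconds; fires long before a 600-token response finishes
NEW_TIMEOUT = 900  # seconds; covers up to ~900 tokens at 1 token/s

for tokens in (600, 1000):
    t = seconds_needed(tokens, tokens_per_sec=1.0)
    print(f"{tokens} tokens: {t:.0f}s  old-ok={t <= OLD_TIMEOUT}  new-ok={t <= NEW_TIMEOUT}")
```

At 1 token/s even the low end of the range (600 s) is well past the old 90 s limit, while 900 s covers most, though not all, of the stated range.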

@FFAMax FFAMax changed the title Update device_capabilities.py: Add GTX 1070, 1080 Update device_capabilities.py: Add GTX 1070, 1080; main.py: timeout 90->900 Oct 28, 2024
@dtnewman
Contributor

dtnewman commented Nov 3, 2024

Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32.

@FFAMax
Contributor Author

FFAMax commented Nov 3, 2024

> Can you double check the FP16 numbers here? Those look a little too low. They are usually halfway between the 8 and 32.

For example, take the GTX 1080 Ti.

According to https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

FP16 & INT8 are listed as N/A.

Based on https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877:
FP16 (half) 177.2 GFLOPS (1:64)
which is the 0.177 TFLOPS figure mentioned.

So the numbers are probably low because there is no hardware FP16 support, or something along those lines.

For example, for https://www.techpowerup.com/gpu-specs/geforce-gtx-1660-ti.c3364 we can see the 2:1 ratio you mentioned, while for the 1080-series cards it is 1:64.
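The relationship between the two ratios can be checked with a quick calculation (the function name is mine, not part of exo; FP32 figures are from the TechPowerUp pages linked above):

```python
# FP16 throughput = FP32 throughput x the card's FP16:FP32 rate ratio.
# Pascal (GTX 1080 Ti) runs FP16 at 1:64 of FP32; Turing (GTX 1660 Ti) at 2:1.

def fp16_tflops(fp32_tflops: float, fp16_ratio: float) -> float:
    """FP16 throughput in TFLOPS, given FP32 throughput and the FP16:FP32 ratio."""
    return fp32_tflops * fp16_ratio

# GTX 1080 Ti: FP32 ~11.34 TFLOPS at 1:64 -> ~0.177 TFLOPS FP16
print(round(fp16_tflops(11.34, 1 / 64), 3))   # 0.177

# GTX 1660 Ti: FP32 ~5.437 TFLOPS at 2:1 -> ~10.87 TFLOPS FP16
print(round(fp16_tflops(5.437, 2), 2))        # 10.87
```

This is why the "halfway between INT8 and FP32" intuition holds for Turing and newer but not for Pascal consumer cards, where FP16 is a vestigial 1:64 path.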

@AlexCheema
Contributor

I'm concerned with increasing the timeout this much. If a request would take this long, I'd say it should be treated differently. Request handling generally needs to be reworked with a new scheduler that has better control of the request flow. Right now it's pretty much fire-and-forget and hope we get some response back from the cluster.

I changed the timeout back to 90. The rest looks good to me.

@AlexCheema AlexCheema merged commit 4bbe0b4 into exo-explore:main Nov 23, 2024