[Frontend] Add /v1/audio/translations OpenAI API endpoint (#19615)
Merged
Changes from all commits (18):

- ca656ab initial (ywang96)
- 9bd4692 separate openai s2t (NickLucche)
- 13e2502 minor (NickLucche)
- 60bbae1 fix streaming (NickLucche)
- 0445461 docs (NickLucche)
- 40dfecb revert example change (NickLucche)
- 5691476 type for passing class (NickLucche)
- a796661 types (NickLucche)
- 313a063 test translation (NickLucche)
- 21684fc rebase changes (NickLucche)
- 382a257 fix splitting: subsequent chunks still need preamble (NickLucche)
- 5df1274 long audio test (NickLucche)
- fbf8f86 types (NickLucche)
- 5b77161 generic create_speech_to_text (NickLucche)
- 29fbaf8 examples main guard (NickLucche)
- f837747 fix docs (NickLucche)
- 90159fc types (NickLucche)
- 7c7c107 cruft (NickLucche)
Example client (new file, +75 lines):

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import asyncio
import json

import httpx
from openai import OpenAI

from vllm.assets.audio import AudioAsset


def sync_openai(audio_path: str, client: OpenAI):
    with open(audio_path, "rb") as f:
        translation = client.audio.translations.create(
            file=f,
            model="openai/whisper-large-v3",
            response_format="json",
            temperature=0.0,
            # Additional params not provided by OpenAI API.
            extra_body=dict(
                language="it",
                seed=4419,
                repetition_penalty=1.3,
            ),
        )
        print("translation result:", translation.text)


async def stream_openai_response(audio_path: str, base_url: str, api_key: str):
    data = {
        "language": "it",
        "stream": True,
        "model": "openai/whisper-large-v3",
    }
    url = base_url + "/audio/translations"
    headers = {"Authorization": f"Bearer {api_key}"}
    print("translation result:", end=" ")
    # OpenAI translation API client does not support streaming.
    async with httpx.AsyncClient() as client:
        with open(audio_path, "rb") as f:
            async with client.stream(
                "POST", url, files={"file": f}, data=data, headers=headers
            ) as response:
                async for line in response.aiter_lines():
                    # Each line is a JSON object prefixed with 'data: '
                    if line:
                        if line.startswith("data: "):
                            line = line[len("data: ") :]
                        # Last chunk, stream ends
                        if line.strip() == "[DONE]":
                            break
                        # Parse the JSON response
                        chunk = json.loads(line)
                        # Extract and print the content
                        content = chunk["choices"][0].get("delta", {}).get("content")
                        print(content, end="")


def main():
    foscolo = str(AudioAsset("azacinto_foscolo").get_local_path())

    # Modify OpenAI's API key and API base to use vLLM's API server.
    openai_api_key = "EMPTY"
    openai_api_base = "http://localhost:8000/v1"
    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )
    sync_openai(foscolo, client)
    # Run the asynchronous function
    asyncio.run(stream_openai_response(foscolo, openai_api_base, openai_api_key))


if __name__ == "__main__":
    main()
```
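The streaming branch above hand-parses the server-sent events because the OpenAI client does not expose streaming for the translations API. That parsing loop can be isolated into a small helper that is testable without a running server. A minimal sketch, assuming the `data: ` framing, `[DONE]` sentinel, and chunk shape shown in the example above; `extract_sse_deltas` is a hypothetical helper name, not part of this PR:

```python
import json


def extract_sse_deltas(lines):
    """Collect streamed text deltas from 'data: '-prefixed SSE lines.

    Mirrors the parsing loop in stream_openai_response: strip the
    'data: ' prefix, stop at the '[DONE]' sentinel, and pull each
    chunk's delta content.
    """
    deltas = []
    for line in lines:
        if not line:
            continue
        if line.startswith("data: "):
            line = line[len("data: "):]
        if line.strip() == "[DONE]":
            break
        chunk = json.loads(line)
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content is not None:
            deltas.append(content)
    return deltas


if __name__ == "__main__":
    # Simulated stream, shaped like the endpoint's streaming output.
    stream = [
        'data: {"choices": [{"delta": {"content": "Nor will I"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": " ever touch"}}]}',
        "data: [DONE]",
    ]
    print("".join(extract_sse_deltas(stream)))  # Nor will I ever touch
```

Factoring the loop this way also makes the `[DONE]`-before-content edge case easy to exercise in isolation.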
tests/entrypoints/openai/test_translation_validation.py (new file, +172 lines):
```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import io
# imports for guided decoding tests
import json
from unittest.mock import patch

import librosa
import numpy as np
import pytest
import soundfile as sf
from openai._base_client import AsyncAPIClient

from vllm.assets.audio import AudioAsset

from ...utils import RemoteOpenAIServer


@pytest.fixture
def foscolo():
    # Test translation it->en
    path = AudioAsset('azacinto_foscolo').get_local_path()
    with open(str(path), "rb") as f:
        yield f


# NOTE: (NickLucche) the large-v3-turbo model was not trained on translation!
@pytest.mark.asyncio
async def test_basic_audio(foscolo):
    model_name = "openai/whisper-small"
    server_args = ["--enforce-eager"]
    with RemoteOpenAIServer(model_name, server_args) as remote_server:
        client = remote_server.get_async_client()
        translation = await client.audio.translations.create(
            model=model_name,
            file=foscolo,
            response_format="text",
            # TODO remove once language detection is implemented
            extra_body=dict(language="it"),
            temperature=0.0)
        out = json.loads(translation)['text'].strip()
        assert "Nor will I ever touch the sacred" in out


@pytest.mark.asyncio
async def test_audio_prompt(foscolo):
    model_name = "openai/whisper-small"
    server_args = ["--enforce-eager"]
    # Condition whisper on starting text
    prompt = "Nor have I ever"
    with RemoteOpenAIServer(model_name, server_args) as remote_server:
        client = remote_server.get_async_client()
        transcription = await client.audio.translations.create(
            model=model_name,
            file=foscolo,
            prompt=prompt,
            extra_body=dict(language="it"),
            response_format="text",
            temperature=0.0)
        out = json.loads(transcription)['text']
        assert "Nor will I ever touch the sacred" not in out
        assert prompt not in out


@pytest.mark.asyncio
async def test_non_asr_model(foscolo):
    # text to text model
    model_name = "JackFram/llama-68m"
    server_args = ["--enforce-eager"]
    with RemoteOpenAIServer(model_name, server_args) as remote_server:
        client = remote_server.get_async_client()
        res = await client.audio.translations.create(model=model_name,
                                                     file=foscolo,
                                                     temperature=0.0)
        assert res.code == 400 and not res.text
        assert res.message == "The model does not support Translations API"


@pytest.mark.asyncio
async def test_streaming_response(foscolo):
    model_name = "openai/whisper-small"
    server_args = ["--enforce-eager"]
    translation = ""
    with RemoteOpenAIServer(model_name, server_args) as remote_server:
        client = remote_server.get_async_client()
        res_no_stream = await client.audio.translations.create(
            model=model_name,
            file=foscolo,
            response_format="json",
            extra_body=dict(language="it"),
            temperature=0.0)
        # Unfortunately this only works when the openai client is patched
        # to use streaming mode, not exposed in the translation api.
        original_post = AsyncAPIClient.post

        async def post_with_stream(*args, **kwargs):
            kwargs['stream'] = True
            return await original_post(*args, **kwargs)

        with patch.object(AsyncAPIClient, "post", new=post_with_stream):
            client = remote_server.get_async_client()
            res = await client.audio.translations.create(model=model_name,
                                                         file=foscolo,
                                                         temperature=0.0,
                                                         extra_body=dict(
                                                             stream=True,
                                                             language="it"))
            # Reconstruct from chunks and validate
            async for chunk in res:
                # just a chunk
                text = chunk.choices[0]['delta']['content']
                translation += text

        assert translation == res_no_stream.text


@pytest.mark.asyncio
async def test_stream_options(foscolo):
    model_name = "openai/whisper-small"
    server_args = ["--enforce-eager"]
    with RemoteOpenAIServer(model_name, server_args) as remote_server:
        original_post = AsyncAPIClient.post

        async def post_with_stream(*args, **kwargs):
            kwargs['stream'] = True
            return await original_post(*args, **kwargs)

        with patch.object(AsyncAPIClient, "post", new=post_with_stream):
            client = remote_server.get_async_client()
            res = await client.audio.translations.create(
                model=model_name,
                file=foscolo,
                temperature=0.0,
                extra_body=dict(language="it",
                                stream=True,
                                stream_include_usage=True,
                                stream_continuous_usage_stats=True))
            final = False
            continuous = True
            async for chunk in res:
                if not len(chunk.choices):
                    # final usage sent
                    final = True
                else:
                    continuous = continuous and hasattr(chunk, 'usage')
            assert final and continuous


@pytest.mark.asyncio
async def test_long_audio_request(foscolo):
    model_name = "openai/whisper-small"
    server_args = ["--enforce-eager"]

    foscolo.seek(0)
    audio, sr = librosa.load(foscolo)
    repeated_audio = np.tile(audio, 2)
    # Repeated audio to buffer
    buffer = io.BytesIO()
    sf.write(buffer, repeated_audio, sr, format='WAV')
    buffer.seek(0)
    with RemoteOpenAIServer(model_name, server_args) as remote_server:
        client = remote_server.get_async_client()
        translation = await client.audio.translations.create(
            model=model_name,
            file=buffer,
            extra_body=dict(language="it"),
            response_format="text",
            temperature=0.0)
        out = json.loads(translation)['text'].strip().lower()
        # TODO investigate higher model uncertainty for longer translations.
        assert out.count("nor will i ever") == 2
```
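The long-audio test builds its input by tiling the decoded samples and re-encoding them to an in-memory WAV. The same buffer construction can be sketched without the librosa/soundfile dependencies, using the stdlib `wave` module and synthetic int16 samples; `tile_to_wav_buffer` is a hypothetical helper, not part of this PR:

```python
import io
import wave

import numpy as np


def tile_to_wav_buffer(samples: np.ndarray, sample_rate: int,
                       repeats: int = 2) -> io.BytesIO:
    """Repeat a mono int16 signal `repeats` times and pack it into an
    in-memory WAV, analogous to np.tile + sf.write in
    test_long_audio_request."""
    repeated = np.tile(samples, repeats)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample (int16)
        wf.setframerate(sample_rate)
        wf.writeframes(repeated.astype(np.int16).tobytes())
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone at 16 kHz, doubled.
    sr = 16000
    t = np.arange(sr) / sr
    tone = (np.sin(2 * np.pi * 440 * t) * 0.5 * 32767).astype(np.int16)
    buf = tile_to_wav_buffer(tone, sr)
    with wave.open(buf, "rb") as wf:
        print(wf.getnframes())  # 32000: twice the original length
```

The resulting buffer can be passed as `file=` to the translations client exactly like the `io.BytesIO` object in the test above.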
@NickLucche @DarkLight1337 @ywang96
We're testing vLLM with PyTorch 2.8, and this assertion fails: the model generates different text under PyTorch 2.8 than under PyTorch 2.7. I see a TODO about higher model uncertainty for longer translations; is this behavior expected?
PyTorch 2.7:
nor will i ever touch the sacred places where my body is made of jacquero, my treasure, which mirrors you in the shadow of the greek sea, from which the virgines are born to come, and faithfully to that island he confuses them with his first smile, he waves his naked and your foreheads, the incline towards him that the water sings of fatal fate, and the different exile, for which, beautiful of fame and disdain, nor will i ever touch i will touch the sacred places where my body makes the water drop, my zacinto, which mirrors you in the wave of the greek sea, from which the virgin water comes, and faithfully to that island it flutters with its first smile. the waves are not a tack, your clean clouds and your fronts, the incline towards him that the water sings of fatal, and the different exile, for which, beautiful of fame and of adventure,
PyTorch 2.8:
nor do i ever touch the sacred places where my body is made of jacquero, my treasure, which mirrors you in the shadow of the greek sea, from which the virgines come into water, and faithfully to that island he confuses them with his first smile, he waves his naked and naked limbs, the incline towards him that the water sings of fatal fate, and the different exile, for which, beautiful of fame and disdain, nor do i ever touch i will touch the sacred places where my body makes the water drop, my zacinto, which mirrors you in the wave of the greek sea, from which the virgin water comes, and faithfully to that island it flutters with its first smile. the waves are not a tack, your clean clouds and your fronts, the incline towards him that the water sings of fatal, and the different exile, for which, beautiful of fame and of adventure
Hey, I think we can safely change the test here.
My conclusion was that (a) translation wasn't a primary task (the latest Whisper -turbo model doesn't support it), so the model isn't as resilient, and (b) this particular sample may just be hard: the scores for the second token end up being quite similar.
Either way, I will put up a PR for this, as I've also just witnessed that behavior on Blackwell.