forked from NVIDIA/NeMo
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Prompt formatter API and canary transcribe tensor input support (NVID…
…IA#9206) * Apply CanaryPromptFormatter in dataset/inference Signed-off-by: Piotr Żelasko <[email protected]> * Working inference with CanaryPromptFormatter Signed-off-by: Piotr Żelasko <[email protected]> * Minimum working example of Canary.transcribe() with tensors Signed-off-by: Piotr Żelasko <[email protected]> * training fix Signed-off-by: Piotr Żelasko <[email protected]> * Update to the new 'chat' based prompt formatting API Signed-off-by: Piotr Żelasko <[email protected]> * Prompt formatters for popular models and partial unit test coverage Signed-off-by: Piotr Żelasko <[email protected]> * Updated documentation Signed-off-by: Piotr Żelasko <[email protected]> * Improved test coverage + proper preamble support Signed-off-by: Piotr Żelasko <[email protected]> * Fix usage of PromptFormatter for MT-AED class + fix tokenization/formatting issues Signed-off-by: Piotr Żelasko <[email protected]> * Move some canary hacks to canary prompt formatter, improve validation, add tests for aggtok Signed-off-by: Piotr Żelasko <[email protected]> * aed_model.transcribe(**slots) support, rename all slots to lowercase and drop pipes everywhere except template definition. Signed-off-by: Piotr Żelasko <[email protected]> * truly generic version Signed-off-by: Piotr Żelasko <[email protected]> * making transcribe_speech.py work prompt slots + syntactic sugar Signed-off-by: Piotr Żelasko <[email protected]> * update streaming_utils.py Signed-off-by: Piotr Żelasko <[email protected]> * fix Signed-off-by: Piotr Żelasko <[email protected]> * code review: partial Signed-off-by: Piotr Żelasko <[email protected]> * Accept multi-turn, single-turn, and legacy prompt format in transcribe() and transcribe_speech.py Signed-off-by: Piotr Żelasko <[email protected]> * Address code reviews Signed-off-by: Piotr Żelasko <[email protected]> * Add support for SPE special tokens bos/eos in prompt templates and ensure Llama2 format gives identical results with the reference implementation Signed-off-by: Piotr Żelasko <[email protected]> * Fix tests and add llama2 prompt formatter tests Signed-off-by: Piotr Żelasko <[email protected]> * Fix tests Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Boxiang Wang <[email protected]>
- Loading branch information
Showing
26 changed files
with
1,382 additions
and
211 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.