-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Synthetic Data Generation Module #136
Conversation
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @ryantwolf . This is a very important functionality we are adding, Very excited for it.
My major concern currently is around not having a way to rate limit the number of requests we are sending, everything else is mostly nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit but at a high level looks good to me! Thanks a lot for this effort
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for working on this Ryan.
* Begin implementation on OpenAI client Signed-off-by: Ryan Wolf <[email protected]> * Fix relative import Signed-off-by: Ryan Wolf <[email protected]> * Add temperature Signed-off-by: Ryan Wolf <[email protected]> * Modify client interface and begin ultrachat Signed-off-by: Ryan Wolf <[email protected]> * Change type annotation in openai client Signed-off-by: Ryan Wolf <[email protected]> * Make imports easier Signed-off-by: Ryan Wolf <[email protected]> * Reformat to match nemotron report Signed-off-by: Ryan Wolf <[email protected]> * Add yaml conversion Signed-off-by: Ryan Wolf <[email protected]> * Fix index error Signed-off-by: Ryan Wolf <[email protected]> * Add error handling for yaml parsing Signed-off-by: Ryan Wolf <[email protected]> * Fix error Signed-off-by: Ryan Wolf <[email protected]> * Add additional yaml parsing check Signed-off-by: Ryan Wolf <[email protected]> * Add more yaml error handling Signed-off-by: Ryan Wolf <[email protected]> * Export conversion error Signed-off-by: Ryan Wolf <[email protected]> * Change variable naming Signed-off-by: Ryan Wolf <[email protected]> * Make error catching more general Signed-off-by: Ryan Wolf <[email protected]> * Refactor list out of nemotron Signed-off-by: Ryan Wolf <[email protected]> * Add prompt helper function Signed-off-by: Ryan Wolf <[email protected]> * Add revisions and writing prompts Signed-off-by: Ryan Wolf <[email protected]> * Fix default prompt templates Signed-off-by: Ryan Wolf <[email protected]> * Add closed qa Signed-off-by: Ryan Wolf <[email protected]> * Fix prompt Signed-off-by: Ryan Wolf <[email protected]> * Add math and coding Signed-off-by: Ryan Wolf <[email protected]> * Add problem generation Signed-off-by: Ryan Wolf <[email protected]> * Rename function Signed-off-by: Ryan Wolf <[email protected]> * Add dialogue support Signed-off-by: Ryan Wolf <[email protected]> * Fix mispell Signed-off-by: Ryan Wolf <[email protected]> * Add two turn generation Signed-off-by: Ryan Wolf <[email protected]> * Add reward model as judge Signed-off-by: Ryan Wolf <[email protected]> * Refactor reward query Signed-off-by: Ryan Wolf <[email protected]> * Add error handling for non-reward models Signed-off-by: Ryan Wolf <[email protected]> * Add error handling to sync client Signed-off-by: Ryan Wolf <[email protected]> * Add open qa pipeline Signed-off-by: Ryan Wolf <[email protected]> * Improve docs and add writing pipeline Signed-off-by: Ryan Wolf <[email protected]> * Add closed qa pipeline Signed-off-by: Ryan Wolf <[email protected]> * Add math pipeline Signed-off-by: Ryan Wolf <[email protected]> * Add python pipeline Signed-off-by: Ryan Wolf <[email protected]> * Add async nemotron generator Signed-off-by: Ryan Wolf <[email protected]> * Fix await with index Signed-off-by: Ryan Wolf <[email protected]> * Add seed parameter Signed-off-by: Ryan Wolf <[email protected]> * Add missing await Signed-off-by: Ryan Wolf <[email protected]> * Fix parameter names Signed-off-by: Ryan Wolf <[email protected]> * Fix subscript await issues Signed-off-by: Ryan Wolf <[email protected]> * Switch parsing method for reward model Signed-off-by: Ryan Wolf <[email protected]> * Add initial docs Signed-off-by: Ryan Wolf <[email protected]> * Add nemo deploy client Signed-off-by: Ryan Wolf <[email protected]> * Add easy import Signed-off-by: Ryan Wolf <[email protected]> * Move conversation formatter Signed-off-by: Ryan Wolf <[email protected]> * Add other file Signed-off-by: Ryan Wolf <[email protected]> * Update nemotron import Signed-off-by: Ryan Wolf <[email protected]> * Update model client import Signed-off-by: Ryan Wolf <[email protected]> * Remove model in query call Signed-off-by: Ryan Wolf <[email protected]> * Add extra index Signed-off-by: Ryan Wolf <[email protected]> * Fix response indexing Signed-off-by: Ryan Wolf <[email protected]> * Add top k Signed-off-by: Ryan Wolf <[email protected]> * Remove extras Signed-off-by: Ryan Wolf <[email protected]> * Add safe import for nemo deploy Signed-off-by: Ryan Wolf <[email protected]> * Add pandas conversions Signed-off-by: Ryan Wolf <[email protected]> * Add partition default Signed-off-by: Ryan Wolf <[email protected]> * Add no format Signed-off-by: Ryan Wolf <[email protected]> * Move no format location Signed-off-by: Ryan Wolf <[email protected]> * Use top_k in nemo client Signed-off-by: Ryan Wolf <[email protected]> * Address vibhu's review Signed-off-by: Ryan Wolf <[email protected]> * Add logging import Signed-off-by: Ryan Wolf <[email protected]> * Fix import Signed-off-by: Ryan Wolf <[email protected]> * Fix tqdm Signed-off-by: Ryan Wolf <[email protected]> * Add missing awaits Signed-off-by: Ryan Wolf <[email protected]> * Standardize names Signed-off-by: Ryan Wolf <[email protected]> * Address Ayush nit Signed-off-by: Ryan Wolf <[email protected]> --------- Signed-off-by: Ryan Wolf <[email protected]>
Description
Adds a suite of tools for interacting with LLM services. These LLM services are then used to build synthetic data generation tools and example pipelines following the Nemotron 340B Technical Report. The prompt templates used in the report are supplied as defaults throughout the code.
Usage
OpenAI API
NeMo Deploy
Checklist