[RE-OPENED ELSEWHERE] HuggingFace support for Domain Classifier #138

Closed
wants to merge 20 commits into from

sarahyurick
Collaborator

@sarahyurick sarahyurick commented Jul 2, 2024

Collaborator Author

@sarahyurick sarahyurick left a comment

Minor comments.

nemo_curator/utils/script_utils.py Outdated
@sarahyurick sarahyurick marked this pull request as ready for review July 19, 2024 20:58
@sarahyurick
Collaborator Author

Hi @VibhuJawa, this is ready for review. I can confirm the HuggingFace domain classifier produces the same results as our previous pipeline.

Collaborator

@VibhuJawa VibhuJawa left a comment

Mostly looks great to me. Just have some nits around type hints.

Thanks for working on this @sarahyurick, the code is so much cleaner and more usable now.

nemo_curator/modules/distributed_data_classifier.py Outdated
nemo_curator/modules/distributed_data_classifier.py Outdated
nemo_curator/modules/distributed_data_classifier.py Outdated
nemo_curator/modules/distributed_data_classifier.py Outdated
@sarahyurick
Collaborator Author

Thanks @VibhuJawa! Updated, LMK what you think.

Collaborator

@ryantwolf ryantwolf left a comment

One minor thing. Looks great, thank you so much!

@sarahyurick
Collaborator Author

Thanks @ryantwolf and @VibhuJawa! This should be ready for another review. The build_and_test failure does not seem to be related to this PR?

Collaborator

@ryantwolf ryantwolf left a comment

Good on my end, thanks a bunch for this!

Collaborator

@VibhuJawa VibhuJawa left a comment

LGTM

ayushdg and others added 19 commits July 23, 2024 12:02
* Stricter query planning checks with newer versions of dask

Signed-off-by: Ayush Dattagupta <[email protected]>

* Add checks to tests/__init__

Signed-off-by: Ayush Dattagupta <[email protected]>

* Check sys.modules to ensure dask-expr is not enabled

Signed-off-by: Ayush Dattagupta <[email protected]>

* Search for "dask_expr" in sys modules

Co-authored-by: Richard (Rick) Zamora <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>

* use dask_expr instead of dask-expr

Signed-off-by: Ayush Dattagupta <[email protected]>

---------

Signed-off-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Richard (Rick) Zamora <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
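The dask-expr checks in the commit above come down to inspecting `sys.modules`. A minimal sketch, assuming the goal is only to detect whether the `dask_expr` package (the query-planning backend used by recent dask versions) has been imported in the current process; `dask_expr_enabled` is a hypothetical name, not the helper added in that commit:

```python
import sys


def dask_expr_enabled() -> bool:
    """Hypothetical helper: return True if dask_expr has been imported.

    With query planning active, recent versions of dask.dataframe import
    dask_expr under the hood, so its presence in sys.modules is a cheap
    signal that query planning is on.
    """
    # The importable module name uses an underscore ("dask_expr"),
    # not the hyphenated distribution name ("dask-expr").
    return "dask_expr" in sys.modules
```

A test package could call this right after importing `dask.dataframe` and raise if it returns True, which is the kind of guard the "Add checks to tests/__init__" step describes.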
* Applying SEO Best Practices (NVIDIA#104)

* Rename CPUvsGPU.rst to cpuvsgpu.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DataCuration.rsts to datacuration.rsts

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DistributedDataClassification.rst to distributeddataclassification.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DocumentDataset.rst to documentdataset.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename Download.rst to download.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename GpuDeduplication.rst to gpudeduplication.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename KubernetesCurator.rst to kubernetescurator.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename QualityFiltering.rst to qualityfiltering.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename TaskDecontamination.rst to taskdecontamination.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Update index.rst

Setting all RST files to lowercase names.

Signed-off-by: Andrew Schilling <[email protected]>

* Ignore docs for EOF fixer hook

Signed-off-by: Ayush Dattagupta <[email protected]>

---------

Signed-off-by: Andrew Schilling <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* Shuffle CC result on group before writing out (NVIDIA#110)

Signed-off-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst (NVIDIA#113)

Added links to tutorials

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* first commit

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* mv under modules dir

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* first commit

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* mv under modules dir

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* first commit

Signed-off-by: Vibhu Jawa <[email protected]>

* mv under modules dir

Signed-off-by: Vibhu Jawa <[email protected]>

* embed by cluster saved

Signed-off-by: Vibhu Jawa <[email protected]>

* id map script

Signed-off-by: Vibhu Jawa <[email protected]>

* test commit

Signed-off-by: Vibhu Jawa <[email protected]>

* add id map script

Signed-off-by: Vibhu Jawa <[email protected]>

* Cleanup compute_embeddings_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Cleanup compute_embeddings_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Pre-commit style fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* clustering_dask_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Minor clean up to sort_clusters_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* cleanup semdedup_crossfit

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove undo changes

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove rename changes

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix rename

Signed-off-by: Vibhu Jawa <[email protected]>

* Readme formatting

Signed-off-by: Vibhu Jawa <[email protected]>

* add dask to semdedup_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* configure max memory using a cli

Signed-off-by: Vibhu Jawa <[email protected]>

* Dump id results to parquet

Signed-off-by: Vibhu Jawa <[email protected]>

* Embedding fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* Working end to end

Signed-off-by: Vibhu Jawa <[email protected]>

* Minor yaml fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* Undo changes to index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update .pre-commit-config.yaml 

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update fuzzy_dedup.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Add end to end script in readme.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Add type hints

Signed-off-by: Vibhu Jawa <[email protected]>

* Use dask for sort_clusters

Signed-off-by: Vibhu Jawa <[email protected]>

* Make sort_clusters work on MNMG scales

Signed-off-by: Vibhu Jawa <[email protected]>

* Cleaned up dask shutdown

Signed-off-by: Vibhu Jawa <[email protected]>

* Decrease noise in E2E scripts

Signed-off-by: Vibhu Jawa <[email protected]>

* Clean up scripts

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix scripts/end_to_end_script.sh

Signed-off-by: Vibhu Jawa <[email protected]>

* Some more cleanup

Signed-off-by: Vibhu Jawa <[email protected]>

* Add copyright

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix README.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Address reviews

Signed-off-by: Vibhu Jawa <[email protected]>

* Make work with a SemDedupConfig

Signed-off-by: Vibhu Jawa <[email protected]>

* Make work with SemDedupConfig

Signed-off-by: Vibhu Jawa <[email protected]>

* Move to nemo-curator's logger

Signed-off-by: Vibhu Jawa <[email protected]>

* Semdedup-extract_dedup_data.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Applying SEO Best Practices (NVIDIA#104)

* Rename CPUvsGPU.rst to cpuvsgpu.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DataCuration.rsts to datacuration.rsts

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DistributedDataClassification.rst to distributeddataclassification.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DocumentDataset.rst to documentdataset.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename Download.rst to download.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename GpuDeduplication.rst to gpudeduplication.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename KubernetesCurator.rst to kubernetescurator.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename QualityFiltering.rst to qualityfiltering.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename TaskDecontamination.rst to taskdecontamination.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Update index.rst

Setting all RST files to lowercase names.

Signed-off-by: Andrew Schilling <[email protected]>

* Ignore docs for EOF fixer hook

Signed-off-by: Ayush Dattagupta <[email protected]>

---------

Signed-off-by: Andrew Schilling <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix bad merge

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Add Module for embedding+clustering

Signed-off-by: Vibhu Jawa <[email protected]>

* Add sorting to clustering

Signed-off-by: Vibhu Jawa <[email protected]>

* Refactor Semdup modules

Signed-off-by: Vibhu Jawa <[email protected]>

* Refactor Semdup modules

Signed-off-by: Vibhu Jawa <[email protected]>

* Refactor Semdup modules

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix Readme.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Add an environment variable to silence HF warnings

Signed-off-by: Vibhu Jawa <[email protected]>

* dask-cudf fix

Signed-off-by: Vibhu Jawa <[email protected]>

* dask-cudf fix

Signed-off-by: Vibhu Jawa <[email protected]>

* dask-cudf fix

Signed-off-by: Vibhu Jawa <[email protected]>

* Make config a flat file based on reviews

Signed-off-by: Vibhu Jawa <[email protected]>

* Add docstrings

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix argparse and seed function

Signed-off-by: Vibhu Jawa <[email protected]>

* Use argparse to read config

Signed-off-by: Vibhu Jawa <[email protected]>

* Move around config files

Signed-off-by: Vibhu Jawa <[email protected]>

* Move around config files

Signed-off-by: Vibhu Jawa <[email protected]>

* Move around config files

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove end_to_end_script.sh

Signed-off-by: Vibhu Jawa <[email protected]>

* Append Readme

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Reviews

Signed-off-by: Vibhu Jawa <[email protected]>

* Change config

Signed-off-by: Vibhu Jawa <[email protected]>

* Make embedding creation optionally lazy

Signed-off-by: Vibhu Jawa <[email protected]>

* fix docstring

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Reviews and docstrings

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Reviews and make eps_thresholds a list of values

Signed-off-by: Vibhu Jawa <[email protected]>

* Minor import fix

Signed-off-by: Vibhu Jawa <[email protected]>

* Empty Commit

Signed-off-by: Vibhu Jawa <[email protected]>

* Add modules to __init__ and README.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix init

Signed-off-by: Vibhu Jawa <[email protected]>

* Move comment

Signed-off-by: Vibhu Jawa <[email protected]>

* Empty commit to restart CI (which failed due to a download issue)

Signed-off-by: Vibhu Jawa <[email protected]>

* Empty commit to restart CI (which failed due to a download issue)

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Andrew Schilling <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: jgerh <[email protected]>
Signed-off-by: avinashvem <[email protected]>
Co-authored-by: Andrew Schilling <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>
Co-authored-by: jgerh <[email protected]>
Co-authored-by: avinashvem <[email protected]>
* Begin implementation on OpenAI client

Signed-off-by: Ryan Wolf <[email protected]>

* Fix relative import

Signed-off-by: Ryan Wolf <[email protected]>

* Add temperature

Signed-off-by: Ryan Wolf <[email protected]>

* Modify client interface and begin ultrachat

Signed-off-by: Ryan Wolf <[email protected]>

* Change type annotation in openai client

Signed-off-by: Ryan Wolf <[email protected]>

* Make imports easier

Signed-off-by: Ryan Wolf <[email protected]>

* Reformat to match nemotron report

Signed-off-by: Ryan Wolf <[email protected]>

* Add yaml conversion

Signed-off-by: Ryan Wolf <[email protected]>

* Fix index error

Signed-off-by: Ryan Wolf <[email protected]>

* Add error handling for yaml parsing

Signed-off-by: Ryan Wolf <[email protected]>

* Fix error

Signed-off-by: Ryan Wolf <[email protected]>

* Add additional yaml parsing check

Signed-off-by: Ryan Wolf <[email protected]>

* Add more yaml error handling

Signed-off-by: Ryan Wolf <[email protected]>

* Export conversion error

Signed-off-by: Ryan Wolf <[email protected]>

* Change variable naming

Signed-off-by: Ryan Wolf <[email protected]>

* Make error catching more general

Signed-off-by: Ryan Wolf <[email protected]>

* Refactor list out of nemotron

Signed-off-by: Ryan Wolf <[email protected]>

* Add prompt helper function

Signed-off-by: Ryan Wolf <[email protected]>

* Add revisions and writing prompts

Signed-off-by: Ryan Wolf <[email protected]>

* Fix default prompt templates

Signed-off-by: Ryan Wolf <[email protected]>

* Add closed qa

Signed-off-by: Ryan Wolf <[email protected]>

* Fix prompt

Signed-off-by: Ryan Wolf <[email protected]>

* Add math and coding

Signed-off-by: Ryan Wolf <[email protected]>

* Add problem generation

Signed-off-by: Ryan Wolf <[email protected]>

* Rename function

Signed-off-by: Ryan Wolf <[email protected]>

* Add dialogue support

Signed-off-by: Ryan Wolf <[email protected]>

* Fix misspelling

Signed-off-by: Ryan Wolf <[email protected]>

* Add two turn generation

Signed-off-by: Ryan Wolf <[email protected]>

* Add reward model as judge

Signed-off-by: Ryan Wolf <[email protected]>

* Refactor reward query

Signed-off-by: Ryan Wolf <[email protected]>

* Add error handling for non-reward models

Signed-off-by: Ryan Wolf <[email protected]>

* Add error handling to sync client

Signed-off-by: Ryan Wolf <[email protected]>

* Add open qa pipeline

Signed-off-by: Ryan Wolf <[email protected]>

* Improve docs and add writing pipeline

Signed-off-by: Ryan Wolf <[email protected]>

* Add closed qa pipeline

Signed-off-by: Ryan Wolf <[email protected]>

* Add math pipeline

Signed-off-by: Ryan Wolf <[email protected]>

* Add python pipeline

Signed-off-by: Ryan Wolf <[email protected]>

* Add async nemotron generator

Signed-off-by: Ryan Wolf <[email protected]>

* Fix await with index

Signed-off-by: Ryan Wolf <[email protected]>

* Add seed parameter

Signed-off-by: Ryan Wolf <[email protected]>

* Add missing await

Signed-off-by: Ryan Wolf <[email protected]>

* Fix parameter names

Signed-off-by: Ryan Wolf <[email protected]>

* Fix subscript await issues

Signed-off-by: Ryan Wolf <[email protected]>

* Switch parsing method for reward model

Signed-off-by: Ryan Wolf <[email protected]>

* Add initial docs

Signed-off-by: Ryan Wolf <[email protected]>

* Add nemo deploy client

Signed-off-by: Ryan Wolf <[email protected]>

* Add easy import

Signed-off-by: Ryan Wolf <[email protected]>

* Move conversation formatter

Signed-off-by: Ryan Wolf <[email protected]>

* Add other file

Signed-off-by: Ryan Wolf <[email protected]>

* Update nemotron import

Signed-off-by: Ryan Wolf <[email protected]>

* Update model client import

Signed-off-by: Ryan Wolf <[email protected]>

* Remove model in query call

Signed-off-by: Ryan Wolf <[email protected]>

* Add extra index

Signed-off-by: Ryan Wolf <[email protected]>

* Fix response indexing

Signed-off-by: Ryan Wolf <[email protected]>

* Add top k

Signed-off-by: Ryan Wolf <[email protected]>

* Remove extras

Signed-off-by: Ryan Wolf <[email protected]>

* Add safe import for nemo deploy

Signed-off-by: Ryan Wolf <[email protected]>

* Add pandas conversions

Signed-off-by: Ryan Wolf <[email protected]>

* Add partition default

Signed-off-by: Ryan Wolf <[email protected]>

* Add no format

Signed-off-by: Ryan Wolf <[email protected]>

* Move no format location

Signed-off-by: Ryan Wolf <[email protected]>

* Use top_k in nemo client

Signed-off-by: Ryan Wolf <[email protected]>

* Address vibhu's review

Signed-off-by: Ryan Wolf <[email protected]>

* Add logging import

Signed-off-by: Ryan Wolf <[email protected]>

* Fix import

Signed-off-by: Ryan Wolf <[email protected]>

* Fix tqdm

Signed-off-by: Ryan Wolf <[email protected]>

* Add missing awaits

Signed-off-by: Ryan Wolf <[email protected]>

* Standardize names

Signed-off-by: Ryan Wolf <[email protected]>

* Address Ayush nit

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
* Begin docs

Signed-off-by: Ryan Wolf <[email protected]>

* Add slurm sdk example

Signed-off-by: Ryan Wolf <[email protected]>

* Use safe import

Signed-off-by: Ryan Wolf <[email protected]>

* Fix bugs in sdk

Signed-off-by: Ryan Wolf <[email protected]>

* Update docs and tweak scripts

Signed-off-by: Ryan Wolf <[email protected]>

* Add interface helper function

Signed-off-by: Ryan Wolf <[email protected]>

* Update docs

Signed-off-by: Ryan Wolf <[email protected]>

* Fix formatting

Signed-off-by: Ryan Wolf <[email protected]>

* Add config docstring

Signed-off-by: Ryan Wolf <[email protected]>

* Address comments

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
updates:
- [github.com/pre-commit/pre-commit-hooks: v4.5.0 → v4.6.0](pre-commit/pre-commit-hooks@v4.5.0...v4.6.0)
- [github.com/psf/black: 24.3.0 → 24.4.2](psf/black@24.3.0...24.4.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix bug with torch rmm and nemo

Signed-off-by: Ryan Wolf <[email protected]>

* Change pycld2 version pin

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
* Prevent plugging an allocator twice

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove extra import

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix defaults for RMM-POOL and other style fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* Switch rmm_pytorch off by default

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
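The "Prevent plugging an allocator twice" change can be illustrated with a process-level guard. This is a hypothetical sketch, not the code from that commit: `plug_allocator_once` and its `plug` callback are illustrative names, and the assumption is that `plug` is whatever installs a custom CUDA allocator (for example, one that routes PyTorch allocations through RMM).

```python
import threading
from typing import Callable

# Process-wide state: has an allocator already been installed?
_lock = threading.Lock()
_plugged = False


def plug_allocator_once(plug: Callable[[], None]) -> bool:
    """Hypothetical guard: run `plug` only on the first successful call.

    Returns True if `plug` ran, False if an allocator was already plugged.
    If `plug` raises, the flag stays False so a later call can retry.
    """
    global _plugged
    with _lock:
        if _plugged:
            return False
        plug()  # e.g. install a custom CUDA allocator (assumption)
        _plugged = True
        return True
```

Guarding the call this way matters because allocator swaps are typically only allowed before the first allocation, so a second attempt should be skipped rather than retried.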
@sarahyurick sarahyurick changed the title from HuggingFace support for Domain Classifier to [RE-OPENED ELSEWHERE] HuggingFace support for Domain Classifier Jul 23, 2024
sarahyurick added a commit to sarahyurick/NeMo-Curator that referenced this pull request Jul 23, 2024
Signed-off-by: Sarah Yurick <[email protected]>
ryantwolf pushed a commit that referenced this pull request Jul 23, 2024
Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick deleted the domain_hf_support branch October 25, 2024 20:37
5 participants