Added tutorials to index.rst #113

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

jgerh
Contributor

@jgerh jgerh commented Jun 13, 2024

Added links to tutorials

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Added links to tutorials

Signed-off-by: jgerh <[email protected]>
@jgerh
Contributor Author

jgerh commented Jun 13, 2024

Added the tutorial links as requested in the issue "[NeMo Curator] add links for the NeMo Curator tutorials to the NeMo Framework User Guide": https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/28

Collaborator

@ryantwolf ryantwolf left a comment

LGTM, thanks!

@ryantwolf ryantwolf merged commit f1e993b into NVIDIA:main Jun 13, 2024
3 checks passed
VibhuJawa pushed a commit to VibhuJawa/NeMo-Curator that referenced this pull request Jun 27, 2024
Added links to tutorials

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>
VibhuJawa added a commit that referenced this pull request Jul 5, 2024
* Applying SEO Best Pratices (#104)

* Rename CPUvsGPU.rst to cpuvsgpu.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DataCuration.rsts to datacuration.rsts

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DistributedDataClassification.rst to distributeddataclassification.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DocumentDataset.rst to documentdataset.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename Download.rst to download.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename GpuDeduplication.rst to gpudeduplication.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename KubernetesCurator.rst to kubernetescurator.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename QualityFiltering.rst to qualityfiltering.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename TaskDecontamination.rst to taskdecontamination.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Update index.rst

Setting all RST files to lowercase names.

Signed-off-by: Andrew Schilling <[email protected]>

* Ignore docs for EOF fixer hook

Signed-off-by: Ayush Dattagupta <[email protected]>

---------

Signed-off-by: Andrew Schilling <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* Shuffle CC result on group before writing out (#110)

Signed-off-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst (#113)

Added links to tutorials

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* first commit

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* mv under modules dir

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* first commit

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* mv under modules dir

Signed-off-by: avinashvem <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>

* first commit

Signed-off-by: Vibhu Jawa <[email protected]>

* mv under modules dir

Signed-off-by: Vibhu Jawa <[email protected]>

* embed by cluster saved

Signed-off-by: Vibhu Jawa <[email protected]>

* id map script

Signed-off-by: Vibhu Jawa <[email protected]>

* test commit

Signed-off-by: Vibhu Jawa <[email protected]>

* add id map script

Signed-off-by: Vibhu Jawa <[email protected]>

* Cleanup compute_embeddings_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Cleanup compute_embeddings_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Pre-commit style fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* clustering_dask_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Minor clean up to sort_clusters_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* cleanup semdedup_crossfit

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove undo changes

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove rename changes

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix rename

Signed-off-by: Vibhu Jawa <[email protected]>

* Readme formatting

Signed-off-by: Vibhu Jawa <[email protected]>

* add dask to semdedup_crossfit.py

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* configure max memory using a cli

Signed-off-by: Vibhu Jawa <[email protected]>

* Dumb id results to parquet

Signed-off-by: Vibhu Jawa <[email protected]>

* Embedding fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* README.md updates

Signed-off-by: Vibhu Jawa <[email protected]>

* Working end to end

Signed-off-by: Vibhu Jawa <[email protected]>

* Minor  yaml fixes

Signed-off-by: Vibhu Jawa <[email protected]>

* Undo changes to index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update .pre-commit-config.yaml 

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update fuzzy_dedup.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Undo changes to docs/personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Add end to end script in readme.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Add type hints

Signed-off-by: Vibhu Jawa <[email protected]>

* Use dask for sort_clusters

Signed-off-by: Vibhu Jawa <[email protected]>

* Make sort_clusters work on MNMG scales

Signed-off-by: Vibhu Jawa <[email protected]>

* Cleaned up dask shutdown

Signed-off-by: Vibhu Jawa <[email protected]>

* Decrease noise in E2E scripts

Signed-off-by: Vibhu Jawa <[email protected]>

* Clean up scripts

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix scripts/end_to_end_script.sh

Signed-off-by: Vibhu Jawa <[email protected]>

* Some more cleanup

Signed-off-by: Vibhu Jawa <[email protected]>

* Add copyright

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix README.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Address reviews

Signed-off-by: Vibhu Jawa <[email protected]>

* Make work with a SemDedupConfig

Signed-off-by: Vibhu Jawa <[email protected]>

* Make work with SemDedupConfig

Signed-off-by: Vibhu Jawa <[email protected]>

* Move to nemo-curator's logger

Signed-off-by: Vibhu Jawa <[email protected]>

* Semdedup-extract_dedup_data.py

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Applying SEO Best Pratices (#104)

* Rename CPUvsGPU.rst to cpuvsgpu.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DataCuration.rsts to datacuration.rsts

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DistributedDataClassification.rst to distributeddataclassification.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename DocumentDataset.rst to documentdataset.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename Download.rst to download.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename GpuDeduplication.rst to gpudeduplication.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename KubernetesCurator.rst to kubernetescurator.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename LanguageIdentificationUnicodeFormatting.rst to languageidentificationunicodeformatting.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename PersonalIdentifiableInformationIdentificationAndRemoval.rst to personalidentifiableinformationidentificationandremoval.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename QualityFiltering.rst to qualityfiltering.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Rename TaskDecontamination.rst to taskdecontamination.rst

Signed-off-by: Andrew Schilling <[email protected]>

* Update index.rst

Setting all RST files to lowercase names.

Signed-off-by: Andrew Schilling <[email protected]>

* Ignore docs for EOF fixer hook

Signed-off-by: Ayush Dattagupta <[email protected]>

---------

Signed-off-by: Andrew Schilling <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix bad merge

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Update index.rst

Signed-off-by: Vibhu Jawa <[email protected]>

* Add Module for embedding+clustering

Signed-off-by: Vibhu Jawa <[email protected]>

* Add sorting to clustering

Signed-off-by: Vibhu Jawa <[email protected]>

* Refactor Semdup modules

Signed-off-by: Vibhu Jawa <[email protected]>

* Refactor Semdup modules

Signed-off-by: Vibhu Jawa <[email protected]>

* Refactor Semdup modules

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix Readme.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Add a environment variable to silence HF warnings

Signed-off-by: Vibhu Jawa <[email protected]>

* dask-cudf fix

Signed-off-by: Vibhu Jawa <[email protected]>

* dask-cudf fix

Signed-off-by: Vibhu Jawa <[email protected]>

* dask-cudf fix

Signed-off-by: Vibhu Jawa <[email protected]>

* Make config a flat file based on reviews

Signed-off-by: Vibhu Jawa <[email protected]>

* Add docstrings

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix argparse and seed function

Signed-off-by: Vibhu Jawa <[email protected]>

* Use argparse to read config

Signed-off-by: Vibhu Jawa <[email protected]>

* Move around config files

Signed-off-by: Vibhu Jawa <[email protected]>

* Move around config files

Signed-off-by: Vibhu Jawa <[email protected]>

* Move around config files

Signed-off-by: Vibhu Jawa <[email protected]>

* Remove end_to_end_script.sh

Signed-off-by: Vibhu Jawa <[email protected]>

* Append Readme

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Reviews

Signed-off-by: Vibhu Jawa <[email protected]>

* Change config

Signed-off-by: Vibhu Jawa <[email protected]>

* Make embedding creation optionally lazy

Signed-off-by: Vibhu Jawa <[email protected]>

* fix docstring

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Reviews and docstrings

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Reviews and make eps_thresholds a list of values

Signed-off-by: Vibhu Jawa <[email protected]>

* Minor import fix

Signed-off-by: Vibhu Jawa <[email protected]>

* Empty Commit

Signed-off-by: Vibhu Jawa <[email protected]>

* Add modules to __init__ and README.md

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix init

Signed-off-by: Vibhu Jawa <[email protected]>

* Move comment

Signed-off-by: Vibhu Jawa <[email protected]>

* Empty commit to restart CI (which failed due to a download issue)

Signed-off-by: Vibhu Jawa <[email protected]>

* Empty commit to restart CI (which failed due to a download issue)

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Andrew Schilling <[email protected]>
Signed-off-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: jgerh <[email protected]>
Signed-off-by: avinashvem <[email protected]>
Co-authored-by: Andrew Schilling <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>
Co-authored-by: jgerh <[email protected]>
Co-authored-by: avinashvem <[email protected]>
sarahyurick pushed a commit to sarahyurick/NeMo-Curator that referenced this pull request Jul 23, 2024