
Pull requests: NVIDIA/NeMo-Curator

Fix missing eps_thresholds parameter
#390 opened Nov 22, 2024 by sarahyurick
Create separate files for each deduplication class
Labels: gpuci (Run GPU CI/CD on PR)
#389 opened Nov 22, 2024 by sarahyurick
Fix GPU error messages for fuzzy deduplication
#387 opened Nov 22, 2024 by sarahyurick Draft
1 of 2 tasks
Fuzzy Dedup: Make skipping the False positive check the default
Labels: enhancement (New feature or request), gpuci (Run GPU CI/CD on PR)
#386 opened Nov 21, 2024 by ayushdg
2 of 3 tasks
Remove max_text_bytes_per_part
Labels: gpuci (Run GPU CI/CD on PR)
#385 opened Nov 20, 2024 by sarahyurick
Global cache_dir variable for exact, fuzzy, and semantic deduplication
Labels: gpuci (Run GPU CI/CD on PR)
#384 opened Nov 19, 2024 by sarahyurick
3 tasks done
Allow users to write to single file
#383 opened Nov 19, 2024 by sarahyurick
Synthetic data generation for Retriever Evaluation
#370 opened Nov 14, 2024 by vinay-raman
3 tasks done
ci: Add copyright-check workflow
#369 opened Nov 14, 2024 by ko3n1g
3 tasks
Update to latest Crossfit
Labels: gpuci (Run GPU CI/CD on PR)
#365 opened Nov 14, 2024 by VibhuJawa Draft
Task-Complexity Classifier
#364 opened Nov 13, 2024 by sarahyurick Draft
Type of Speech Classifier
#361 opened Nov 13, 2024 by sarahyurick Draft
Add codepath for computing buckets without int conversion
Labels: enhancement (New feature or request), gpuci (Run GPU CI/CD on PR)
#326 opened Oct 25, 2024 by ayushdg
3 tasks done
Add support for finetune guard classifier
#325 opened Oct 25, 2024 by VibhuJawa
Dapt data curation tutorial fuzzy and semantic dedupe
Labels: gpuci (Run GPU CI/CD on PR)
#322 opened Oct 24, 2024 by ruchaa-apte
Added example notebook for translation with ct2 model.
Labels: documentation (Improvements or additions to documentation)
#262 opened Sep 25, 2024 by uahmed93 Draft
3 tasks
Add support for parallel data curation
#193 opened Aug 8, 2024 by shuoyangd
3 tasks done
Fixed bug: changed to correct model name
#186 opened Aug 6, 2024 by ByteWrite
1 of 3 tasks