
fix: Upload other splits to Huggingface#460

Closed
fsiino-nvidia wants to merge 7 commits into fsiino/416-hf-download-utils from fsiino/422-validations-splits-for-hf

fsiino-nvidia (Contributor) commented Dec 10, 2025

(Based on #419, which is yet to be merged.)

  • This updates the HF dataset upload flow to account for recent changes, such as the updated naming (datasets are now prefixed with Nemotron-RL instead of Nemo-Gym), among others.
  • Contributing files to existing repos as a PR is also supported now.
  • Docs have been updated.
  • The following datasets were contributed (as PRs) with these commands:
ng_upload_dataset_to_hf \
    +input_jsonl_fpath=data/workplace_assistant/validation.jsonl \
    +resource_config_path=resources_servers/workplace_assistant/configs/workplace_assistant.yaml \
    +create_pr=true \
    +commit_message="Upload validation" \
    +split=validation

ng_upload_dataset_to_hf \
    +input_jsonl_fpath=data/structured_outputs/structured_outputs_251027_nano_v3_sdg_json_val.jsonl \
    +resource_config_path=resources_servers/structured_outputs/configs/structured_outputs_json.yaml \
    +create_pr=true \
    +commit_message="Upload validation" \
    +split=validation

ng_upload_dataset_to_hf \
    +dataset_name=OpenMathReasoning \
    +input_jsonl_fpath=data/math_with_judge/aime24_validation.jsonl \
    +resource_config_path=resources_servers/math_with_judge/configs/math_with_judge.yaml \
    +create_pr=true \
    +commit_message="Upload validation" \
    +split=validation
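For readers who want to see what one of these commands amounts to, here is a minimal sketch written directly against the `huggingface_hub` client rather than `ng_upload_dataset_to_hf`. It is not the tool's implementation: the helper names (`parse_overrides`, `count_jsonl_rows`, `build_upload_kwargs`, `upload`) are hypothetical, the `+key=value` parsing is an assumed simplification of the Hydra-style overrides, and the `{split}.jsonl` file layout assumes the Hub infers the split from the file name. `repo_id` is passed explicitly because the real naming logic (the Nemotron-RL prefix, `+dataset_name`) is not shown in this PR.

```python
# Hypothetical sketch: roughly what one ng_upload_dataset_to_hf call does,
# expressed with huggingface_hub. Helper names, the override format, and the
# "{split}.jsonl" layout are assumptions, not the tool's implementation.
import json
from pathlib import Path


def parse_overrides(args: list[str]) -> dict[str, str]:
    """Turn Hydra-style "+key=value" arguments into a dict (assumed format)."""
    parsed: dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.lstrip("+").partition("=")
        if not sep:
            raise ValueError(f"expected +key=value, got {arg!r}")
        # Drop surrounding quotes in case they reach us literally.
        parsed[key] = value.strip('"')
    return parsed


def count_jsonl_rows(path: Path) -> int:
    """Check that every non-empty line parses as JSON; return the row count."""
    rows = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
            rows += 1
    return rows


def build_upload_kwargs(overrides: dict[str, str], repo_id: str) -> dict:
    """Map the overrides onto HfApi.upload_file keyword arguments."""
    split = overrides.get("split", "train")
    return {
        "path_or_fileobj": overrides["input_jsonl_fpath"],
        # Assumes the Hub infers the split from the file name.
        "path_in_repo": f"{split}.jsonl",
        "repo_id": repo_id,
        "repo_type": "dataset",
        "commit_message": overrides.get("commit_message", f"Upload {split}"),
        # create_pr=True opens a pull request on the Hub repo instead of
        # committing directly to the default branch.
        "create_pr": overrides.get("create_pr", "false").lower() == "true",
    }


def upload(overrides: dict[str, str], repo_id: str) -> None:
    """Validate the JSONL file, then upload it (needs network and an HF token)."""
    from huggingface_hub import HfApi  # imported lazily; only needed here

    count_jsonl_rows(Path(overrides["input_jsonl_fpath"]))
    info = HfApi().upload_file(**build_upload_kwargs(overrides, repo_id))
    print(info.pr_url or info.commit_url)
```

As a usage example, the first command above would map to `upload(parse_overrides(["+input_jsonl_fpath=data/workplace_assistant/validation.jsonl", "+create_pr=true", "+commit_message=Upload validation", "+split=validation"]), repo_id="<org>/<dataset>")`, where the repo id is a placeholder. Validating the JSONL locally before calling the Hub keeps a malformed file from ever reaching a PR.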

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
copy-pr-bot commented Dec 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

fsiino-nvidia linked an issue on Dec 10, 2025 that may be closed by this pull request
fsiino-nvidia marked this pull request as ready for review on December 12, 2025 18:53
…to fsiino/422-validations-splits-for-hf

bxyu-nvidia (Contributor):

Changes merged in #481.

fsiino-nvidia deleted the fsiino/422-validations-splits-for-hf branch on February 23, 2026 16:44
Development

Successfully merging this pull request may close these issues.

Fix HF dataset uploads to include validation splits if present
