Data Preparation Failed in Kubernetes #314

Open
Syulin7 opened this issue May 10, 2024 · 0 comments · May be fixed by #337


Syulin7 commented May 10, 2024

While following the NeMo Framework on Kubernetes playbook (https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/kubernetes.html#nemo-framework-on-kubernetes-playbook), I hit an error during the Data Preparation stage.

[image: screenshot attached to the issue]

NeMo Container image: nvcr.io/nvidia/nemo:24.03.01.framework

The ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python setting in the container does not work.

I fixed this by passing the environment variable on the mpirun command line with the -x option:

mpirun -x PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

- '{{- range tuple "download" "extract" "preprocess" }} mpirun --allow-run-as-root -np {{ $config.totalProcesses }} -npernode {{ $config.procsPerNode }} -bind-to none -map-by slot --oversubscribe -x PYTHONPATH -mca pml ob1 -mca btl ^openib python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/collections/dataprep_scripts/pile_dataprep/{{ . }}.py --config-path=/config --config-name=config.yaml && {{- end}} echo Data preparation complete'
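To check whether the variable actually reaches each launched process, a small diagnostic script can be run in place of the dataprep scripts. This is only a sketch: the file name check_env.py and the helper describe_env are hypothetical and not part of NeMo-Megatron-Launcher. It relies on OMPI_COMM_WORLD_RANK, which Open MPI sets for each process it launches. The likely reason the container ENV is not seen is that processes started by mpirun on worker pods do not inherit it, but that is an assumption and is not confirmed in this issue.

```python
import os

VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def describe_env(environ=os.environ):
    """Return a one-line report of whether VAR is visible to this process."""
    value = environ.get(VAR)
    # Open MPI sets OMPI_COMM_WORLD_RANK per launched process;
    # it is absent when the script is run directly, so fall back to "n/a".
    rank = environ.get("OMPI_COMM_WORLD_RANK", "n/a")
    status = f"{VAR}={value}" if value is not None else f"{VAR} is NOT set"
    return f"rank {rank}: {status}"


if __name__ == "__main__":
    print(describe_env())
```

Launching this hypothetical check_env.py with the same mpirun options, once with and once without -x PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, should show whether the ranks see the variable in each case.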

Syulin7 linked a pull request on May 23, 2024 that will close this issue