
Kubeflow pipeline unable to run due to large packages taking limited space #37

Open · K123AsJ0k1 opened this issue May 20, 2024 · 2 comments
Labels: enhancement (New feature or request), question (Further information is requested)

K123AsJ0k1 (Collaborator) commented May 20, 2024

In the current Kubeflow pipeline resource configuration, large packages like PyTorch can cause the pipeline to fail because their size (around 4 GB) consumes the available space during package installation. The temporary fix for this seems to be deleting the created Kubeflow component pods with:

kubectl get pods -n <namespace>
kubectl delete pod <pod-name> -n <namespace>

A better fix would be to somehow enable Kubeflow components to install the CPU-only variant of torch, which in a regular venv can be installed with:

torch==2.3.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
torchvision==0.18.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

However, Kubeflow component package installs don't understand the -f option, which is why I think a more lasting fix would be increasing the Kubeflow pipeline resource configuration, if possible.

K123AsJ0k1 added the enhancement and question labels on May 20, 2024
JoaquinRivesGambin (Contributor) commented:
@K123AsJ0k1 Is the issue that the component goes over the disk memory limit for the task? If that is the case, we can also increase that limit. I can't remember exactly what the argument was called, but I think it was something like disk_limit:

@component(
    base_image="python:3.10",
    packages_to_install=["numpy", "mlflow~=2.4.1"],
    output_component_file='components/evaluate_component.yaml',
    disk_limit='10Gi'
)
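For reference, in the KFP v2 SDK the per-task resource knobs I'm aware of live on the task object rather than in the @component decorator. A sketch under that assumption (the method names, especially set_ephemeral_storage_limit, should be checked against the installed kfp version — this is not confirmed from the thread):

```python
from kfp import dsl

@dsl.component(base_image="python:3.10")
def evaluate():
    pass

@dsl.pipeline(name="resource-limit-sketch")
def pipeline():
    task = evaluate()
    # Assumed KFP v2 setters; verify they exist in your kfp release.
    task.set_memory_limit("8G")
    # Ephemeral storage is what pip installs consume inside the pod.
    task.set_ephemeral_storage_limit("10Gi")
```

This is a configuration sketch only; whether ephemeral storage is the limit being hit here is exactly the open question above.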

K123AsJ0k1 (Collaborator, Author) commented:

@JoaquinRivesGambin It is possible, but to me it seems more like a collective memory limit than a per-component one. I will, however, test whether that change works. Regardless, after checking the Kubeflow pipeline docs, I found out that components accept pip_index_urls, which can be used similarly to the -f option in pip installs. This means that we can reduce the torch package size with the following:

base_image = "python:3.10",
packages_to_install = [
    "python-swiftclient",
    "torch==2.3.0",
    "torchvision==0.18.0"
],
pip_index_urls = [
    "https://pypi.org/simple",
    "https://download.pytorch.org/whl/cpu",
    "https://download.pytorch.org/whl/cpu"
]
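To reason about what pip_index_urls does: as I understand it, the component roughly turns the first entry into pip's --index-url and the remaining entries into --extra-index-url flags, which is how the CPU-only wheel index gets picked up. A minimal sketch of that mapping (the pip_command helper is hypothetical and not part of the kfp SDK):

```python
# Hypothetical helper approximating the pip invocation a component
# container would run; it is NOT part of the kfp SDK.
def pip_command(packages, index_urls):
    options = []
    for i, url in enumerate(index_urls):
        # Assumption: first URL maps to --index-url, the rest to --extra-index-url.
        flag = "--index-url" if i == 0 else "--extra-index-url"
        options += [flag, url]
    return ["python", "-m", "pip", "install", *options, *packages]

cmd = pip_command(
    ["python-swiftclient", "torch==2.3.0", "torchvision==0.18.0"],
    ["https://pypi.org/simple", "https://download.pytorch.org/whl/cpu"],
)
print(" ".join(cmd))
```

Because the CPU wheel index only serves CPU builds, pip resolves torch==2.3.0 to the much smaller CPU-only wheel without needing the +cpu local version suffix or the -f flag.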
