With the current Kubeflow pipeline resource configuration, large packages like PyTorch can cause the pipeline to fail, because their size (around 4 GB) takes up space during package installation. A temporary fix seems to be deleting the created Kubeflow component pods with:
kubectl get pods -n <namespace>
kubectl delete pod <pod-name> -n <namespace>
A better fix would be to somehow enable Kubeflow components to install the CPU-only variant of torch, which in a regular venv can be installed with:
However, Kubeflow component package installs don't understand the -f option, which is why I think a more lasting fix would be to increase the limits in the Kubeflow pipeline resource configuration, if possible.
@K123AsJ0k1 Is the issue that the component goes over the disk memory limit for the task? If that is the case, we can also increase that limit. I can't remember exactly what the argument was called, but I think it was something like disk_limit:
@JoaquinRivesGambin It is possible, but to me it looks more like a matter of collective memory size than of a single component's size. I will test whether that change works, though. Regardless, after checking the Kubeflow pipeline docs, I found out that components can use index_urls, which work similarly to the -f option in pip installs. This means that we can reduce the torch package size with the following:
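As a minimal sketch of that idea (not the snippet from the comment itself), the following assumes KFP SDK v2, where `dsl.component` accepts `packages_to_install` and `pip_index_urls`. The helper names, the example component, and the choice of PyTorch's CPU-only wheel index are illustrative assumptions, not something confirmed in this thread.

```python
from typing import Any, Dict, Sequence

# Assumption: PyTorch's CPU-only wheel index, with PyPI kept for other packages.
PYTORCH_CPU_INDEX = "https://download.pytorch.org/whl/cpu"
PYPI_INDEX = "https://pypi.org/simple"


def cpu_torch_component_kwargs(extra_packages: Sequence[str] = ()) -> Dict[str, Any]:
    """Build keyword arguments for kfp.dsl.component (hypothetical helper)."""
    return {
        "packages_to_install": ["torch", *extra_packages],
        # The CPU index is listed first so torch can be resolved from it.
        "pip_index_urls": [PYTORCH_CPU_INDEX, PYPI_INDEX],
    }


def make_torch_version_component():
    """Create an example lightweight component; requires the kfp package."""
    # Imported lazily so the helper above can be used without kfp installed.
    from kfp import dsl

    @dsl.component(**cpu_torch_component_kwargs())
    def report_torch_version() -> str:
        # Imports must live inside the function body for lightweight components.
        import torch

        return torch.__version__

    return report_torch_version
```

Calling `make_torch_version_component()` inside a pipeline definition should then give a component whose install step pulls torch from the CPU-only index. Whether this is enough to stay under the storage limit would still need to be tested on the cluster.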