
Fix caching issues #2847

Merged: 3 commits, Jun 4, 2018
6 changes: 3 additions & 3 deletions cmd/minikube/cmd/start.go
@@ -101,7 +101,7 @@ func runStart(cmd *cobra.Command, args []string) {
clusterBootstrapper := viper.GetString(cmdcfg.Bootstrapper)

if shouldCacheImages {
- go machine.CacheImagesForBootstrapper(k8sVersion, clusterBootstrapper)
+ machine.CacheImagesForBootstrapper(k8sVersion, clusterBootstrapper)
Contributor

I appreciate the race condition here, but I'm not quite sure that this is the right fix. We implemented it this way to avoid slowing things down at all for non-offline mode users.

The cache was basically designed to be "best effort", and making it synchronous like this could introduce flakiness and delays for the standard use-case.

Any ideas here?

Contributor Author

I take your point about the standard use case. I've actually found that it makes the start operation more reliable (particularly if a previous one had been terminated), and it also makes debugging much easier because the log is now predictable and deterministic. If the cache loading has not completed before the "docker load", we often end up with incomplete loads into the cache and, in online mode, re-downloading of the images, which I would expect to reduce performance due to the duplication.

One improvement would be to allow the image download to be async but then block on it completing before the call to UpdateCluster(). This would allow it to run in parallel with the VM creation / startup while preventing the incomplete load problem.
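
To make that concrete, here's a rough, self-contained sketch of the idea (cacheImages, createAndStartVM and updateCluster are just stand-ins for machine.CacheImagesForBootstrapper, the VM startup phase and UpdateCluster; the error handling is purely illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// cacheImages stands in for machine.CacheImagesForBootstrapper: it runs in the
// background while the VM comes up.
func cacheImages() error {
	time.Sleep(200 * time.Millisecond) // simulate downloading images to the host cache
	return nil
}

// createAndStartVM stands in for the VM creation / startup phase.
func createAndStartVM() {
	time.Sleep(100 * time.Millisecond)
}

// updateCluster stands in for UpdateCluster; by the time it runs, the cache is
// either complete or we have already bailed out.
func updateCluster() error {
	fmt.Println("loading images from a cache that is known to be complete")
	return nil
}

func main() {
	cacheDone := make(chan error, 1)
	go func() { cacheDone <- cacheImages() }() // download in parallel with VM startup

	createAndStartVM()

	// Block here: updateCluster never sees a half-populated cache.
	if err := <-cacheDone; err != nil {
		fmt.Println("caching images:", err)
		return
	}
	if err := updateCluster(); err != nil {
		fmt.Println("updating cluster:", err)
	}
}
```

The only real cost versus the current fully-async behaviour is the wait at the channel receive, and that only matters when the image download is slower than VM startup.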

Another option would be to change the semantics so that the load is not best effort, i.e. if enabled then it either succeeds or fails (as per the patch); if that behaviour is not desired, the cache-images flag should be set to false, so the standard use case would not be impacted. This seems more logical to me: if caching is enabled then it should complete successfully rather than fail silently.

Contributor Author

Another alternative would be to add a flag such as "mustCacheImages" or similar, but I'd prefer to avoid adding more flags if possible and think the other two options are a better approach. Let me know what you think. Cheers.

Contributor Author

A further thought, and I think the better option, is to make False the default for --cache-images and keep the behaviour as coded in the PR. That makes things very clear and consistent: don't cache images by default, but if caching is requested then exit if it fails. Possibly also add the async wait before UpdateCluster() which I mentioned earlier.

Contributor

Thanks for the work and thoughtful discussion here! I think I agree with your last proposal. Then we can easily change the default to true if we feel it does make things more stable.

Contributor

A change in behavior from --cache-images default true, to --cache-images default false?
"Feels" like the wrong direction.

I believe the binaries for iso/kubeadm/kubelet are all downloaded to the host, cached & imported/used later. Is it too much of a departure to do the same for images? Caching & blocking at UpdateCluster by default for k8s images seems the "best" option (immediate stability risks ignored).

Sounds like "delay UpdateCluster until cached images are ready" would be preferred but is too risky a default right now?

With switches & defaults, setting the tone for a future where caching is optimal, reliable and the default seems best (even if it takes a little while to get there).

My use case includes many lower-bandwidth customers and "frequent" destroy/starts.
I'd appreciate a config value akin to "kubernetes-version" for "cacheWaitOnKubernetesImages=true", and then in the future, once reliable image caching is established, make blocking the default.
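
To make the suggestion concrete, a hypothetical wiring could look roughly like this (the flag name comes from this comment rather than the codebase, and minikube's actual cobra/viper plumbing may differ; this is only a sketch of the shape, not a proposed patch):

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

func main() {
	// Hypothetical flag that opts into blocking on the image cache, defaulting
	// to false until caching is reliable enough to flip the default.
	startCmd := &cobra.Command{
		Use: "start",
		Run: func(cmd *cobra.Command, args []string) {
			if viper.GetBool("cache-wait-on-kubernetes-images") {
				fmt.Println("would block on the image cache before UpdateCluster")
			} else {
				fmt.Println("would keep today's best-effort, non-blocking behaviour")
			}
		},
	}
	startCmd.Flags().Bool("cache-wait-on-kubernetes-images", false,
		"Wait for cached Kubernetes images to finish downloading before configuring the cluster")
	viper.BindPFlags(startCmd.Flags())

	startCmd.Execute()
}
```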

Contributor Author
@mlgibbons mlgibbons Jun 1, 2018

Hi. I can understand where you're coming from. I think we can separate this into two distinct items: 1) when caching is used it needs to be reliable, so the race condition needs to be addressed; 2) whether image caching should default to true or false.

Item one needs to be fixed, and I think everyone is in agreement on that. The commit I just made addresses that issue. Item two depends on your perspective, and I could argue for both cases. The key difference in my mind is that the ISO, kubeadm and kubectl MUST be downloaded to be able to spin up the cluster. The images MAY be downloaded, cached and loaded, but don't need to be, as the caching is an optimisation. Given that, treating them differently from the other items makes sense.

I would personally go with having image caching disabled by default for a number of reasons, including that it's one less thing to worry about if you don't need it, and that I've spent too much time deleting the incomplete images left in the cache dir after hitting Ctrl-C during cache population. The commit I made has the default set to False (I read your comment after the commit was made), but it could be either depending on what you value most and your use case, i.e. there is no "right" default setting.

}
api, err := machine.NewAPIClient()
if err != nil {
@@ -332,10 +332,10 @@ You will need to move the files to the appropriate location and then set the cor
sudo mv /root/.kube $HOME/.kube # this will write over any previous configuration
sudo chown -R $USER $HOME/.kube
sudo chgrp -R $USER $HOME/.kube

sudo mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
sudo chown -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube

This can also be done automatically by setting the env var CHANGE_MINIKUBE_NONE_USER=true`)
}
7 changes: 5 additions & 2 deletions pkg/minikube/bootstrapper/kubeadm/kubeadm.go
@@ -259,9 +259,12 @@ func NewKubeletConfig(k8s config.KubernetesConfig) (string, error) {

func (k *KubeadmBootstrapper) UpdateCluster(cfg config.KubernetesConfig) error {
if cfg.ShouldLoadCachedImages {
- // Make best effort to load any cached images
- go machine.LoadImages(k.c, constants.GetKubeadmCachedImages(cfg.KubernetesVersion), constants.ImageCacheDir)
+ err := machine.LoadImages(k.c, constants.GetKubeadmCachedImages(cfg.KubernetesVersion), constants.ImageCacheDir)
+ if err != nil {
+ return errors.Wrap(err, "loading cached images")
+ }
}

kubeadmCfg, err := generateConfig(cfg)
if err != nil {
return errors.Wrap(err, "generating kubeadm cfg")