Runtime resource discovery #188
33111fe to 10bf3b8
kubernetes-deploy.gemspec
@@ -24,11 +24,12 @@ Gem::Specification.new do |spec|
   spec.required_ruby_version = '>= 2.3.0'
   spec.add_dependency "activesupport", ">= 4.2"
-  spec.add_dependency "kubeclient", "~> 2.4"
+  spec.add_dependency "kubeclient", "~> 2.5.1"
Required for raw JSON support.
kubernetes-deploy.gemspec
  spec.add_dependency "googleauth", ">= 0.5"
  spec.add_dependency "ejson", "1.0.1"
  spec.add_dependency "colorize", "~> 0.8"
  spec.add_dependency "statsd-instrument", "~> 2.1"
  spec.add_dependency "jsonpath", "0.8.8"
Required to parse out the custom status field.
discover_tpr(v1beta1_kubeclient(context))
begin
  discover_crd(v1beta1_crd_kubeclient(context))
rescue KubeException => err
This is thrown for Kubernetes < 1.7. I am also open to either:
- Eliminating this check and dropping support for < 1.7.
- Making the check against the version number of the client/server.
Since we merged the server version check, maybe we can use that?
I think we should use a version check as Karan suggested. I also think that these discovery methods should be retried a couple of times so we don't fail the deploy on server blips here.
Using a version check now.
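A minimal sketch of the two suggestions in this thread: choose the discovery calls from the server version instead of rescuing KubeException, and retry discovery a few times on transient errors. The constant names, TransientDiscoveryError, with_retries, discovery_plan and the retry count are all hypothetical; the cutoffs reflect CRDs arriving in Kubernetes 1.7 and ThirdPartyResources being removed in 1.8. The input is assumed to be a gitVersion string like the one the API server's /version endpoint reports.

```ruby
require 'rubygems' # Gem::Version

# Hypothetical cutoffs: CRDs arrived in 1.7, TPRs were removed in 1.8.
CRD_MIN_VERSION = Gem::Version.new('1.7.0')
TPR_REMOVED_VERSION = Gem::Version.new('1.8.0')

# Hypothetical error class standing in for a transient API failure.
class TransientDiscoveryError < StandardError; end

# Runs the block, retrying up to `attempts` times in total on transient errors.
def with_retries(attempts: 3)
  tries = 0
  begin
    tries += 1
    yield
  rescue TransientDiscoveryError
    retry if tries < attempts
    raise
  end
end

# Chooses which discovery calls to make from a gitVersion such as
# "v1.8.4" or "v1.8.4-gke.0" (only the leading numeric part is compared).
def discovery_plan(server_git_version)
  numeric = server_git_version[/\d+(?:\.\d+)*/]
  raise ArgumentError, "unparseable version: #{server_git_version}" if numeric.nil?

  version = Gem::Version.new(numeric)
  plan = []
  plan << :tpr if version < TPR_REMOVED_VERSION
  plan << :crd if version >= CRD_MIN_VERSION
  plan
end
```

A version check keeps "this cluster does not serve CRDs" distinct from a real API error, which a blanket rescue would hide; the retry wrapper then only has to deal with genuinely transient failures.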
end

def self.discovered(group:, type:, version:, annotations:)
  resource_class = Class.new(self) do
I previously spoke about using meta-programming with @mkobetic. Since then I've convinced myself this is the right approach.
We are modelling a class/instance relationship (resource definitions vs. resource instances). Each definition has specific properties shared by all of its instances (e.g. whether it's prunable, or its configured timeout). This information has to be stored somewhere, and since Ruby supports OOP, it makes sense to generate classes that hold the settings shared by all resources (instances) of a given definition.
DISCLAIMER: I've only worked with this kind of thing a couple of times in the past, and I'm going to want to pull in another experienced Rubyist before we merge this PR.
Agreed, but I think we can take it a step further and make the generated classes "normal", rather than using class variables, by using class_eval. Here's a working example:
module ScratchPadEphemeral
  class DynamicParent
    require 'active_support/core_ext/class/subclasses'

    # Evaluates a class definition from a string, so each child gets a real
    # constant name and plain `def` methods (no class variables).
    def self.define_child(child_name, child_type, some_bool)
      new_class = self.class_eval <<-STRING
        class #{child_name.capitalize} < DynamicParent
          def type
            '#{child_type}'
          end

          def some_bool
            #{some_bool}
          end
        end
      STRING
    end
  end
end
[35] pry(ScratchPadEphemeral):1> DynamicParent.define_child("foo", "test", true)
=> :some_bool
[36] pry(ScratchPadEphemeral):1> DynamicParent.define_child("bar", "test", false)
=> :some_bool
[37] pry(ScratchPadEphemeral):1> DynamicParent.subclasses
=> [ScratchPadEphemeral::DynamicParent::Bar, ScratchPadEphemeral::DynamicParent::Foo]
[38] pry(ScratchPadEphemeral):1> foo = DynamicParent::Foo.new
=> #<ScratchPadEphemeral::DynamicParent::Foo:0x007fd8e4106fe8>
[39] pry(ScratchPadEphemeral):1> foo.type
=> "test"
[40] pry(ScratchPadEphemeral):1> foo.some_bool
=> true
This should incidentally be more performant than define_method, from what I've been told in the past.
Adopted this approach in the rewritten version, thanks.
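For comparison, here is a minimal sketch of a block-based alternative that also produces named, "normal" classes without evaluating a string: Class.new plus const_set, with define_method for the per-definition settings. ScratchPadBlock, DynamicParent, define_child and children are hypothetical names that mirror the scratch-pad example. This sketch does not measure whether it is faster or slower than class_eval.

```ruby
module ScratchPadBlock
  class DynamicParent
    # Explicit registry of generated children (instead of relying on
    # ActiveSupport's Class#subclasses).
    def self.children
      @children ||= []
    end

    def self.define_child(child_name, child_type, some_bool)
      klass = Class.new(self) do
        # The blocks close over the per-definition settings.
        define_method(:type) { child_type }
        define_method(:some_bool) { some_bool }
      end
      # Assigning the anonymous class to a constant gives it a real name.
      const_set(child_name.capitalize, klass)
      children << klass
      klass
    end
  end
end
```

One consideration when the values come from CRD annotations: with class_eval, anything interpolated into the string is executed as Ruby source. The block form only stores the values as data.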
getter = "get_#{type.downcase}"
@client ||= DiscoverableResource.kubeclient(context: @context, resource_class: self.class)
raw_json = @client.send(getter, @name, @namespace, as: :raw)
query_path = JsonPath.new(status_field)
We allow CRD implementors to specify, via an arbitrary JSONPath expression, how a resource should be queried to determine its status.
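Here is a minimal sketch of the status check described above. It assumes three inputs: the resource's raw JSON, a status path supplied by the CRD author, and an expected success value. To stay dependency-free, it uses a tiny dotted-path resolver as a stand-in for the jsonpath gem. The resolver only handles simple paths like $.status.phase, and both method names are hypothetical.

```ruby
require 'json'

# Stand-in for JsonPath: resolves "$.a.b.c" against parsed JSON.
# Returns nil as soon as a segment is missing.
def dig_simple_path(document, path)
  keys = path.sub(/\A\$\.?/, '').split('.')
  keys.reduce(document) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

# True when the value found at status_path equals the expected success value.
def deploy_succeeded?(raw_json, status_path, success_value)
  dig_simple_path(JSON.parse(raw_json), status_path) == success_value
end
```

With the jsonpath gem, JsonPath.new(path).on(json) returns an array of matches rather than a single value, so a gem-based version would compare against the first match.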
  add_resource(resource_class)
end

def self.add_resource(resource_class)
While we generate resource classes dynamically, we still need a way to map resource names to their classes.
It's very possible that I'm missing something, but I believe that if you use the class_eval technique I mentioned above, we can subclass from KubernetesResource as usual, and the existing lookup technique (KubernetesDeploy.const_defined?(definition["kind"])) should work.
This part has been rewritten.
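Here is a minimal sketch of a name-to-class mapping. It uses the same keys as the @resources[group][type][version] assignment in this PR and an auto-vivifying nested Hash, so intermediate levels need no explicit setup. ResourceRegistry and its methods are hypothetical. Lookups use fetch, which does not invoke the default proc, so a miss cannot create empty levels.

```ruby
class ResourceRegistry
  def initialize
    # A missing key creates another auto-vivifying Hash, at any depth.
    @resources = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
  end

  def add(group:, type:, version:, resource_class:)
    @resources[group][type][version] = resource_class
  end

  # Read-only lookup: fetch never triggers the default proc.
  def find(group:, type:, version:)
    @resources.fetch(group, {}).fetch(type, {}).fetch(version, nil)
  end

  def groups
    @resources.keys
  end
end
```

If the generated classes end up as named constants, as suggested in this thread, a constant lookup could replace this registry entirely.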
@@ -0,0 +1,26 @@
# frozen_string_literal: true
module KubernetesDeploy
  class GenericResource < KubernetesResource
I've extracted this base class, which represents a generic resource. This code is shared by several existing resources.
Hm, KubernetesResource (the superclass) is already kinda a "generic resource", no? Should we simply change the defaults in that class to what you have here? Regardless, please extract this change to a separate PR. 😄
Removed.
@@ -0,0 +1,6 @@
# frozen_string_literal: true
module KubernetesDeploy
  class CustomResourceDefinition < GenericResource
This class allows k8s-deploy to monitor the deployment of static custom resource definitions.
@@ -0,0 +1,12 @@
# frozen_string_literal: true
module KubernetesDeploy
  class ThirdPartyResource < GenericResource
This class allows k8s-deploy to monitor the deployment of static third party resource definitions.
end

def self.kubeclient(context:, resource_class:)
  _build_kubeclient(
This value cannot be memoized because we must refresh the list of available CRDs/TPRs on each invocation, even within a single lifespan of a k8s-deploy process (for example, when this code is used as a library instead of a standalone executable, as in the tests).
Is that because we might have deployed CRDs? If we want to make that work, we should predeploy them before doing the discovery (or maybe just accept that this does not work for v1, since that'd be really complicated). Doing discovery every time we need to use the client sounds expensive. Alternatively, could we be using instances of this class and discarding them on every run?
Predeploying CRDs now, thanks.
  end
end

def has_resource?(res)
Necessary when running without TPR support (k8s > 1.7) or without CRD support (k8s < 1.7).
716352d to 2d78c4b
end

def self.build(namespace:, context:, definition:, logger:)
  return super if KubernetesDeploy.const_defined?(definition["kind"])
This allows us to temporarily fall back to the custom classes in k8s-deploy if they are present for a given resource. This behaviour is useful for progressively moving the monitoring into the controllers and out of this gem.
(Irrelevant if we can make the instances direct subclasses of KubernetesResource, but...) Doesn't this end up covering the regular resources, i.e. it is not temporary?
This is more explicit in the rewritten version. If a constant whose name matches the kind exists in the k8s-deploy module, it is used; otherwise a class is generated dynamically.
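Here is a minimal sketch of the dispatch described in the reply above: prefer a statically defined class whose constant name matches the kind, and otherwise generate one. ResourcesSketch, its classes and class_for are hypothetical, and generation is stubbed out as a bare Class.new. The sketch passes false as the second argument to const_defined?, which restricts the lookup to the module itself. Per Ruby's documented behaviour, the default (inherit = true) on a module also searches Object, so a kind named like any top-level constant would match.

```ruby
module ResourcesSketch
  class KubernetesResource; end

  # A statically defined resource class.
  class ConfigMap < KubernetesResource; end

  # Use a class defined directly in this module if one matches the kind;
  # otherwise fall back to a generated subclass (stubbed here).
  def self.class_for(kind)
    if const_defined?(kind, false)
      const_get(kind, false)
    else
      Class.new(KubernetesResource)
    end
  end
end
```

The same consideration applies to KubernetesDeploy.const_defined?(definition["kind"]). With the default inherit argument, a custom kind that shares its name with a top-level constant would be treated as a static class.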
fc89812 to 9fb501e
We may want to also override
Any thoughts? @KnVerey
end

def self.prunable?
  return @prunable if defined?(@prunable)
Instance vars that don't exist should still be falsey; just being explicit?
Just my Java/C#/C background. ;) Simplified, thanks!
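The difference behind this exchange can be shown in a small sketch. Reading an unset instance variable returns nil, which is falsey. The defined? guard therefore only matters when the memoized value can itself be false or nil and you want to avoid recomputing it. The Definition class and its computation counter are illustrative only.

```ruby
# Illustrative class: counts how often the "expensive" lookup runs.
class Definition
  @computations = 0

  class << self
    attr_reader :computations

    # defined? is true once @prunable has been assigned, even to false,
    # so a false result is cached too.
    def prunable_with_defined?
      return @prunable if defined?(@prunable)
      @prunable = expensive_lookup
    end

    # ||= treats a cached false (or nil) as missing and recomputes.
    def prunable_with_or_assign?
      @prunable_or ||= expensive_lookup
    end

    def reset!
      remove_instance_variable(:@prunable) if defined?(@prunable)
      @prunable_or = nil
      @computations = 0
    end

    private

    # Stands in for e.g. reading an annotation; always answers false here.
    def expensive_lookup
      @computations += 1
      false
    end
  end
end
```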
def self.discover(context:, logger:)
  logger.info("Discovering custom resources:")
  @resources = nil
This is also a no-op afaik.
Unless you are resetting them.
They should be reset each time we (re)discover.
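Here is a minimal sketch of why the reset matters when the cache is filled incrementally, as it is via add_resource. Without the reset, kinds found by a previous discovery in the same process would survive a rediscovery. DiscoverySketch is hypothetical. The discovered_kinds hash stands in for the results of the API discovery calls.

```ruby
module DiscoverySketch
  class << self
    attr_reader :resources

    # discovered_kinds stands in for the results of API discovery calls.
    def discover(discovered_kinds)
      # Required because add_resource mutates the cache incrementally;
      # without it, kinds from an earlier discovery would linger.
      @resources = {}
      discovered_kinds.each { |kind, group_version| add_resource(kind, group_version) }
      @resources
    end

    def add_resource(kind, group_version)
      @resources[kind] = group_version
    end
  end
end
```

If discovery instead built a fresh hash and assigned it in a single step, the separate `@resources = nil` line would indeed be redundant.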
cc: @kirs I'd like to hear your opinion on this PR as well, please.
a6ff545 to 2627ec1
Rebased on master and addressed some cosmetic comments.
  @resources[group][type][version] = resource_class
end

def self.parse_bool(value)
Kubernetes also uses yes as the value of a boolean annotation. These annotations are added to persistent volume claims once the claim is bound:
pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
This should also look for yes.
Let's do ON, yes, true and 1, wdyt?
The value needs to be a string, so "1" might get confusing.
FWIW this is what Rails (well, ActiveModel) does: https://github.com/rails/rails/blob/47eadb68bfcae1641b019e07e051aa39420685fb/activemodel/lib/active_model/type/boolean.rb#L17
Switched to ActiveModel, thanks!
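For reference, here is a minimal sketch of the explicit allow-list proposed earlier in the thread: on, yes, true and 1, compared case-insensitively as strings. This is not what the PR ended up using. ActiveModel::Type::Boolean works the other way round: any non-empty value outside its list of false values (such as "false", "0", "off") casts to true, so "no" would be read as true. TRUTHY_ANNOTATION_VALUES and this parse_bool body are hypothetical.

```ruby
# Hypothetical allow-list; anything else (including nil) is false.
TRUTHY_ANNOTATION_VALUES = %w(true yes on 1).freeze

def parse_bool(value)
  TRUTHY_ANNOTATION_VALUES.include?(value.to_s.strip.downcase)
end
```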
2627ec1 to 86c2e3a
This PR rebased pretty cleanly. I know you have a full plate, but when you get a chance I'd like to hear your thoughts, @KnVerey, also with respect to #188 (comment).
LGTM!
👍 on the cleanup of the common code too
I have a bunch of comments, but this is a great start. I'm really excited about this feature!
def exists?
  # TPRs take time to become available.
  _, _err, st = kubectl.run("get", @name)
Only sync methods should make API calls. I really should look into adding a test to enforce that.
Removed.
lib/kubernetes-deploy/deploy_task.rb
def predeploy_sequence
  resources = DiscoverableResource.all.select(&:predeploy?)
  identities = resources.map(&:identity)
  PREDEPLOY_SEQUENCE + identities
Unless/until we support more powerful ordering, it's probably safer to assume that the custom resources need to go first, and that pods (which are at the end of the constant list) need to go very last. This was true for all our own CRs.
Fixed, thanks.
lib/kubernetes-deploy/deploy_task.rb
@@ -206,13 +223,15 @@ def validate_definitions(resources)

def discover_resources
  resources = []
  # Explicitly discovering will discard all cached resources.
  DiscoverableResource.discover(context: @context, logger: @logger)
I feel like this should be its own step in the "Initializing deploy" phase. It isn't resource discovery in the sense this method's name used to mean; this method should probably be renamed to something like "parse_templates" or "load_resource_from_file" or whatever, now that we have something that is "discovery" in the k8s API sense.
Moved to a separate method, and renamed the existing method as suggested.
end

def self.discover_crd(client)
  return unless client.respond_to? :get_custom_resource_definitions
These are basically another version check, right?
Removed.
  return super if KubernetesDeploy.const_defined?(definition["kind"])

  # We only discover once per kubernetes-deploy invocation
  discover(context: context, logger: logger) unless @resources
This should assume discovery has already been done. It would be very heavy to have it actually happen here.
Doing discovery once now; removed the call from here.
STATUS_SUCCESS_ANNOTATION = 'kubernetes-deploy.shopify.io/status-success'
TIMEOUT_ANNOTATION = 'kubernetes-deploy.shopify.io/timeout'
PREDEPLOY_ANNOTATION = 'kubernetes-deploy.shopify.io/predeploy'
PRUNABLE_ANNOTATION = 'kubernetes-deploy.shopify.io/prunable'
There are so many annotations here that I'm wondering if we should instead have a single one that contains JSON.
Removed timeout, as it's been superseded by a different PR, and collected the rest of the methods under metadata. I'm open to suggestions on the JSON field name.
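Here is a minimal sketch of reading a single JSON-valued annotation, as suggested here. The PR names it DEPLOY_METADATA_ANNOTATION, 'kubernetes-deploy.shopify.io/metadata'. The field names inside the JSON (predeploy, prunable, status_field, status_success), their defaults and InvalidMetadataError are hypothetical. The sketch reports malformed JSON and unknown keys instead of silently ignoring them, so a typo in a CRD annotation fails loudly.

```ruby
require 'json'

DEPLOY_METADATA_ANNOTATION = 'kubernetes-deploy.shopify.io/metadata'

# Hypothetical settings and defaults for a discovered resource type.
DEFAULT_DEPLOY_METADATA = {
  'predeploy' => false,
  'prunable' => false,
  'status_field' => nil,
  'status_success' => nil,
}.freeze

class InvalidMetadataError < StandardError; end

# Extracts settings from a CRD's metadata.annotations hash.
def deploy_metadata(annotations)
  raw = annotations.fetch(DEPLOY_METADATA_ANNOTATION, nil)
  return DEFAULT_DEPLOY_METADATA if raw.nil?

  parsed = JSON.parse(raw)
  raise InvalidMetadataError, 'metadata annotation must be a JSON object' unless parsed.is_a?(Hash)

  unknown = parsed.keys - DEFAULT_DEPLOY_METADATA.keys
  raise InvalidMetadataError, "unknown keys: #{unknown.join(', ')}" unless unknown.empty?

  DEFAULT_DEPLOY_METADATA.merge(parsed)
rescue JSON::ParserError => e
  raise InvalidMetadataError, "invalid JSON in #{DEPLOY_METADATA_ANNOTATION}: #{e.message}"
end
```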
def cleanup(*resources)
  resources.each do |res|
    _, err, st = kubectl.run("delete", res, "--all")
Is it impossible/hard to do this with kubeclient?
Doing it with kubeclient now.
79c6ff9 to 5f9dd31
@KnVerey I've updated this branch with the latest approach. It's still a work in progress (and needs some more tests), so feel free to ignore it for now. I'll ping you for review once I feel it's ready. Edit: That said, comments are always welcome, thank you.
2a7b298 to c5f60de
fd740a0 to 1361834
bin/test
 if [[ ${CI:="0"} == "1" ]]; then
   echo "--- :ruby: Bundle Install"
-  bundle install --jobs 4
+  bundle install --jobs 4 || exit 1
The tests should exit if bundler fails.
resources = DiscoverableResource.all + KubernetesResource.all
# Omit unqualified kinds (they are unsupported on this cluster, or discovery hasn't been performed yet)
resources.select do |res|
  res.constants.include?(:GROUP) && res.constants.include?(:VERSION)
This check is somewhat inelegant, but GROUP and VERSION are now detected via discovery for most resources. If these fields are not present we can't query the API server, so we skip over them. This is not an issue if the user runs discovery first.
What does it mean for the "user" to run discovery? i.e. would it be a programming error in the gem if it doesn't happen before this is called? If so, can we detect (e.g. by setting an ivar when we do it) whether or not it has been done and raise an exception if not?
What does it mean for the "user" to run discovery?
It means invoking discover_resources at some point, normally before attempting to deploy.
I don't think we use k8s-deploy as a library anywhere, so I'm not sure what the normal entry point would be if it were used as a library.
When it is used normally as an executable, discover_resources is executed as part of run in the deploy_task, so this is a non-issue.
The other important part is that not all statically defined resources are supported on all k8s versions.
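Here is a minimal sketch of the reviewer's suggestion. It records that discovery ran (here with a class-level flag) and raises from prune_whitelist if it did not, instead of silently skipping kinds that lack GROUP/VERSION. It also uses const_defined?(name, false). Unlike constants.include?, which includes inherited constants by default, this ignores constants defined on ancestors; whether that is desirable depends on whether GROUP/VERSION should ever be inherited. All names other than GROUP, VERSION and prune_whitelist are hypothetical.

```ruby
class DiscoveryNotPerformedError < StandardError; end

module GuardSketch
  class Resource; end

  class Deployment < Resource
    GROUP = 'apps'
    VERSION = 'v1'
  end

  # E.g. a kind this cluster does not serve: no GROUP/VERSION.
  class Unqualified < Resource; end

  class << self
    def discover!
      # ... API discovery would populate GROUP/VERSION here ...
      @discovered = true
    end

    def prune_whitelist(resource_classes)
      raise DiscoveryNotPerformedError, 'call discover! first' unless @discovered

      resource_classes
        .select { |k| k.const_defined?(:GROUP, false) && k.const_defined?(:VERSION, false) }
        .map { |k| "#{k::GROUP}/#{k::VERSION}/#{k.name.split('::').last}" }
    end
  end
end
```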
DEPLOY_METADATA_ANNOTATION = 'kubernetes-deploy.shopify.io/metadata'

def self.inherited(child_class)
  DiscoverableResource.child_classes.add(child_class)
Overriding inherited allows us to distinguish dynamically created resources (from discovery) from statically defined resources (static subclasses of KubernetesResource).
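Here is a minimal sketch of the inherited-hook technique. The base for generated classes records every subclass as it is created, including those made with Class.new, so generated classes can be told apart from statically defined ones. ResourceBase, DiscoverableBase, StaticConfigMap and generate are hypothetical names, not the PR's DiscoverableResource API.

```ruby
require 'set'

class ResourceBase; end

# Generated classes inherit from this base; its hook records them.
class DiscoverableBase < ResourceBase
  @child_classes = Set.new

  class << self
    attr_reader :child_classes
  end

  def self.inherited(child_class)
    super # keep any hooks defined further up the chain
    # Name the base explicitly so grandchildren land in the same set.
    DiscoverableBase.child_classes.add(child_class)
  end

  # Hypothetical generator standing in for discovery.
  def self.generate(kind)
    Class.new(self) { define_singleton_method(:kind) { kind } }
  end
end

# A statically defined resource, not tracked by the hook.
class StaticConfigMap < ResourceBase; end
```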
def self.discover_kinds(context)
  kinds = {}

  # At the top level there is the core group (everything below /api/v1),
Unfortunately, for historic reasons not all resources are available under /apis/$NAME/$VERSION, so we have to look in two places.
resources = json_response['resources']
resources.each do |res|
  kind = res['kind']
  next if kinds.key?(kind) # Respect the preferred version
For each group we'll attempt to detect resources under the preferred group version, but some resources are only available under newer (beta/alpha) versions, so we have to check those too.
Might be worth mentioning in a comment. My first thought in reading through the above was "why don't we check only the preferred version?" even though I should have known the answer.
Does this behave correctly when a resource exists in multiple GVs, but not in the preferred one? batch would be an example of this situation: its preferred version in 1.8 is batch/v1, but CronJob only exists in v1beta1 and v2alpha1. We'd want to make sure to pick the beta.
Yes, this code iterates through each version of each group.
Preference is given to the preferred version; if a kind is not present in the preferred version, it is added under the first version it is encountered in.
The order of the version list returned by the API server is oldest version to newest version, so the old(er) version will be found first.
e.g.
kubectl get --raw /apis/batch | jq .
{
"kind": "APIGroup",
"apiVersion": "v1",
"name": "batch",
"versions": [
{
"groupVersion": "batch/v1",
"version": "v1"
},
{
"groupVersion": "batch/v1beta1",
"version": "v1beta1"
}
],
"preferredVersion": {
"groupVersion": "batch/v1",
"version": "v1"
},
"serverAddressByClientCIDRs": null
}
In this case CronJob will be encountered first in v1beta1 and associated with that version.
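Here is a minimal sketch of the selection rule described above. It runs against data shaped like the /apis/batch response, plus a hypothetical batch/v2alpha1 entry so that the beta-over-alpha case is exercised. The preferred version is scanned first, then the remaining versions in the order the server lists them, and each kind keeps the first groupVersion it is seen in. The resources_by_gv hash is an assumption: it stands in for the per-version resource lists, reduced to kind names.

```ruby
# group: parsed APIGroup JSON (as in the /apis/batch output).
# resources_by_gv: { groupVersion => [kind, ...] } (hypothetical shape).
# Returns { kind => groupVersion }.
def select_group_versions(group, resources_by_gv)
  preferred = group.fetch('preferredVersion').fetch('groupVersion')
  others = group.fetch('versions').map { |v| v.fetch('groupVersion') } - [preferred]

  kinds = {}
  ([preferred] + others).each do |gv|
    resources_by_gv.fetch(gv, []).each do |kind|
      next if kinds.key?(kind) # first match wins: preferred, then list order
      kinds[kind] = gv
    end
  end
  kinds
end
```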
@@ -2,10 +2,13 @@
module KubernetesDeploy
  class Bugsnag < KubernetesResource
    TIMEOUT = 1.minute
    PREDEPLOY = true
    GROUP = 'stable.shopify.io'
    VERSION = 'v1'
We provide these values in code for statically defined classes. They can also be provided dynamically by annotations on CRDs.
We should delete this class, actually... we 🔥 this CR a while ago.
@@ -0,0 +1,29 @@
# frozen_string_literal: true
module KubernetesDeploy
  class HorizontalPodAutoscaler < KubernetesResource
The HPA was previously defined only as a string in the prune list; we now need an actual class backing it.
@@ -2,7 +2,9 @@
module KubernetesDeploy
  class Pod < KubernetesResource
    TIMEOUT = 10.minutes

    PREDEPLOY = true
    PREDEPLOY_DEPENDENCIES = %w(ResourceQuota ServiceAccount ConfigMap PersistentVolumeClaim)
Extracted from the previous static predeploy dependency list. Note that this is a partial order (the previous list was, by definition, a total order).
What about the fact that this can depend on custom resources, such as cloudsql/redis?
Then we'll need some way to specify such custom dependencies in the pod spec.
Alternatively, we could optionally specify something like PREDEPLOY_DEPENDENTS on resources, so that e.g. CloudSQL could declare that it is a dependency of Pod (its dependent).
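Here is a minimal sketch of how the PREDEPLOY_DEPENDENTS idea could be folded into a PREDEPLOY_DEPENDENCIES-style map. Each "X is a dependency of Y" declaration is inverted into "Y depends on X". merge_dependency_graph and both input hashes are hypothetical. CloudSQL stands in for any custom resource.

```ruby
# dependencies: { kind => [kinds it must be deployed after] }
# dependents:   { kind => [kinds that must be deployed after it] }
def merge_dependency_graph(dependencies, dependents)
  graph = Hash.new { |h, k| h[k] = [] }
  dependencies.each { |kind, deps| graph[kind].concat(deps) }
  dependents.each do |kind, later_kinds|
    # Invert the edge: each later kind now depends on `kind`.
    later_kinds.each { |later| graph[later] << kind }
  end
  graph.transform_values(&:uniq) # plain Hash, no default proc
end
```

This keeps the Pod class unaware of custom resources: the custom resource declares the relationship, and the merged graph has the same shape as the existing dependency lists.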
# frozen_string_literal: true
require 'test_helper'

class ResourceDiscoveryTest < KubernetesDeploy::IntegrationTest
These tests affect global resources (CRDs), so they are not parallel-friendly.
klass.predeploy_dependencies.each do |dep|
  pos = order.index(dep)
  # The current resource is deployed *after* its deps
  assert_operator idx, :>, pos, "#{r} requires #{dep} but got #{order}"
I did not want to hardcode a particular total order here because that's not important. What's important is that the result is in a valid topological sort order, which may not be unique.
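Here is a minimal sketch of producing one valid predeploy order from such a partial order with Ruby's standard TSort module. It also includes the same "every dependency comes earlier" check that the test performs. PredeployGraph, valid_order? and the dependency hash are illustrative. Kinds with no entry are treated as having no dependencies.

```ruby
require 'tsort'

# Wraps a { kind => [dependencies] } hash so TSort can walk it.
class PredeployGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # Every kind mentioned anywhere, as a key or as a dependency.
  def tsort_each_node(&block)
    (@dependencies.keys | @dependencies.values.flatten).each(&block)
  end

  def tsort_each_child(kind, &block)
    @dependencies.fetch(kind, []).each(&block)
  end
end

# True when every kind appears after all of its dependencies.
def valid_order?(order, dependencies)
  dependencies.all? do |kind, deps|
    deps.all? do |dep|
      dep_pos = order.index(dep)
      kind_pos = order.index(kind)
      dep_pos && kind_pos && dep_pos < kind_pos
    end
  end
end
```

TSort#tsort lists children (here, dependencies) before their parents, and it raises TSort::Cyclic if the declared dependencies form a cycle. The order among unrelated kinds is unspecified, which is exactly why the test checks the ordering constraints instead of a fixed list.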
@KnVerey Could I ask you to have another look? A lot has changed since the initial version. All tests are currently passing.
I only did one pass and didn't make it through the tests, but this is looking great! Can you let me know when you have time to sit down and walk me through some of it IRL? Some things I'd like to discuss:
- jq vs jsonpath
- timeouts/failures for dynamic resources
- dynamic classes for core resources
- the predeploy graph (I think I'm forgetting something about why we need to do this)
- any performance implications
  @prune_whitelist ||= _build_prune_whitelist
end

def _build_prune_whitelist
It'd be more conventional to make these methods private rather than prefixing them to suggest they should be. Couldn't prune_whitelist be private too?
Sure, I was just following the previous examples in the file.
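Here is a minimal sketch of the convention the reviewer suggests: keep the memoized reader and put the builder behind private, instead of signalling privacy with a leading underscore. DeployTaskSketch and its input format (pairs of groupVersion and kind) are hypothetical. prune_whitelist is left public here only so it can be called directly; it could move below private as well if only the task itself uses it.

```ruby
class DeployTaskSketch
  # kinds: array of [group_version, kind] pairs (hypothetical input).
  def initialize(kinds)
    @kinds = kinds
  end

  def prune_whitelist
    @prune_whitelist ||= build_prune_whitelist
  end

  private

  # Enforced by Ruby: calling this with an explicit receiver raises NoMethodError.
  def build_prune_whitelist
    @kinds.map { |group_version, kind| "#{group_version}/#{kind}" }.sort
  end
end
```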
resources = DiscoverableResource.all + KubernetesResource.all | ||
# Omit unqualified kinds (they are unsupported on this cluster, or discovery hasn't been performed yet) | ||
resources.select do |res| | ||
res.constants.include?(:GROUP) && res.constants.include?(:VERSION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for the "user" to run discovery? That is, would it be a programming error in the gem if discovery doesn't happen before this is called? If so, can we detect (e.g. by setting an ivar when we do it) whether or not it has been done, and raise an exception if not?
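This is a minimal sketch of the guard being suggested: set an ivar when discovery runs, and raise if a discovery-dependent list is requested before that. `ResourceRegistrySketch`, `discover!`, `qualified_kinds`, and `DiscoveryNotPerformedError` are all hypothetical names. The sketch assumes kinds arrive as simple hashes rather than resource classes.

```ruby
# Hypothetical error raised when a discovery-dependent value is requested too early.
class DiscoveryNotPerformedError < StandardError; end

# Hypothetical stand-in for whatever object owns discovery in the gem.
class ResourceRegistrySketch
  def initialize
    @discovered = false
    @kinds = []
  end

  # In the gem, kinds would come from the API server; here they are passed in.
  def discover!(kinds)
    @kinds = kinds
    @discovered = true
  end

  # Keeps only fully qualified kinds, but refuses to answer before discovery.
  def qualified_kinds
    raise DiscoveryNotPerformedError, "discover! must run before qualified_kinds" unless @discovered
    @kinds.select { |k| k[:group] && k[:version] }
  end
end

registry = ResourceRegistrySketch.new
registry.discover!([{ kind: "Job", group: "batch", version: "v1" }, { kind: "Widget" }])
registry.qualified_kinds.map { |k| k[:kind] } # => ["Job"]
```

Raising here turns a silently shortened list (the "omit unqualified kinds" behaviour) into an explicit programming error.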
@@ -217,14 +238,21 @@ def validate_definitions(resources)
end

def discover_resources
# (Lazily) rebuild these lists after discovery if they were present.
"Lazily" likely means in the middle of the mutating part of the deploy. Since they can fail, I'm thinking it would be better to build them eagerly, during the "Initializing deploy" stage. I'd suggest making `prune_whitelist` a simple `attr_reader` and setting them explicitly here.
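This is a minimal sketch of the eager alternative. It assumes discovery results are available as simple hashes. `EagerDeploySketch` and its methods are hypothetical and only show the shape of the suggestion: compute once during initialization, then read through an `attr_reader`. The `group/version/kind` strings mirror the format that `kubectl apply --prune-whitelist` accepts.

```ruby
# Hypothetical sketch: derived lists are computed once, eagerly, while the
# deploy is still initializing, and exposed through plain readers.
class EagerDeploySketch
  attr_reader :prune_whitelist

  def initialize(kinds)
    discover_resources(kinds)
  end

  private

  # Runs during initialization, before anything is applied to the cluster.
  def discover_resources(kinds)
    @prune_whitelist = kinds.map { |k| "#{k[:group]}/#{k[:version]}/#{k[:kind]}" }.freeze
  end
end

task = EagerDeploySketch.new([{ group: "batch", version: "v1", kind: "Job" }])
task.prune_whitelist # => ["batch/v1/Job"]
```

If building the list ever did raise, it would do so before any resources were applied.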
`predeploy_sequence` and `prune_whitelist` are purely local computations without any API calls, so they should never fail.
resources = json_response['resources']
resources.each do |res|
kind = res['kind']
next if kinds.key?(kind) # Respect the preferred version
Might be worth mentioning in a comment. My first thought in reading through the above was "why don't we check only the preferred version?" even though I should have known the answer.
resources = json_response['resources']
resources.each do |res|
kind = res['kind']
next if kinds.key?(kind) # Respect the preferred version
Does this behave correctly when a resource exists in multiple group versions (GVs), but not in the preferred one? `batch` would be an example of this situation: its preferred version in 1.8 is `batch/v1`, but `CronJob` only exists in `v1beta1` and `v2alpha1`. We'd want to make sure to pick the beta.
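This is a minimal sketch of the selection rule being asked about. Each kind is taken from the preferred version when that version serves it. Otherwise it falls back to the most stable version that does: GA, then beta, then alpha. The data shape (a hash from version to kinds) and both helper names are hypothetical, and the sketch does not reflect how the PR parses the discovery response.

```ruby
# Hypothetical helper: rank a version string by stability, GA > beta > alpha.
# Within the same stability level, a higher major version ranks higher.
def stability_rank(version)
  major = version[/\Av(\d+)/, 1].to_i
  level = if version.include?("alpha") then 0
          elsif version.include?("beta") then 1
          else 2
          end
  [level, major]
end

# Hypothetical helper: versions maps version => kinds served there.
# Returns kind => chosen version.
def pick_versions(preferred, versions)
  chosen = {}
  # Preferred version first, then the rest from most to least stable.
  others = (versions.keys - [preferred]).sort_by { |v| stability_rank(v) }.reverse
  ([preferred] + others).each do |version|
    versions.fetch(version, []).each do |kind|
      chosen[kind] ||= version # the first (most preferred) version wins
    end
  end
  chosen
end

batch = { "v1" => ["Job"], "v1beta1" => ["CronJob"], "v2alpha1" => ["CronJob"] }
pick_versions("v1", batch) # => {"Job"=>"v1", "CronJob"=>"v1beta1"}
```

The `||=` plays the same role as `next if kinds.key?(kind)`: once a kind has a version, later candidates are skipped. The difference is that the non-preferred versions are visited in stability order.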
@found = st.success?
end

def deploy_succeeded?
There's actually a more sophisticated success condition available for this; check out `waitCRDReady` in cloudbuddies.
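This sketch does not reproduce the logic of `waitCRDReady`. It is a minimal stricter success check based on the standard CustomResourceDefinition status, which reports an `Established` condition once the new API is being served. The sketch assumes the CRD has already been fetched and parsed into Ruby hashes. Both method names are hypothetical.

```ruby
# Hypothetical helper: look up a condition by type in a parsed CRD.
def crd_condition(crd, type)
  conditions = crd.dig("status", "conditions") || []
  conditions.find { |c| c["type"] == type }
end

# Hypothetical check: succeeded only once the API server reports the CRD as
# Established, rather than merely that the object exists.
def crd_deploy_succeeded?(crd)
  cond = crd_condition(crd, "Established")
  !cond.nil? && cond["status"] == "True"
end

crd = {
  "status" => {
    "conditions" => [
      { "type" => "NamesAccepted", "status" => "True" },
      { "type" => "Established", "status" => "True" },
    ],
  },
}
crd_deploy_succeeded?(crd) # => true
crd_deploy_succeeded?({})  # => false
```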
exists?
end

def deploy_failed?
Might make sense to look at the `NamesAccepted` condition here (see `waitCRDReady` in cloudbuddies).
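This is a minimal sketch of using that condition as a failure signal. When the API server explicitly reports `NamesAccepted` as `False`, typically because a requested name is already in use, the CRD will not become usable without a change. The sketch assumes a CRD parsed into Ruby hashes. `crd_deploy_failed?` is a hypothetical name, not the gem's method.

```ruby
# Hypothetical check: a CRD has failed if the API server explicitly
# rejected its names (NamesAccepted=False).
def crd_deploy_failed?(crd)
  conditions = crd.dig("status", "conditions") || []
  names = conditions.find { |c| c["type"] == "NamesAccepted" }
  !names.nil? && names["status"] == "False"
end

rejected = {
  "status" => {
    "conditions" => [{ "type" => "NamesAccepted", "status" => "False" }],
  },
}
crd_deploy_failed?(rejected) # => true
crd_deploy_failed?({})       # => false
```

A missing condition is treated as "not failed yet", so the deploy would keep waiting until its timeout rather than failing early.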
@@ -0,0 +1,29 @@
# frozen_string_literal: true
module KubernetesDeploy
class Job < KubernetesResource
I wouldn't insist on this, but it could be helpful if you extracted these new class additions into a separate PR. Making sure each one has appropriate checks and test coverage is not really related to this PR.
As a prerequisite PR? Without the new classes this PR won't work.
@@ -2,7 +2,9 @@
module KubernetesDeploy
class Pod < KubernetesResource
TIMEOUT = 10.minutes

PREDEPLOY = true
PREDEPLOY_DEPENDENCIES = %w(ResourceQuota ServiceAccount ConfigMap PersistentVolumeClaim)
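Per-kind dependency lists like `PREDEPLOY_DEPENDENCIES` form a graph, and Ruby's standard library can order that graph directly. This is a minimal sketch using `TSort` from the stdlib. The `PredeployGraph` class and the hash shape are hypothetical, and the gem may build its predeploy sequence differently.

```ruby
require "tsort"

# Hypothetical sketch: derive a predeploy order from per-kind dependency lists.
class PredeployGraph
  include TSort

  def initialize(deps)
    @deps = deps # kind => kinds that must be deployed first
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(kind, &block)
    @deps.fetch(kind, []).each(&block)
  end
end

deps = {
  "Pod" => %w(ResourceQuota ServiceAccount ConfigMap PersistentVolumeClaim),
  "ResourceQuota" => [],
  "ServiceAccount" => [],
  "ConfigMap" => [],
  "PersistentVolumeClaim" => [],
}
order = PredeployGraph.new(deps).tsort
# TSort emits children before parents, so every dependency precedes "Pod".
order.last # => "Pod"
```

`TSort#tsort` raises `TSort::Cyclic` if the lists contain a cycle, so a misconfigured dependency would surface during initialization.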
What about the fact that this can depend on custom resources, such as cloudsql/redis?

def self.discover(context:, logger:, server_version:)
logger.info("Discovering custom resources:")
with_retries { discover_groups(context) }
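The implementation of `with_retries` isn't shown in this excerpt. A hypothetical helper of that shape could look like the following. The keyword arguments `limit:`, `delay:` and `errors:` are assumptions, not the gem's signature.

```ruby
# Hypothetical retry wrapper; the gem's real with_retries may take different
# arguments and retry on different errors.
def with_retries(limit: 3, delay: 0, errors: StandardError)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue errors
    # Give up and re-raise the last error once the attempt budget is spent.
    raise if attempts >= limit
    sleep(delay) if delay.positive?
    retry
  end
end

calls = 0
result = with_retries(limit: 3) do
  calls += 1
  raise "transient discovery failure" if calls < 3
  "discovered"
end
result # => "discovered"
```

Wrapping the whole discovery call means a flaky API server response triggers a full re-discovery rather than leaving kinds partially populated.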
Am I reading correctly that this is solely for the purpose of determining which GV to use in the prune whitelist for the statically defined classes?
Maybe the chain of calls here is unclear. To discover groups (including for the statically defined classes), we have to discover everything based on what the API server returns, i.e. `discover_groups` requires `discover_kinds`, which performs all discovery. But yes, `discover_groups` itself is responsible for populating info in the static classes.
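This sketch illustrates the mechanism described here and in the `res.constants.include?(:GROUP)` filter earlier in the diff. Discovery records a group and version on statically defined classes, and later code keeps only the classes that received them. The class names and `populate_group_version` are hypothetical. The sketch assumes `GROUP`/`VERSION` are set with `const_set`, which may not match the PR.

```ruby
# Hypothetical stand-ins for statically defined resource classes.
class JobSketch; end
class UnsupportedKindSketch; end

# Hypothetical helper: record what discovery found on a static class.
def populate_group_version(klass, group:, version:)
  klass.const_set(:GROUP, group)
  klass.const_set(:VERSION, version)
end

# Only JobSketch was reported by the (simulated) API server.
populate_group_version(JobSketch, group: "batch", version: "v1")

# Keep classes that define both constants themselves (inherit = false).
qualified = [JobSketch, UnsupportedKindSketch].select do |res|
  res.const_defined?(:GROUP, false) && res.const_defined?(:VERSION, false)
end
qualified # => [JobSketch]
```

Passing `false` to `const_defined?` ignores constants inherited from a superclass or `Object`. By contrast, `constants` includes inherited ones by default.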
1361834 to 8a1aba5
8a1aba5 to 6b7edd6
@stefanmb this would be so awesome to get in for all of us without all the *Buddies nowadays!!! <3<3
WARNING: Some of this description is now obsolete (Jan 26, 2018)
What
This PR introduces automatic discovery of CustomResourceDefinitions (1.7+) and ThirdPartyResources (1.6) in a backwards-compatible manner. Once this PR is implemented, the monitoring logic for these resources can be removed from `kubernetes-deploy`.
Motivation
Currently we are handling each resource type by adding a new subclass of `KubernetesResource` to handle its state (as per the documentation). This approach makes sense for core resources, but is not scalable for `CustomResourceDefinitions` (and `ThirdPartyResources`) for multiple reasons:
- … `cloudsql` and `redis`, which are not open sourced.
- … `k8s-deploy` … `k8s-deploy` - we cannot centralize all logic for all possible resources users may wish to implement.
The goals of the PR are:
- … `k8s-deploy` by enabling controllers to annotate resources with their respective statuses.
- … `KubernetesResource` can be removed one at a time.
How
TODO
TODO
Update `README` file.
Review
cc: @Shopify/cloudplatform @KnVerey