Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebase 1.9.0 beta.1 #17576

Merged
merged 78 commits into from
Dec 13, 2017
Merged

rebase 1.9.0 beta.1 #17576

merged 78 commits into from
Dec 13, 2017

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented Dec 4, 2017

Green:

  1. networking
  2. verify
  3. cross
  4. integration
  5. end-to-end with router removed
  6. unit
  7. test-cmd
  8. extended builds
  9. extended image ecosystem

Current counts:

  1. GCE - 5 failures - 4 networking failures
  2. extended conformance - 2 failures
  3. end-to-end router tests - CI can't run e2e tests anymore because its trying to use golang 1.8.3. I removed them

e2e failures on gce:

  • [Feature:DeploymentConfig] deploymentconfigs when run iteratively [Conformance] should only deploy the last deployment [Suite:openshift/conformance/parallel] - flaking
  • [sig-network] Networking Granular Checks: Pods should function for intra-pod communication: http [Conformance] [Suite:openshift/conformance/parallel] [Suite:k8s] 5m27s
  • [sig-network] Networking Granular Checks: Pods should function for intra-pod communication: udp [Conformance] [Suite:openshift/conformance/parallel] [Suite:k8s] 5m43s
  • [sig-network] Networking Granular Checks: Pods should function for node-pod communication: http [Conformance] [Suite:openshift/conformance/parallel] [Suite:k8s] 5m50s
  • [sig-network] Networking Granular Checks: Pods should function for node-pod communication: udp [Conformance] [Suite:openshift/conformance/parallel] [Suite:k8s] 5m49s

e2e failures on install:

  • [Conformance][Area:Networking][Feature:Router] openshift router metrics The HAProxy router should expose prometheus metrics for a route [Suite:openshift/conformance/parallel] 21s @deads2k reproduced locally
  • [Conformance][Area:Networking][Feature:Router] openshift router metrics The HAProxy router should expose the profiling endpoints [Suite:openshift/conformance/parallel] 23s @deads2k reproduced locally

flaky failures

  1. [Feature:Builds] Optimized image builds should succeed [Conformance] [Suite:openshift/conformance/parallel] 23s
  2. [Feature:Builds][Conformance] build without output image building from templates should create an image from a docker template without an output image reference defined [Suite:openshift/conformance/parallel] 1m5s
  3. [sig-storage] Downward API volume should update labels on modification [Conformance] [Suite:openshift/conformance/parallel] [Suite:k8s] 2m19s I think this flakes on master
  4. [Feature:DeploymentConfig] deploymentconfigs when run iteratively [Conformance] should only deploy the last deployment [Suite:openshift/conformance/parallel] - flaking

Post-rebase tasks

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 4, 2017
@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 4, 2017
@deads2k deads2k force-pushed the rebase-1.9.0-beta.1 branch from 1c592e9 to aaeb330 Compare December 4, 2017 15:49
@deads2k deads2k force-pushed the rebase-1.9.0-beta.1 branch from aaeb330 to 1557bb3 Compare December 5, 2017 21:11
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 5, 2017
@deads2k deads2k force-pushed the rebase-1.9.0-beta.1 branch from 8b12ec9 to 254ba8a Compare December 6, 2017 15:50
@deads2k
Copy link
Contributor Author

deads2k commented Dec 6, 2017

@sjenning there are significant changes to the node wiring. In particular kubernetes/kubernetes#53564, kubernetes/kubernetes#52287, and kubernetes/kubernetes#54405 . I've put your name in the commit where I deleted them in our config, but I'm not sure what will happen to us.

@@ -64,6 +64,12 @@ func Build(options configapi.NodeConfig) (*kubeproxyconfig.KubeProxyConfiguratio
masqueradeBit := int32(0)
proxyconfig.IPTables.MasqueradeBit = &masqueradeBit

defaultedProxyConfig, err := proxyOptions.ApplyDefaults(proxyconfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will end up calling SetDefaults_KubeProxyConfiguration() in pkg/proxy/apis/kubeproxyconfig/v1alpha1/defaults.go, which will force MetricsBindAddress and ResourceContainer to non-empty values, contradicting the values we set earlier in this function. Maybe you want to call this at the start rather than the end?

@deads2k deads2k force-pushed the rebase-1.9.0-beta.1 branch 4 times, most recently from 6cc848c to 89dcb16 Compare December 8, 2017 14:18
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 8, 2017
@deads2k deads2k force-pushed the rebase-1.9.0-beta.1 branch from 89dcb16 to c41b903 Compare December 8, 2017 16:00
)

func RunResourceQuotaManager(ctx ControllerContext) (bool, error) {
concurrentResourceQuotaSyncs := int(ctx.OpenshiftControllerOptions.ResourceQuotaOptions.ConcurrentSyncs)
resourceQuotaSyncPeriod := ctx.OpenshiftControllerOptions.ResourceQuotaOptions.SyncPeriod.Duration
replenishmentSyncPeriodFunc := calculateResyncPeriod(ctx.OpenshiftControllerOptions.ResourceQuotaOptions.MinResyncPeriod.Duration)
saName := "resourcequota-controller"
listerFuncForResource := generic.ListerFuncForResourceFunc(ctx.GenericInformerFunc)
quotaConfiguration := quotainstall.NewQuotaConfigurationForControllers(listerFuncForResource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derekwaynecarr guess this is enabling the generic quota count and fixes the bugzilla

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -895,7 +918,7 @@ imports:
subpackages:
- imageprogress
- name: github.com/openshift/source-to-image
version: e3e0eedc3f9dd55377ca2b8444769ec2ea54c895
version: 5bf2cb171c2cab1270fba80dc53ef4ff25b12c97
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do that. This might cause some weird problems. If possible pin s2i to the current version we have. It can be easily bumped after rebase lands. Unless you strictly required something from newer version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do that. This might cause some weird problems. If possible pin s2i to the current version we have. It can be easily bumped after rebase lands. Unless you strictly required something from newer version.

seems to be working.

@soltysh
Copy link
Contributor

soltysh commented Dec 11, 2017

UPSTREAM: 00000: expose special storage locations
UPSTREAM: 00000: make the quota count reconcilation optional

Please open upstream PRs.

<drop>: skip scheduler configz error, drop once we run in a separate process
<drop>: skip controller metric error, drop once we run in a separate process

Is this something that will be handled in 3.9 timeframe? If yes add it to #17428.

Squash together:

pre-adjust moved packages 
boring: kapi moved 
boring: kapi moved    
boring: package move

"k8s.io/apimachinery/pkg/api/meta"
)

func PrintSuccess(mapper meta.RESTMapper, shortOutput bool, out io.Writer, resource, name string, dryRun bool, operation string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we rely on this function a lot in origin, but there's no reason to introduce a carry here. Why not copy it over to origin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we rely on this function a lot in origin, but there's no reason to introduce a carry here. Why not copy it over to origin?

I'll let @juanvallejo sort it out when OC moves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least call it <drop> and add a comment about when we can drop it

}

func (r *ScaleREST) GroupVersionKind(containingGV schema.GroupVersion) schema.GroupVersionKind {
switch containingGV {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt explicitly set us to use extensions/v1beta1.Scale only, not sure if this change is desired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long term we want to move to autoscaling/Scale

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but worth checking with @liggitt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt explicitly set us to use extensions/v1beta1.Scale only, not sure if this change is desired.

We need to make sure that any future version uses autoscaling. This code does that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soltysh this is not changing the default, if I do oc scale dc/foo I'm still send extensions/Scale and get extensions/Scale. I believe this is just so we have the right internal type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger that.

@@ -89,13 +89,15 @@ const originLongRunningEndpointsRE = "(/|^)(buildconfigs/.*/instantiatebinary|im

var LegacyAPIGroupPrefixes = sets.NewString(apiserver.DefaultLegacyAPIPrefix, api.Prefix)

// TODO I'm honestly not sure this is worth it. We're not likely to ever be able to launch from flags, so this just
// adds a layer of complexity that is driving me crazy.
Copy link
Contributor

@soltysh soltysh Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 with the increasing number of flags and the desire upstream to move to config file

@@ -11,7 +11,8 @@ function os::golang::verify_go_version() {
if [[ "${go_version[2]}" != go1.8* ]]; then
os::log::info "Detected go version: ${go_version[*]}."
if [[ -z "${PERMISSIVE_GO:-}" ]]; then
os::log::fatal "Please install Go version ${OS_REQUIRED_GO_VERSION} or use PERMISSIVE_GO=y to bypass this check."
os::log::warning "Please install Go version ${OS_REQUIRED_GO_VERSION} or use PERMISSIVE_GO=y to bypass this check."
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not return 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is intentional. Hmm. Ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not return 1?

No. The bash conditionals isn't right and something doesn't handle detection well. We hit this last time too.

@@ -34,7 +34,7 @@ const minimumDockerAPIVersionWithPullByID = "1.22"
// All errors here are fatal.
func (c *NodeConfig) EnsureKubeletAccess() {
if _, err := os.Stat("/var/lib/docker"); os.IsPermission(err) {
c.HandleDockerError("Unable to view the /var/lib/docker directory - are you running as root?")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we should be moving away from these checks as we lean more on cri, not making them unconditionally fatal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we should be moving away from these checks as we lean more on cri, not making them unconditionally fatal

This is a pre-existing move. Not something net new.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or at least move this into EnsureDocker or protect with if c.KubeletServer.ContainerRuntime == "docker" { ... }

@deads2k
Copy link
Contributor Author

deads2k commented Dec 13, 2017

/retest

} else {
if err := rest.BeforeUpdate(Strategy, ctx, obj, old); err != nil {
return nil, false, err
}
if err := updateValidation(obj.DeepCopyObject(), old.DeepCopyObject()); err != nil {
Copy link
Contributor

@sttts sttts Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know old is non-nil? Aren't there ways to update without old object? Remembering a comment like that from @lavalamp in a different context. @liggitt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know old is nil? Aren't there ways to update without old object? Remembering a comment like that from @lavalamp in a different context. @liggitt ?

An update that allows create if it doesn't exist. imagestreamtags don't do that. Also, nil receivers are allowed and as I recall handled by the generated deep copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil receivers for struct pointers. For interfaces they panic.


// HandleDockerError handles an an error from the docker daemon
func (c *NodeConfig) HandleDockerError(message string) {
if !c.AllowDisabledDocker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this?

kubelet removed the ability to pass the client through, so the responsibility for wiring a fake falls to kubelet. We can't wire it through any more.

@@ -167,6 +171,9 @@ func (s *REST) Update(ctx apirequest.Context, name string, objInfo rest.UpdatedO
if errs := s.updateStrategy.ValidateUpdate(ctx, obj, oldObj); len(errs) > 0 {
return nil, false, kerrors.NewInvalid(projectapi.Kind("Project"), project.Name, errs)
}
if err := updateValidation(obj.DeepCopyObject(), oldObj.DeepCopyObject()); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here

@@ -23,7 +24,7 @@ func NewRegistry(s Storage) Registry {
}

func (s *storage) CreateSubjectAccessReview(ctx apirequest.Context, subjectAccessReview *api.SubjectAccessReview) (*api.SubjectAccessReviewResponse, error) {
obj, err := s.Create(ctx, subjectAccessReview, false)
obj, err := s.Create(ctx, subjectAccessReview, nil, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't I add some DenyAlways funcs in upstream? Would make it more explicit.

@deads2k
Copy link
Contributor Author

deads2k commented Dec 13, 2017

@ironcladlou @tnozicka hit the anti-flake on end-to-end and it passed

@deads2k
Copy link
Contributor Author

deads2k commented Dec 13, 2017

Every GCE test has passed on this commit. Just not all at the same time.

CloudProvider: "", // now disabled
RootDirectory: "/var/lib/kubelet",
CertDirectory: "/var/lib/kubelet/pki",
RegisterNode: true, // this looks suspicious
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment... this is ok

RootDirectory: "/var/lib/kubelet",
CertDirectory: "/var/lib/kubelet/pki",
RegisterNode: true, // this looks suspicious
RemoteRuntimeEndpoint: "unix:///var/run/dockershim.sock", // overridden
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add Containerized: false, // overridden based on OPENSHIFT_CONTAINERIZED

@deads2k
Copy link
Contributor Author

deads2k commented Dec 13, 2017

/retest

@deads2k
Copy link
Contributor Author

deads2k commented Dec 13, 2017

all but the router health error code have passed on conformance install.

@@ -151,7 +147,7 @@ func TestKubeletDefaults(t *testing.T) {
}

if goruntime.GOOS == "darwin" {
expectedDefaults.KubeletConfiguration.RemoteRuntimeEndpoint = ""
//expectedDefaults.KubeletConfiguration.RemoteRuntimeEndpoint = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow up to restore this, object.KubeletFlags.RemoteRuntimeEndpoint

allowed, delegateReason, err := a.delegate.Authorize(attributes)
if allowed {
return true, reason(attributes), nil
authorized, delegateReason, err := a.delegate.Authorize(attributes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should return authorized, not convert it to Deny

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should return authorized, not convert it to Deny

done

@openshift-ci-robot
Copy link

@deads2k: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/extended_conformance_gce b374cbd link /test extended_conformance_gce
ci/openshift-jenkins/extended_conformance_install b374cbd link /test extended_conformance_install

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

if attributes.GetUser() == nil {
return false, "", errors.New("no user available on context")
return authorizer.DecisionDeny, "", errors.New("no user available on context")
Copy link
Contributor

@liggitt liggitt Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mechanical change to return NoOpinion, not Deny (applies everywhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mechanical change to return NoOpinion, not Deny (applies everywhere)

done

@@ -259,7 +259,7 @@ func (r *subjectAccessTest) runTest(t *testing.T) {

expectedResponse := &authorizationapi.SubjectAccessReviewResponse{
Namespace: r.reviewRequest.Action.Namespace,
Allowed: r.authorizer.allowed,
Allowed: r.authorizer.allowed == kauthorizer.DecisionAllow,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Denied: r.authorizer.allowed == kauthorizer.DecisionDeny,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Denied: r.authorizer.allowed == kauthorizer.DecisionDeny,

Not for our object.

@@ -161,6 +164,9 @@ func NewCmdMigrateAPIStorage(name, fullName string, f *clientcmd.Factory, in io.
Long: internalMigrateStorageLong,
Example: fmt.Sprintf(internalMigrateStorageExample, fullName),
Run: func(cmd *cobra.Command, args []string) {
apiextensioninstall.Install(legacyscheme.GroupFactoryRegistry, legacyscheme.Registry, legacyscheme.Scheme)
apiregistrationinstall.Install(legacyscheme.GroupFactoryRegistry, legacyscheme.Registry, legacyscheme.Scheme)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a follow up to figure out what we need to do to revert this... migrate shouldn't need registered types... it reads into unstructured and puts back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a follow up to figure out what we need to do to revert this... migrate shouldn't need registered types... it reads into unstructured and puts back

done

@@ -215,7 +215,8 @@ func (o *ResourceOptions) Complete(f *clientcmd.Factory, c *cobra.Command) error
break
}

o.Builder = f.NewBuilder(true).
o.Builder = f.NewBuilder().
Internal().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this... we have to deal with unstructured objects in migrate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe make Unstructured()... not sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this... we have to deal with unstructured objects in migrate

Not according to the old code. Looks like it was always going into internal types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull open

@liggitt
Copy link
Contributor

liggitt commented Dec 13, 2017

/lgtm

can take comments as follow ups.

need immediate update to disable two router tests to unblock queue, then PR to re-enable and add debugging to troubleshoot the failures.

@liggitt
Copy link
Contributor

liggitt commented Dec 13, 2017

all tests (except two router tests) passed at least once on this commit level. no conflicts with master. merging with immediate follow ups.

@liggitt liggitt merged commit a2ec324 into openshift:master Dec 13, 2017
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, liggitt, mfojtik

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:
  • OWNERS [deads2k,liggitt,mfojtik]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@sttts
Copy link
Contributor

sttts commented Dec 14, 2017

🎉 🚀 🎈

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. needs-api-review size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.