Upstream + hybrid-overlay merge 2019-12-28 by dcbw · Pull Request #70 · openshift/ovn-kubernetes

dcbw · 2019-12-21T04:14:40Z

Upstream master + ovn-kubernetes/ovn-kubernetes#889

@squeed @alexanderConstantinescu @danwinship @rcarrillocruz @JacobTanenbaum @pecameron

Signed-off-by: Dan Williams <dcbw@redhat.com>

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

AddFilteredEndpointsHandler must take label selector like other handlers

build: fix 'make lint' when GOPATH isn't explicitly set

When handling the scheme:address:port URLs given to OVN for configuring how to reach OVN services, properly handle IPv6 addresses by not assuming we can just split on ":" across the whole string. Also use JoinHostPort to properly join a host and port for both IPv4 and IPv6 cases.

Fix parsing of IPv6 addresses in ovn URLs

…ic event notifications to watchers Improving debugging for failing tests

So, we have registered 9409 and 9410 port numbers for ovnkube-master and ovnkube-node here: https://github.com/prometheus/prometheus/wiki/Default-port-allocations Change the current port numbers to use the reserved port numbers. Furthermore, with the current port numbers -- 9101 and 9102 -- the node_exporter daemonset is crashing because it uses one of the above ports. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

the test case was passing config.GatewayModeLocal for the shared gateway mode instead of config.GatewayModeShared. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

The boolean argument that determined whether a localnet logical switch port was required was only needed for the spare gateway mode. The two gateway modes we support today always have a localnet logical switch port, so remove that redundant argument. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Fixes: c3def15 ("Add multicast support.") Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Pulls in changes to support multiple subnets and to support IPv6: openshift/sdn#66

Sync SubnetAllocator from openshift/sdn

Enable IGMP Querier only if a source IPv4 is available.

The pod network info (IP, MAC, Gateway, and Routes) is stored under the 'ovn' annotation. We need to move it under the 'k8s.ovn.org' namespace. The new annotation is called 'pod-networks', and it is a map of 'network_name' to the pod's IP information on that network. For example ("default" refers to the first OVN interface to the Pod):

    {
      "default": {
        "gateway_ip": "192.168.2.1",
        "ip_address": "192.168.2.3/24",
        "mac_address": "8a:24:f4:a8:02:04"
      }
    }

The change assumes that the master is upgraded first. It continues to write both the old and the new annotation names to accommodate ovnkube nodes that have not been upgraded yet. In the next release of ovn-kubernetes, we can remove the code that adds the `legacy` annotation. Signed-off-by: Yun Zhou <yunz@nvidia.com> Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
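As an illustration of the map-of-networks layout, here is a small sketch that decodes such an annotation value. The type `podNetwork` and the function `parsePodNetworks` are hypothetical names, the Routes field is left out because its JSON shape is not shown in the commit message, and the sketch does not claim to match the structs used in ovn-kubernetes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podNetwork (hypothetical type) mirrors the three fields shown in the
// example annotation; routes are omitted because their shape is not shown.
type podNetwork struct {
	GatewayIP  string `json:"gateway_ip"`
	IPAddress  string `json:"ip_address"`
	MACAddress string `json:"mac_address"`
}

// parsePodNetworks decodes an annotation value into network name -> info.
func parsePodNetworks(value string) (map[string]podNetwork, error) {
	networks := map[string]podNetwork{}
	if err := json.Unmarshal([]byte(value), &networks); err != nil {
		return nil, fmt.Errorf("failed to parse pod-networks annotation: %v", err)
	}
	return networks, nil
}

func main() {
	value := `{"default": {"gateway_ip": "192.168.2.1",
		"ip_address": "192.168.2.3/24", "mac_address": "8a:24:f4:a8:02:04"}}`
	networks, err := parsePodNetworks(value)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	def := networks["default"]
	fmt.Println(def.IPAddress, def.MACAddress, def.GatewayIP)
}
```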

Move pod annotation under k8s.ovn.org namespace

The current test annotates the node upfront and later checks to see if the node has correct subnet information. This is not right. We need to start with no subnet annotation and then later check if the node has subnet annotation. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

As with all the other Poll*() functions, don't return an error if all we want to do is just check again at the next interval. Signed-off-by: Dan Williams <dcbw@redhat.com>

The MAC address for a node's management port is randomly chosen. This address is then added to the node's annotation. The master reads the address and creates a corresponding logical switch port using it. When the node reboots, the MAC address of the management port on the node changes. The changed address is then reflected in the node's annotation, and the UpdateFunc callback handler for the node resource updates the MAC address of the logical switch port. This is all unnecessary complexity, so the better way is to persist the initial MAC for the management port in the interface's MAC column. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

use ahostsv4 database to ensure we get IPv4 address always

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

The MAC address of the br-nexthop port is regenerated upon every reboot. OVN SB remembers the old MAC address in its MAC_Binding table, and this causes communication issues. Just as physical NICs have fixed MAC addresses, create these interfaces with the fixed MAC address 00:00:a9:fe:21:01, where the last 4 hex octets correspond to 169.254.33.1. Fixes openshift#946 Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

add a switch to flip on/off multicast support (is disabled by default)

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

with map[string]interface{} we can have the value to be `nil` and that can be used to remove an annotation from the node. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
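One way to see why a `nil` value is useful: when such a map is serialized into a JSON merge patch, `nil` becomes `null`, and under merge-patch semantics a `null` member removes that key. The sketch below only builds the patch body. `annotationPatch` is a hypothetical helper, the annotation keys are made up, and whether SetAnnotationsOnNode() sends a merge patch in exactly this form is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// annotationPatch (hypothetical helper) wraps annotations in a JSON merge
// patch body. A nil value is encoded as null, which a merge patch treats as
// "delete this key".
func annotationPatch(annotations map[string]interface{}) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": annotations,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := annotationPatch(map[string]interface{}{
		"example.org/keep":   "value", // set or update this annotation
		"example.org/remove": nil,     // remove this annotation
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

With `map[string]string` there is no way to express the removal, since an empty string is still a valid annotation value.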

dcbw · 2019-12-21T13:25:59Z

/test e2e-aws-ovn

set other_config:hwaddr on br-local before you add br-nexthop

dcbw · 2019-12-21T19:12:33Z

Last "failure" was actually success except for [Feature:Prometheus][Conformance] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Suite:openshift/conformance/parallel/minimal] which we know is not quite right...

dcbw · 2019-12-21T19:16:22Z

/test e2e-aws-ovn

dcbw · 2019-12-22T15:19:10Z

Another "pass" except for the Prometheus alert issue.

/test e2e-aws-ovn

dcbw · 2019-12-22T21:55:42Z

Another "pass" except for the Prometheus alert issue.

/test e2e-aws-ovn

With 400+ nodes, the current ManagementPortReady() function is not scaling. The ovn-nbctl calls are timing out. Since we can find out whether the data path for the management port is ready by checking for OpenFlow rules on the integration bridge, we should make use of that. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

With 400+ nodes, the current GatewayReady() function is not scaling. The ovn-nbctl calls are timing out. Since we can find out whether the data path for the L3Gateway is ready by checking for OpenFlow rules on the integration bridge, we should make use of that. Adding SNAT rules is the last thing we do while building the logical topology, so check for the SNAT rule in table 65 in the integration bridge. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

scale: ascertain management port readiness by checking OpenFlow rules

scale: ascertain gateway readiness by checking OpenFlow rules

dcbw · 2019-12-23T14:08:55Z

Another "pass" except for the Prometheus alert issue. The other failure is the openshift-apiserver failing with:

1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://etcd.openshift-etcd.svc:2379 0 <nil>}]
F1222 22:45:09.504287 1 openshift_apiserver.go:420] context deadline exceeded

/test e2e-aws-ovn

ovn-kubernetes was already setting ovn-remote-probe-interval. This patch follows the same pattern for ovn-openflow-probe-interval, and does it for the same reasons. The default value for this option is 5 seconds. On a large cluster, this can cause excessive CPU consumption in ovn-controller. If it takes ovn-controller 5 seconds to do a full state computation, then you'll see ovn-controller end up in effectively a busy loop, because it isn't able to keep up with this probe interval. The openflow probe is even less interesting than the OVSDB remote probe. At least the ovsdb connection is to something remote. The openflow connection is always local, so this is unlikely to be a problem. We now set it to 3 minutes by default, just in case, instead of disabling it completely. Signed-off-by: Russell Bryant <russell@ovn.org>

ovn-controller: Set ovn-openflow-probe-interval

The ovnkube-master.log file, with 290K lines of log messages, had close to 221K lines of '... UPDATE for event handler X' log messages that don't provide any meaningful information. In fact, in that noise we might miss important log messages. So remove these debug messages. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

dcbw · 2019-12-23T19:29:13Z

Another "pass" except for the Prometheus alert issue.

/test e2e-aws-ovn

remove unwanted debug log messages in factory.go

Currently, that function gets other-config to ascertain that the logical switch has been created for a node and then continues. Later on, we make another call to get other-config:subnet. Instead, check for other-config:subnet itself and avoid an unnecessary call. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

scale: waitForNodeLogicalSwitch() should get other-config:subnet itself

dcbw · 2019-12-26T19:26:50Z

/test e2e-aws-ovn

dcbw · 2019-12-26T19:57:56Z

ovnkube masters do provide metrics on 0.0.0.0:9102:

# HELP ovnkube_master_pod_creation_latency_seconds The latency between pod creation and setting the OVN annotations
# TYPE ovnkube_master_pod_creation_latency_seconds histogram
ovnkube_master_pod_creation_latency_seconds_bucket{le="0.1"} 0
ovnkube_master_pod_creation_latency_seconds_bucket{le="0.2"} 2
ovnkube_master_pod_creation_latency_seconds_bucket{le="0.4"} 6
ovnkube_master_pod_creation_latency_seconds_bucket{le="0.8"} 30
ovnkube_master_pod_creation_latency_seconds_bucket{le="1.6"} 57

so perhaps the problem is either getting those metrics to prometheus, or the prometheus alert itself?

dcbw · 2019-12-26T20:52:46Z

And a success without the prometheus metric issue.

/test e2e-aws-ovn

dcbw · 2019-12-28T03:56:50Z

Prometheus alert issue again, otherwise good.

/test e2e-aws-ovn

dcbw · 2019-12-28T20:39:23Z

/test e2e-aws-ovn

dcbw · 2019-12-29T03:26:38Z

/test e2e-aws-ovn

dcbw · 2019-12-29T04:26:46Z

Fixes for prometheus alert failures are openshift/cluster-network-operator#435 and openshift/cluster-network-operator#436

…dress"" This reverts commit d57a9f7.

dcbw and others added 30 commits November 15, 2019 08:59

build: fix 'make lint' when GOPATH isn't explicitly set

4c37c58

Signed-off-by: Dan Williams <dcbw@redhat.com>

AddFilteredEndpointsHandler must take label selector like other handlers

a0660f6

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Merge pull request openshift#936 from girishmg/fix_ep_handler

5fa3faa

AddFilteredEndpointsHandler must take label selector like other handlers

Merge pull request openshift#929 from dcbw/fix-make-lint

40feb8f

build: fix 'make lint' when GOPATH isn't explicitly set

Merge pull request openshift#938 from russellb/ipv6-ovn-urls

12d92a0

Fix parsing of IPv6 addresses in ovn URLs

Adding un-ordered comparison of fake commands, due to non-determinist…

caf1568

…ic event notifications to watchers Improving debugging for failing tests

Networkpolicy test cases, increasing test coverage to 52 %

176d86b

fix test associated with the shared gateway mode

b5c210b

the test case was passing config.GatewayModeLocal for the shared gateway mode instead of config.GatewayModeShared. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Enable IGMP Querier only if a source IPv4 is available.

27682a3

Fixes: c3def15 ("Add multicast support.") Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Sync SubnetAllocator from openshift/sdn

2dfcbdc

Pulls in changes to support multiple subnets and to support IPv6: openshift/sdn#66

Make hostBits calculation work for ipv4/ipv6

9fbd8b9

Merge pull request openshift#945 from markmc/allocator-sync

af3717f

Sync SubnetAllocator from openshift/sdn

Merge pull request openshift#944 from dceara/igmp-querier-ipv4-check

ddf007b

Enable IGMP Querier only if a source IPv4 is available.

Merge pull request openshift#922 from girishmg/pod_annotation_change

8f13ba8

Move pod annotation under k8s.ovn.org namespace

node: don't return error from isOVNControllerReady() flow check

95918bb

As with all the other Poll*() functions, don't return an error if all we want to do is just check again at the next interval. Signed-off-by: Dan Williams <dcbw@redhat.com>

use ahostsv4 database to ensure we get IPv4 address always

164d1ae

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Merge pull request openshift#952 from girishmg/use_ahostsv4

2ef6c10

use ahostsv4 database to ensure we get IPv4 address always

add a switch to flip on/off multicast support (is disabled by default)

9a080f7

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

trivial: remove unnecessary check for nil

d2942b0

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Merge pull request openshift#954 from girishmg/multicast_switch

8d4823f

add a switch to flip on/off multicast support (is disabled by default)

Removing networkpolicy logic from pods.go to disable race condition

99ee4ac

capture disabled gateway-mode as node annotation to simplify code

6a08174

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

change SetAnnotationsOnNode() to take map[string]interface{}

0a69c51

with map[string]interface{} we can have the value to be `nil` and that can be used to remove an annotation from the node. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>

Merge pull request openshift#983 from girishmg/fix_upstream_persist_mac

f7cdd98

set other_config:hwaddr on br-local before you add br-nexthop

girishmg and others added 4 commits December 23, 2019 17:39

Merge pull request openshift#984 from girishmg/upstream_scale_fixes

8b9b481

scale: ascertain management port readiness by checking OpenFlow rules

Merge pull request openshift#985 from girishmg/gateway_readiness

f4c7696

scale: ascertain gateway readiness by checking OpenFlow rules

russellb and others added 3 commits December 23, 2019 09:57

Merge pull request openshift#986 from russellb/openflow-probe

9dd32c5

ovn-controller: Set ovn-openflow-probe-interval

dcbw and others added 3 commits December 23, 2019 13:31

Merge pull request openshift#987 from girishmg/us_debug_log

ccd9450

remove unwanted debug log messages in factory.go

Merge pull request openshift#989 from girishmg/other_config_subnet

5771dd8

scale: waitForNodeLogicalSwitch() should get other-config:subnet itself

dcbw added 2 commits December 28, 2019 23:42

Revert "Revert "create br-nexthop OVS internal port with fixed MAC ad…

59d7c31

…dress"" This reverts commit d57a9f7.

Merge remote-tracking branch 'ovnorg/master'

cee681b

dcbw changed the title ~~Upstream + hybrid-overlay merge 2019-12-20~~ Upstream + hybrid-overlay merge 2019-12-28 Dec 29, 2019

dcbw merged commit 13d85c0 into openshift:master Dec 29, 2019

dcbw mentioned this pull request Dec 29, 2019

[wip] windows fix #64

Closed

14 participants