Upstream + hybrid-overlay merge 2019-12-28 #70
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
AddFilteredEndpointsHandler must take label selector like other handlers
build: fix 'make lint' when GOPATH isn't explicitly set
When handling the scheme:address:port URLs given to OVN for configuring how to reach OVN services, properly handle IPv6 addresses by not assuming we can just split on ":" across the whole string. Also use JoinHostPort to properly join a host and port for both IPv4 and IPv6 cases.
Fix parsing of IPv6 addresses in ovn URLs
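A minimal sketch of the parsing approach described in this commit, using only the Go standard library. The helper names `splitOVNAddress` and `joinOVNAddress`, and the sample addresses and ports, are hypothetical and not taken from the patch; the sketch assumes IPv6 hosts are written in brackets.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitOVNAddress is a hypothetical helper: it splits "scheme:host:port"
// into its parts. Only the first ":" separates the scheme; the remainder is
// handed to net.SplitHostPort, which understands bracketed IPv6 literals.
func splitOVNAddress(addr string) (scheme, host, port string, err error) {
	parts := strings.SplitN(addr, ":", 2)
	if len(parts) != 2 {
		return "", "", "", fmt.Errorf("missing scheme in %q", addr)
	}
	host, port, err = net.SplitHostPort(parts[1])
	if err != nil {
		return "", "", "", fmt.Errorf("bad host:port in %q: %v", addr, err)
	}
	return parts[0], host, port, nil
}

// joinOVNAddress rebuilds the string. net.JoinHostPort adds brackets around
// the host when it contains a colon, so IPv4 and IPv6 are both handled.
func joinOVNAddress(scheme, host, port string) string {
	return scheme + ":" + net.JoinHostPort(host, port)
}

func main() {
	// Illustrative inputs only.
	for _, a := range []string{"tcp:10.0.0.1:6641", "ssl:[fd01::1]:6642"} {
		scheme, host, port, err := splitOVNAddress(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(scheme, host, port, joinOVNAddress(scheme, host, port))
	}
}
```

Splitting on every ":" would break the IPv6 case into many fields, which is why the scheme is peeled off first and the rest is left to the `net` package.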
…ic event notifications to watchers Improving debugging for failing tests
So, we have registered 9409 and 9410 port numbers for ovnkube-master and ovnkube-node here: https://github.com/prometheus/prometheus/wiki/Default-port-allocations Change the current port numbers to use the reserved port numbers. Furthermore, with the current port numbers -- 9101 and 9102 -- the node_exporter daemonset is crashing because it uses one of the above ports. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
the test case was passing config.GatewayModeLocal for the shared gateway mode instead of config.GatewayModeShared. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
the boolean argument that determined whether a localnet logical switch port was required was only needed for spare gateway mode. the two gateway modes we support today will always have a localnet logical switch port, so remove that redundant argument Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Fixes: c3def15 ("Add multicast support.") Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Pulls in changes to support multiple subnets and to support IPv6: openshift/sdn#66
Sync SubnetAllocator from openshift/sdn
Enable IGMP Querier only if a source IPv4 is available.
The pod network info of IP, MAC, Gateway, and Routes are under
'ovn' annotation. We need to move it under 'k8s.ovn.org' namespace.
The new annotation is called 'pod-networks', and it is going to be a
map of 'network_name' to pod's IP information on that network. For
example: ("default" refers to the first OVN interface to the Pod)
{
"default": {
"gateway_ip": "192.168.2.1",
"ip_address": "192.168.2.3/24",
"mac_address": "8a:24:f4:a8:02:04"
}
}
The change assumes that the master is upgraded first. It continues to
write both the old and new annotation names to accommodate yet-to-be
upgraded ovnkube nodes. In the next release of ovn-kubernetes, we can
remove the code that adds the `legacy` annotation.
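A rough sketch of how a consumer might decode the annotation value shown in the example. The struct, the helper name `parsePodNetworks`, and the validation steps are assumptions for illustration, not the code in this change; only the three JSON field names come from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// podNetwork mirrors the three fields shown in the example annotation.
type podNetwork struct {
	GatewayIP  string `json:"gateway_ip"`
	IPAddress  string `json:"ip_address"`
	MACAddress string `json:"mac_address"`
}

// parsePodNetworks is a hypothetical helper: it decodes the annotation value
// (a map of network name to pod network info), picks one network, and checks
// that each field parses before returning it.
func parsePodNetworks(annotation, network string) (ip, mac, gw string, err error) {
	networks := map[string]podNetwork{}
	if err = json.Unmarshal([]byte(annotation), &networks); err != nil {
		return "", "", "", fmt.Errorf("bad pod-networks annotation: %v", err)
	}
	n, ok := networks[network]
	if !ok {
		return "", "", "", fmt.Errorf("no entry for network %q", network)
	}
	if _, _, err = net.ParseCIDR(n.IPAddress); err != nil {
		return "", "", "", err
	}
	if _, err = net.ParseMAC(n.MACAddress); err != nil {
		return "", "", "", err
	}
	if net.ParseIP(n.GatewayIP) == nil {
		return "", "", "", fmt.Errorf("bad gateway IP %q", n.GatewayIP)
	}
	return n.IPAddress, n.MACAddress, n.GatewayIP, nil
}

func main() {
	value := `{"default": {"gateway_ip": "192.168.2.1",
		"ip_address": "192.168.2.3/24", "mac_address": "8a:24:f4:a8:02:04"}}`
	ip, mac, gw, err := parsePodNetworks(value, "default")
	fmt.Println(ip, mac, gw, err)
}
```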
Signed-off-by: Yun Zhou <yunz@nvidia.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Move pod annotation under k8s.ovn.org namespace
The current test annotates the node upfront and later checks to see if the node has correct subnet information. This is not right. We need to start with no subnet annotation and then later check if the node has subnet annotation. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
As with all the other Poll*() functions, don't return an error if all we want to do is just check again at the next interval. Signed-off-by: Dan Williams <dcbw@redhat.com>
the MAC address for the node's management port is randomly chosen. this address is then added to the node's annotation. the master reads the address and creates a corresponding logical switch port using it. when the node reboots, the MAC address of the management port on the node changes. the changed address is then reflected in the node's annotation, and the UpdateFunc callback handler for the node resource updates the MAC address of the logical switch port. this is all unnecessary complexity, so a better way is to persist the initial MAC for the management port in the interface's MAC column Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
use ahostsv4 database to ensure we get IPv4 address always
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
the MAC address of the br-nexthop port is re-generated upon every reboot. OVN SB remembers the old MAC address in its MAC_Binding table and this causes communication issues. just like how physical NICs have fixed MAC addresses, create these interfaces with the fixed MAC address of 00:00:a9:fe:21:01, where the last 4 hex octets correspond to 169.254.33.1 Fixes openshift#946 Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
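The mapping between the fixed MAC and the IPv4 address can be checked with a short sketch. The helper name `macFromIPv4` is hypothetical; it only illustrates the encoding, in which the last four octets of 00:00:a9:fe:21:01 decode to 169.254.33.1 (0xa9 = 169, 0xfe = 254, 0x21 = 33, 0x01 = 1).

```go
package main

import (
	"fmt"
	"net"
)

// macFromIPv4 is a hypothetical helper: it builds a MAC address whose first
// two octets are zero and whose last four octets are the IPv4 address bytes.
func macFromIPv4(ip string) (string, error) {
	v4 := net.ParseIP(ip).To4()
	if v4 == nil {
		return "", fmt.Errorf("%q is not an IPv4 address", ip)
	}
	mac := net.HardwareAddr{0x00, 0x00, v4[0], v4[1], v4[2], v4[3]}
	return mac.String(), nil
}

func main() {
	mac, err := macFromIPv4("169.254.33.1")
	fmt.Println(mac, err)
}
```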
add a switch to flip on/off multicast support (is disabled by default)
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
with map[string]interface{} we can have the value to be `nil` and that
can be used to remove an annotation from the node.
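A small sketch of the idea, assuming a hypothetical helper `applyAnnotations` and made-up annotation keys; it is not the function changed in this commit. A `nil` value in the update map means "delete this key", and any other value is stored as a string.

```go
package main

import "fmt"

// applyAnnotations returns a copy of existing with updates applied.
// A nil value removes the key; any other value is stored via fmt.Sprint.
func applyAnnotations(existing map[string]string, updates map[string]interface{}) map[string]string {
	result := make(map[string]string, len(existing))
	for k, v := range existing {
		result[k] = v
	}
	for k, v := range updates {
		if v == nil {
			delete(result, k) // nil marks the annotation for removal
			continue
		}
		result[k] = fmt.Sprint(v)
	}
	return result
}

func main() {
	// Keys are illustrative placeholders, not real ovn-kubernetes annotations.
	current := map[string]string{"example.org/old": "1", "example.org/keep": "x"}
	updated := applyAnnotations(current, map[string]interface{}{
		"example.org/old": nil,
		"example.org/new": "2",
	})
	fmt.Println(len(updated), updated["example.org/new"])
}
```

With a `map[string]string` there is no way to tell "set to empty" apart from "remove", which is the gap the `interface{}` value type closes.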
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
/test e2e-aws-ovn
set other_config:hwaddr on br-local before you add br-nexthop
Last "failure" was actually success except for
/test e2e-aws-ovn
Another "pass" except for the Prometheus alert issue. /test e2e-aws-ovn
Another "pass" except for the Prometheus alert issue. /test e2e-aws-ovn
With 400+ nodes, the current ManagementPortReady() function does not scale. The ovn-nbctl calls time out. Since we have a way to find out that the data path for the management port is ready by checking for OpenFlow rules on the integration bridge, we should make use of it. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
With 400+ nodes, the current GatewayReady() function does not scale. The ovn-nbctl calls time out. Since we have a way to find out that the data path for the L3Gateway is ready by checking for OpenFlow rules on the integration bridge, we should make use of it. Adding SNAT rules is the last thing we do while building the logical topology. So, check for the SNAT rule in table 65 in the integration bridge Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
scale: ascertain management port readiness by checking OpenFlow rules
scale: ascertain gateway readiness by checking OpenFlow rules
Another "pass" except for the Prometheus alert issue. Other failure is the openshift-apiserver failing with /test e2e-aws-ovn
ovn-kubernetes was already setting ovn-remote-probe-interval. This patch follows the same pattern for ovn-openflow-probe-interval, and does it for the same reasons. The default value for this option is 5 seconds. On a large cluster, this can cause excessive CPU consumption in ovn-controller. If it takes ovn-controller 5 seconds to do a full state computation, then you'll see ovn-controller end up in effectively a busy loop, because it isn't able to keep up with this probe interval. The openflow probe is even less interesting than the OVSDB remote probe. At least the ovsdb connection is to something remote. The openflow connection is always local, so this is unlikely to be a problem. We now set it to 3 minutes by default, just in case, instead of disabling it completely. Signed-off-by: Russell Bryant <russell@ovn.org>
ovn-controller: Set ovn-openflow-probe-interval
the ovnkube-master.log file, with 290K lines of log messages, had close to 221K lines of '... UPDATE for event handler X' log messages that don't provide any meaningful information. in fact, in that noise we might miss an important log message. so remove these debug messages. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Another "pass" except for the Prometheus alert issue. /test e2e-aws-ovn
remove unwanted debug log messages in factory.go
currently, that function gets other-config to ascertain that the logical switch is created for a node and continues. later on, we make another call to get other-config:subnet. instead, check for other-config:subnet itself and avoid an unnecessary call. Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
scale: waitForNodeLogicalSwitch() should get other-config:subnet itself
/test e2e-aws-ovn
ovnkube masters do provide metrics on 0.0.0.0:9102: so perhaps the problem is either getting those metrics to prometheus, or the prometheus alert itself?
And a success without the prometheus metric issue. /test e2e-aws-ovn
Prometheus alert issue again, otherwise good. /test e2e-aws-ovn
/test e2e-aws-ovn
/test e2e-aws-ovn
Fixes for prometheus alert failures are openshift/cluster-network-operator#435 and openshift/cluster-network-operator#436
…dress"" This reverts commit d57a9f7.
Upstream master + ovn-kubernetes/ovn-kubernetes#889
@squeed @alexanderConstantinescu @danwinship @rcarrillocruz @JacobTanenbaum @pecameron