[DownstreamMerge] Downstream merge 2-1-22 #940

Merged
openshift-merge-robot merged 108 commits into openshift:master from trozet:downstream-merge-2-1-2022
Feb 5, 2022

@trozet trozet commented Feb 1, 2022

Passes unit tests

astoycos and others added 30 commits December 16, 2021 15:44
The two metrics:
- metricDBE2eTimestamp
- probe_interval

both make nb/sbctl calls to the OVN DBs. To ensure we only
make ovsdb client connections in the master process, it makes
sense to move them to the master's set of metrics.

This commit also renames the `probe_interval` metric to
`northd_probe_interval` to make it a bit easier
to parse.

Signed-off-by: astoycos <astoycos@redhat.com>
In order to make egressIPs and externalgws compatible,
we re-add the SNAT to nodeIP when we delete the pod
if disableSNATMultipleGWs is true. While doing this,
we need to check whether the pod still exists, because
delLogicalPort is called before
deletePodEgressIPAssignment. Without that check, stale
SNATs are left behind.

Also enable the CI job pipeline for disableSNAT=true
(this is the default from OCP 4.10 onward).

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Check if pod exists before re-adding SNAT
Gather process metrics from the host OVS process using the default
pidfile locations.

Signed-off-by: Dan Williams <dcbw@redhat.com>
ovn-kubernetes/ovn-kubernetes@1b94cbb

The referenced commit broke hybrid overlay (HO), specifically the creation of
the MAC binding for the HO logical port, because we never passed
a valid sbClient to the HO controller.

The unit tests were also broken, in that no test
ensured that all the needed OVN objects were created from
scratch. These objects are:

- The HO Logical Switch Port
- The HO Logical Router Policy
- The HO MAC binding

This commit fixes the above problems, along with other HO issues
found while fixing the tests.

Signed-off-by: astoycos <astoycos@redhat.com>
Corrects formatting error in ovsargs.

Fixes: openshift#2726

Signed-off-by: Hareesh Puthalath <hareeshp@nvidia.com>
Fix log message for failed commands in pokeEndpointHostname.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
CI: Fix log message for failed commands in pokeEndpointHostname
When the traffic directed to a service of type loadbalancer reaches the
nodes, it's not redirected to the service's cluster ips. This is
implemented for services of externalip / nodeport, but not for
loadbalancer services. All the other logic is in place.

This will enable the integration with metallb when the traffic reaches
the node from an interface different from breth0.

Also, adjust unit tests related to loadbalancer to look only at
LoadBalancer IPs. In that scenario, externalIPs are not set.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Use absolute paths instead of relative paths so that kind.sh can be
run from any directory.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Create iptables NAT rules also for loadbalancer services
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
The current regex doesn't correctly match log messages with a single quote
in the first word, for example:
- can't
- couldn't

This commit doesn't fix the regex yet, since the fix was not straightforward.

Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Delete and wait for namespaces inside Context.AfterEach with
f.DeleteNamespace(f.Namespace.Name). This makes sure that we actually
wait for the Namespaces to be deleted and prevents host-network pods
from hogging ports for too long.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
CI: Wait on namespace deletion for host networked test pods
Rename ensure election timeout unit test to TestEnsureElectionTimeout
and refactor it in preparation for new unit tests to come for
OvnDbManager.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
In preparation for new test cases, make all OvnDbManager methods
mockable.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
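
The thread does not show how the OvnDbManager methods were made mockable. As a minimal sketch of one common Go approach, assuming an interface boundary, the logic under test can depend on a small interface that a fake satisfies in unit tests. Every name below (`dbManager`, `fakeManager`, `needsTimeoutUpdate`, `errUnreachable`) is hypothetical and not taken from the ovn-kubernetes code.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable is a hypothetical error a fake can return.
var errUnreachable = errors.New("db unreachable")

// dbManager is a hypothetical interface covering the operation to stub.
type dbManager interface {
	electionTimeout() (int, error)
}

// fakeManager is a test double that returns canned values.
type fakeManager struct {
	timeout int
	err     error
}

func (f fakeManager) electionTimeout() (int, error) { return f.timeout, f.err }

// needsTimeoutUpdate reports whether the stored election timeout differs
// from the desired one. It only talks to the interface, so tests can pass
// a fake instead of a real database client.
func needsTimeoutUpdate(m dbManager, desired int) (bool, error) {
	current, err := m.electionTimeout()
	if err != nil {
		return false, fmt.Errorf("reading election timeout: %w", err)
	}
	return current != desired, nil
}

func main() {
	changed, err := needsTimeoutUpdate(fakeManager{timeout: 1000}, 16000)
	fmt.Println(changed, err)

	_, err = needsTimeoutUpdate(fakeManager{err: errUnreachable}, 16000)
	fmt.Println(err)
}
```
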
Implement unit tests for ensureLocalRaftServerID,
ensureClusterRaftMembership, resetRaftDB

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
ovndbmanager: Implement unit tests for missing functions
kind.sh: Use absolute paths instead of relative paths
On every pod add we assemble a new slice with all of the gatewayInfos.
The slice was created with no capacity, so appends repeatedly reallocated
and copied the underlying array. Attempt to allocate at least some
predictable capacity up front to avoid this.

Signed-off-by: Tim Rozet <trozet@redhat.com>
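
To illustrate the capacity point, here is a minimal sketch that reserves room for all entries before appending. The `gatewayInfo` type and the `collectGateways` helper are simplified, hypothetical stand-ins, not the actual ovn-kubernetes definitions.

```go
package main

import "fmt"

// gatewayInfo is a hypothetical stand-in for the real per-gateway record.
type gatewayInfo struct {
	ips []string
}

// collectGateways gathers namespace-level and pod-level gateways into one
// slice. Capacity is reserved up front, so the appends below never need to
// grow the backing array.
func collectGateways(nsGWs []gatewayInfo, podGWs map[string]gatewayInfo) []gatewayInfo {
	all := make([]gatewayInfo, 0, len(nsGWs)+len(podGWs))
	all = append(all, nsGWs...)
	for _, gw := range podGWs {
		all = append(all, gw)
	}
	return all
}

func main() {
	ns := []gatewayInfo{{ips: []string{"172.0.1.1"}}}
	pods := map[string]gatewayInfo{"exgwAPod": {ips: []string{"172.0.1.2"}}}
	all := collectGateways(ns, pods)
	fmt.Println(len(all), cap(all))
}
```

Because the final length equals the reserved capacity, the backing array is allocated exactly once.
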
Adds some checking to ensure user provided IPs are correct as well as
detect any cache issues.

Changes Include:
 - Ensure the exgw namespace annotation contains no duplicate IPs
 - Ensure that, on exgw pod addition, no other pod already has
   the same IP
 - If the exgw pod cache becomes corrupt with duplicate IPs, emit a warning
   during pod add

Signed-off-by: Tim Rozet <trozet@redhat.com>
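
The duplicate check on the namespace annotation could look roughly like the sketch below. It assumes the annotation value is a comma-separated list of IPs; `parseGatewayIPs` is a hypothetical helper, not the function used in the PR.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseGatewayIPs parses a comma-separated list of gateway IPs and rejects
// invalid or duplicate entries. The comma-separated format is an assumption.
func parseGatewayIPs(annotation string) ([]net.IP, error) {
	seen := make(map[string]struct{})
	var ips []net.IP
	for _, raw := range strings.Split(annotation, ",") {
		ip := net.ParseIP(strings.TrimSpace(raw))
		if ip == nil {
			return nil, fmt.Errorf("invalid gateway IP %q", raw)
		}
		// Compare the canonical string form so that different spellings
		// of the same address count as duplicates.
		key := ip.String()
		if _, dup := seen[key]; dup {
			return nil, fmt.Errorf("duplicate gateway IP %s", key)
		}
		seen[key] = struct{}{}
		ips = append(ips, ip)
	}
	return ips, nil
}

func main() {
	ips, err := parseGatewayIPs("172.0.1.1,172.0.1.2")
	fmt.Println(len(ips), err)

	_, err = parseGatewayIPs("172.0.1.1, 172.0.1.1")
	fmt.Println(err)
}
```
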
When pods are added to the cache as exgws for a namespace, only the
pod's name is used as the key. This breaks a scenario where 2 pods with
the same name are serving as exgws for the same namespace. Consider this
example:

1. app pod is created in ns foo
2. exgwAPod is created in ns exgw1 (172.0.1.1), serving ns foo
3. exgwAPod is created in ns exgw2 (172.0.1.2), serving ns foo

In the above example, the app pod will only have one ECMP route for
172.0.1.2, because the cache is keyed only on pod name.

Signed-off-by: Tim Rozet <trozet@redhat.com>
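
The collision in the example above can be reproduced with a plain map. The sketch below contrasts a name-only key with a namespace-qualified key; the commit message does not spell out the fix, so the namespaced key is shown as one assumed way to avoid the overwrite. All types and helpers here are hypothetical.

```go
package main

import "fmt"

// exgwPod is a hypothetical, simplified view of an external gateway pod.
type exgwPod struct {
	namespace, name, ip string
}

// buildCache stores each pod's IP under the key produced by keyFn.
// Later pods overwrite earlier ones that map to the same key.
func buildCache(pods []exgwPod, keyFn func(exgwPod) string) map[string]string {
	cache := make(map[string]string, len(pods))
	for _, p := range pods {
		cache[keyFn(p)] = p.ip
	}
	return cache
}

// nameOnlyKey mirrors the problematic behavior: the namespace is ignored.
func nameOnlyKey(p exgwPod) string { return p.name }

// namespacedKey qualifies the pod name with its namespace.
func namespacedKey(p exgwPod) string { return p.namespace + "/" + p.name }

func main() {
	pods := []exgwPod{
		{namespace: "exgw1", name: "exgwAPod", ip: "172.0.1.1"},
		{namespace: "exgw2", name: "exgwAPod", ip: "172.0.1.2"},
	}
	fmt.Println(len(buildCache(pods, nameOnlyKey)))   // one entry survives
	fmt.Println(len(buildCache(pods, namespacedKey))) // both entries survive
}
```
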
Host -> svc (ETP=local) backed by ovn pods
does not work in SGW because we add the DNAT
rule that converts the NP to CIP before it
hits the LB on the GR.

Current flow, which is wrong:

1) traffic from host gets DNAT-ed to clusterIP svc using iptables
2) traffic is sent to br-ex
3) hits the GR load balancer
4) gets DNAT-ed to the backend pods
5) depending on whether the pod is on the same node or a different one,
the packet is either delivered to the pod locally
or passed via geneve tunnel to the destination node where the pod lives.

Technically, if the backends are not local to the node, the
request should get rejected. This PR removes the iptables DNAT rule
towards CIP if the traffic is of ETP=local type. Instead it adds
the DNAT rule towards masqueradeIP, which we already do for LGW
mode. With that, the packet is sent from the host into OVN
via mp0 and hits the node-local-switch LB, preserving sourceIP.

New flow:

1) NP/EIP/LIP traffic from host hits PRE-ROUTING chain where it
gets DNAT-ed to masqueradeIP:NP.
2) Then it hits OVN-KUBE-SNAT-MGMTPORT where its src-IP is
preserved.
3) enters OVN via mp0, hits the load balancer on the switch
4) gets DNAT-ed to the backend pods if they exist locally on the
node; otherwise it gets rejected.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
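
The difference between the two flows comes down to which destination the host-side DNAT rule points at. The sketch below captures only that decision; it does not generate real iptables rules, and the `dnatTarget` helper and the addresses used in `main` are placeholders rather than values from the PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// dnatTarget returns the destination that host-originated NodePort traffic
// is rewritten to. For ETP=local services the target is the masquerade IP
// with the node port, so the packet enters OVN via the management port;
// otherwise the target is the cluster IP with the service port.
func dnatTarget(etpLocal bool, masqueradeIP, clusterIP string, nodePort, svcPort int) string {
	if etpLocal {
		return net.JoinHostPort(masqueradeIP, strconv.Itoa(nodePort))
	}
	return net.JoinHostPort(clusterIP, strconv.Itoa(svcPort))
}

func main() {
	// Placeholder addresses and ports, chosen only for illustration.
	const masq, cip = "169.254.169.3", "10.96.0.10"
	fmt.Println(dnatTarget(true, masq, cip, 30080, 80))
	fmt.Println(dnatTarget(false, masq, cip, 30080, 80))
}
```
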
Multiple ExGW cache validation/improvements
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
Drop defer statement to make function reserveJoinLRPIPs more readable.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

21 similar comments

@openshift-ci

openshift-ci bot commented Feb 4, 2022

@trozet: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-vsphere-ovn | cb3dd5e | link | false | /test e2e-vsphere-ovn

@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@openshift-merge-robot openshift-merge-robot merged commit c0a3ddb into openshift:master Feb 5, 2022
@anuragthehatter

/label qe-approved

openshift-ci bot added the qe-approved label Feb 8, 2022

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
qe-approved: Signifies that QE has signed off on this PR.
