Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: [dropgz] Dropgz for windows #2075

Merged
merged 2 commits into from
Aug 1, 2023
Merged

feat: [dropgz] Dropgz for windows #2075

merged 2 commits into from
Aug 1, 2023

Conversation

vipul-21
Copy link
Contributor

@vipul-21 vipul-21 commented Jul 25, 2023

Dropgz support for windows.

Reason for Change:

Issue Fixed:

Requirements:

Notes:

@vipul-21 vipul-21 added cni Related to CNI. feature labels Jul 25, 2023
Comment on lines 15 to 25
RUN curl -LO --cacert /etc/ssl/certs/ca-certificates.crt https://github.com/Azure/azure-container-networking/releases/download/$AZCNI_VERSION/azure-vnet-cni-$OS-$ARCH-$AZCNI_VERSION.zip && unzip -o azure-vnet-cni-$OS-$ARCH-$AZCNI_VERSION.zip

FROM --platform=linux/${ARCH} mcr.microsoft.com/cbl-mariner/base/core:2.0 AS compressor
ARG OS
WORKDIR /dropgz
COPY dropgz .
COPY --from=azure-vnet /azure-container-networking/azure-vnet.exe pkg/embed/fs
COPY --from=azure-vnet /azure-container-networking/azure-vnet-telemetry.exe pkg/embed/fs
COPY --from=azure-vnet /azure-container-networking/azure-vnet-ipam.exe pkg/embed/fs
RUN cd pkg/embed/fs/ && sha256sum * > sum.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamilmani1989 Need help in what we want to add a part of dropgz image for windows. Currently just the v1 binaries are added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that should be fine.. add azure-vnet-telemetry.config as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @rbtr comments, since the file is populated with default values only that is why we don't need to copy it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i replied in below comments.. lets copy it to keep the same behavior as like prod atleast for cniv1

// For windows we need to close the process running with the binary before we can rename it.
// This is because the file is locked by the process.
// We can't use the os.Rename() function because it will fail with an error.
if runtime.GOOS == "windows" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbtr @tamilmani1989 Need review of this part of code. Added the comment on why we need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya this is bad..i guess its azure-vnet-telemetry process.. i'm considering it to move to cns to start telemetry service.. then we dont need to copy azure-vnet-telemetry for swift

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this code ? Lets consider telemetry process not running when we use dropgz.. for cniv1, lets update cniv1 installer deployment yaml to kill azure-vnet-telemetry process before invoking dropgz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also give a retry if copy fails due to transient error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also give a retry if copy fails due to transient error

disagree, the init container will be restarted if it exits with an error

Copy link
Member

@tamilmani1989 tamilmani1989 Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case, its fine..retry not required.

@vipul-21 vipul-21 marked this pull request as ready for review July 25, 2023 15:25
@vipul-21 vipul-21 force-pushed the singhvipul/dropgz branch 2 times, most recently from 00e1e51 to c517809 Compare July 25, 2023 19:41
dropgz/build/cniTest_windows.Dockerfile Show resolved Hide resolved
dropgz/build/cniTest_windows.Dockerfile Show resolved Hide resolved
Comment on lines 15 to 25
RUN curl -LO --cacert /etc/ssl/certs/ca-certificates.crt https://github.com/Azure/azure-container-networking/releases/download/$AZCNI_VERSION/azure-vnet-cni-$OS-$ARCH-$AZCNI_VERSION.zip && unzip -o azure-vnet-cni-$OS-$ARCH-$AZCNI_VERSION.zip

FROM --platform=linux/${ARCH} mcr.microsoft.com/cbl-mariner/base/core:2.0 AS compressor
ARG OS
WORKDIR /dropgz
COPY dropgz .
COPY --from=azure-vnet /azure-container-networking/azure-vnet.exe pkg/embed/fs
COPY --from=azure-vnet /azure-container-networking/azure-vnet-telemetry.exe pkg/embed/fs
COPY --from=azure-vnet /azure-container-networking/azure-vnet-ipam.exe pkg/embed/fs
RUN cd pkg/embed/fs/ && sha256sum * > sum.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that should be fine.. add azure-vnet-telemetry.config as well

// For windows we need to close the process running with the binary before we can rename it.
// This is because the file is locked by the process.
// We can't use the os.Rename() function because it will fail with an error.
if runtime.GOOS == "windows" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya this is bad..i guess its azure-vnet-telemetry process.. i'm considering it to move to cns to start telemetry service.. then we dont need to copy azure-vnet-telemetry for swift

// For windows we need to close the process running with the binary before we can rename it.
// This is because the file is locked by the process.
// We can't use the os.Rename() function because it will fail with an error.
if runtime.GOOS == "windows" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this code ? Lets consider telemetry process not running when we use dropgz.. for cniv1, lets update cniv1 installer deployment yaml to kill azure-vnet-telemetry process before invoking dropgz

@vipul-21 vipul-21 force-pushed the singhvipul/dropgz branch 3 times, most recently from 93a1424 to a98bcfa Compare July 28, 2023 21:00
- /k/azurecni/bin/azure-vnet-ipam.exe
- azure-vnet-telemetry.exe
- -o
- /k/azurecni/bin/azure-vnet-telemetry.exe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have to copy telemetry.config file as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, still testing the init container part, will ping once that is verified.

@vipul-21 vipul-21 force-pushed the singhvipul/dropgz branch 3 times, most recently from 73d04ff to 4bd927e Compare July 28, 2023 22:27
Copy link
Member

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@thatmattlong
Copy link
Contributor

Curious how this will work? I remember from a couple years back that you couldn't hot-swap Windows CNI because containerd required a reboot to pick up the new bin. Has that changed, and hot swap of CNI is possible on Windows now?

@vipul-21
Copy link
Contributor Author

Curious how this will work? I remember from a couple years back that you couldn't hot-swap Windows CNI because containerd required a reboot to pick up the new bin. Has that changed, and hot swap of CNI is possible on Windows now?

I am not sure about the containerd reboot requirement. We were updating the binaries on the windows on cluster and testing it earlier as well. We just added the same functionality through dropgz.
So earlier even if we replaces the binaries, the old binaries were getting picked up ?

@vipul-21
Copy link
Contributor Author

vipul-21 commented Aug 1, 2023

@thatmattlong @rbtr Can i please get a review on this ? ( as a part of code owners )

Copy link
Contributor

@rbtr rbtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question re: 2019 otherwise lgtm

name: cni-plugin
arch: amd64
os_version: ltsc2019
os_version: ltsc2022
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still support/should we still test 2019?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't plan to test 2019 in PR pipeline. But we will be testing 2019 in a separate pipeline( CNI test pipeline).

@vipul-21 vipul-21 merged commit 7a41178 into master Aug 1, 2023
@vipul-21 vipul-21 deleted the singhvipul/dropgz branch August 1, 2023 16:06
jpayne3506 pushed a commit that referenced this pull request Sep 11, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 12, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 12, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 12, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 12, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 17, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 18, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 pushed a commit that referenced this pull request Sep 20, 2023
* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)
jpayne3506 added a commit that referenced this pull request Sep 22, 2023
* build azure-vnet-telemetry and azure-vnet-ipam in dropgz-test (#1846)

build azure-vnet-telemetry and azure-vnet-ipam in dropgz-test for parity with release image

Signed-off-by: Evan Baker <[email protected]>
(cherry picked from commit f619259)

* ci: disable kube-proxy for test clusters (#1965)

* disable kube-proxy for byocni cluster creation

* test config mapping

* shell pwd

* use CURDIR

* check current directory

* test with repo root dir

* test azp format

* test azp format

* test azp format

* change e2e steps to remove kube proxy

* fix load test update args

* fix ns and rg in update

* update ciliume2e

* fix kubectl cmd in load test

* adding new targets for no kube proxy

* remove cluster update

* update overlay e2e

* test behavior of load test

* test grep for azure-cns

* look for container deployment

* testing

* restart node variable check

* update if condition

* add skip node case

---------

Co-authored-by: tamilmani1989 <[email protected]>
(cherry picked from commit 024819d)

* CI: [CNI] Replace the bash scripts for CNI load testing with golang test cases (#2003)

CI:[CNI] Replace the bash scripts with the golang test cases
(cherry picked from commit 008ae45)

* ci: [CNI] Move Nightly Cilium Pipeline test to ACN (#1963)

* CNS to be able to generate dualstack overaly CNI conflist (#1981)

* fix: Eliminating duplicate lines

* ci: Add update permission for ciliumidentity

* fix: Parameterize Image Registry

add retry to nnc update during scaledown (#1970)

* add retry to nnc update during scaledown

Signed-off-by: Evan Baker <[email protected]>

* test for panic in pool monitor

Signed-off-by: Evan Baker <[email protected]>

---------

Signed-off-by: Evan Baker <[email protected]>

fix: reserve 0th IP as gateway for overlay on Windows (#1968)

* fix: reserve 0th IP as gateway for overlay on Windows

* fix: allow gateway to be updated

ci: windows profile container image (#1988)

Always use 0 for NC version in Overlay (#1979)

always use 0 for NC version in overlay

Signed-off-by: Evan Baker <[email protected]>

[Vnet Scale - CNS]: Flattening CIDR ranges for Node NNC to a list (#1921)

* Read secondary CIDRs from VnetScale NNC

* fix comment

* update comment

* For VnetScale mode, Use 1st IP for def gateway instead of 0th for windows

* fix/add import

* address pr comments

* add comments

* address pr comments

* wrap error

* fix typo

* fix UT

fix: [NPM] check if policy exists in case of nil pointer (#1974)

fix: check for nil first

ci: disable kube-proxy for test clusters (#1965)

* disable kube-proxy for byocni cluster creation

* test config mapping

* shell pwd

* use CURDIR

* check current directory

* test with repo root dir

* test azp format

* test azp format

* test azp format

* change e2e steps to remove kube proxy

* fix load test update args

* fix ns and rg in update

* update ciliume2e

* fix kubectl cmd in load test

* adding new targets for no kube proxy

* remove cluster update

* update overlay e2e

* test behavior of load test

* test grep for azure-cns

* look for container deployment

* testing

* restart node variable check

* update if condition

* add skip node case

---------

Co-authored-by: tamilmani1989 <[email protected]>

perf: [WIN-NPM] fast bootup (#1900)

* wip

* wip2

* use other apply DP func

* address comment about if statement

* finish bootup for both DPs

* fix lint

* fix lint 2

* fix lint 3

* longer UT timeout and add missing UTs for apply in background

tool: [NPM] script to clean up iptable chains (#1978)

tool: script to clean up NPM iptable chains

feat: [WIN-NPM] metrics for latencies and failures (#1959)

* implement metrics

* add npm prefix

* rename windows files

* metrics pkg UTs

* allow reinitializing prometheus metrics

* fix: hns wrapper should not throw error for empty SetPolicy values

* test: metric UTs in dataplane

* fix: record list endpoint latency always

* remove flaky UT

* feat: metric for max ipset members

* fix lint

* fix lint 2

* fix build

* fix lint 3

* simplify conditionals and protect against maxMembers becoming negative

* remove bottom 4 histogram buckets. start at 16 ms

* reset metrics for ipset UTs

* style: don't check for windows dp in *_windows.go files

* build: remove unused import

* test: reset windows metrics in UT

Remove SSH port 22 rule from aks-engine clusters (#1983)

ci: change overlaye2e stage to cilium-overlay (#1997)

* renaming overlaye2e for cilium

* update display names for stages

Initial getHomeAZ 404 changes (#1994)

* initial getHomeAZ 404 changes

* treat 404 as success

* address comments

CNS to be able to generate dualstack overaly CNI conflist (#1981)

fix: Parameterize Image Registry

add retry to nnc update during scaledown (#1970)

* add retry to nnc update during scaledown

Signed-off-by: Evan Baker <[email protected]>

* test for panic in pool monitor

Signed-off-by: Evan Baker <[email protected]>

---------

Signed-off-by: Evan Baker <[email protected]>

fix: reserve 0th IP as gateway for overlay on Windows (#1968)

* fix: reserve 0th IP as gateway for overlay on Windows

* fix: allow gateway to be updated

ci: windows profile container image (#1988)

Always use 0 for NC version in Overlay (#1979)

always use 0 for NC version in overlay

Signed-off-by: Evan Baker <[email protected]>

[Vnet Scale - CNS]: Flattening CIDR ranges for Node NNC to a list (#1921)

* Read secondary CIDRs from VnetScale NNC

* fix comment

* update comment

* For VnetScale mode, Use 1st IP for def gateway instead of 0th for windows

* fix/add import

* address pr comments

* add comments

* address pr comments

* wrap error

* fix typo

* fix UT

fix: [NPM] check if policy exists in case of nil pointer (#1974)

fix: check for nil first

ci: disable kube-proxy for test clusters (#1965)

* disable kube-proxy for byocni cluster creation

* test config mapping

* shell pwd

* use CURDIR

* check current directory

* test with repo root dir

* test azp format

* test azp format

* test azp format

* change e2e steps to remove kube proxy

* fix load test update args

* fix ns and rg in update

* update ciliume2e

* fix kubectl cmd in load test

* adding new targets for no kube proxy

* remove cluster update

* update overlay e2e

* test behavior of load test

* test grep for azure-cns

* look for container deployment

* testing

* restart node variable check

* update if condition

* add skip node case

---------

Co-authored-by: tamilmani1989 <[email protected]>

perf: [WIN-NPM] fast bootup (#1900)

* wip

* wip2

* use other apply DP func

* address comment about if statement

* finish bootup for both DPs

* fix lint

* fix lint 2

* fix lint 3

* longer UT timeout and add missing UTs for apply in background

tool: [NPM] script to clean up iptable chains (#1978)

tool: script to clean up NPM iptable chains

feat: [WIN-NPM] metrics for latencies and failures (#1959)

* implement metrics

* add npm prefix

* rename windows files

* metrics pkg UTs

* allow reinitializing prometheus metrics

* fix: hns wrapper should not throw error for empty SetPolicy values

* test: metric UTs in dataplane

* fix: record list endpoint latency always

* remove flaky UT

* feat: metric for max ipset members

* fix lint

* fix lint 2

* fix build

* fix lint 3

* simplify conditionals and protect against maxMembers becoming negative

* remove bottom 4 histogram buckets. start at 16 ms

* reset metrics for ipset UTs

* style: don't check for windows dp in *_windows.go files

* build: remove unused import

* test: reset windows metrics in UT

Remove SSH port 22 rule from aks-engine clusters (#1983)

ci: change overlaye2e stage to cilium-overlay (#1997)

* renaming overlaye2e for cilium

* update display names for stages

Initial getHomeAZ 404 changes (#1994)

* initial getHomeAZ 404 changes

* treat 404 as success

* address comments

CNS to be able to generate dualstack overaly CNI conflist (#1981)

* fix: File Directory

* style: Comments

* Addressing Comments

---------

Co-authored-by: Paul Johnston <[email protected]>
(cherry picked from commit 1514d95)

* ci:[CNI] Add windows CNIv1 datapath test (#2016)

* ci: Transfer files

* test: Working Datapath Test

* test: apierror Tests

* style: Datapath Package

* test: Deployment timing

* fix: Error check

* fix: Lint

(cherry picked from commit 390977d)

* fix: [CNI] CNI load test failing due to namespace already created (#2031)

fix: CNI load test failing due to namespace already created
(cherry picked from commit c10900e)

* ci:[CNI] Windows cniv1 load test pipeline (#2024)

CI:[CNI] Windows cniv1 load test pipeline
(cherry picked from commit e45ad21)

* ci: [CNI] Adding aks cluster creation steps for k8s e2e test (#2052)

* ci: [CNI] Adding aks cluster creation steps for k8s e2e test

* Add  validate step to the pipeline

* Adding the telemetry config to the cluster

(cherry picked from commit 846e508)

* ci:[CNI] Replace AKS-Engine Tests with k8s conformance tests (#2062)

* Initial Commit

* Add attempts to prevent flakyness

* Add taint for windows tests

* Add k8s e2e tests

* Testing vmSizes

* Artifact k8se2e binary

* Remove NPM E2E

* Add testing and increase processes

* Addressing comments

(cherry picked from commit 451c691)

* CI: Removing AKS engine related code (#2089)

(cherry picked from commit b45c2c7)

* feat: [dropgz] Dropgz for windows (#2075)

* feat: [dropgz] Dropgz for windows

* Removing the code for killing the process from dropgz for windows

(cherry picked from commit 7a41178)

* ci: Update dns tests for k8s conformance (#2104)

Update dns tests for k8s v1.26

(cherry picked from commit bbf2fd4)

* ci: adding cni package as a trigger (#2108)

(cherry picked from commit e6a8ea6)

* ci: add packages for submodule trigger (#2154)

(cherry picked from commit 4aecfd6)

* set mellanox reg key (#1768)

(cherry picked from commit fa2de6d)

---------

Co-authored-by: Evan Baker <[email protected]>
Co-authored-by: Camryn Lee <[email protected]>
Co-authored-by: Vipul Singh <[email protected]>
Co-authored-by: Rajvi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cni Related to CNI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants