🌱 Hot Reload for secrets #49

Merged
merged 44 commits into from
Dec 17, 2024
Changes from 35 commits
3131a4e
:seedling: Update Go dependencies.
guettli Dec 12, 2024
182c862
:seedling: Fix tests.
guettli Dec 12, 2024
c44897e
... fix TestRateLimitIsExceeded
guettli Dec 12, 2024
20d4aed
fix TestLoadBalancerOps_ReconcileHCLBTargets
guettli Dec 12, 2024
5a6d841
better err msg for e2e tests.
guettli Dec 12, 2024
d0e762b
clean up
guettli Dec 12, 2024
5d31767
more docs.
guettli Dec 12, 2024
2369bc8
Merge branch 'tg/fix-tests' into tg/update-go-dependencies
guettli Dec 12, 2024
e63ace6
fixed TestInstances_InstanceMetadataRobotServer and better err msg fo…
guettli Dec 12, 2024
70c5643
Merge branch 'tg/fix-tests' into tg/update-go-dependencies
guettli Dec 12, 2024
a589799
fix tests.
guettli Dec 12, 2024
33e5aeb
make code easier to read.
guettli Dec 12, 2024
c12c49a
:seedling: Hot Reload for secrets
guettli Dec 13, 2024
0e7706b
updateHcloudToken() works, including tests.
guettli Dec 13, 2024
576e2ac
Merge branch 'tg/fix-tests' into tg/hot-reload-for-secrets
guettli Dec 13, 2024
2bf4b4c
started with hot reloading of robot credentials.
guettli Dec 13, 2024
6aa7c6d
fix tests.
guettli Dec 13, 2024
9b465f7
Merge remote-tracking branch 'origin/main' into tg/hot-reload-for-sec…
guettli Dec 13, 2024
2c3c70a
hotreload works for robot, not yet for hcloud.
guettli Dec 13, 2024
560a40b
hotreload of hcloud is working.
guettli Dec 13, 2024
a69ee61
removed not needed github actions. Fixed ubuntu version.
guettli Dec 13, 2024
260ed47
clean up.
guettli Dec 13, 2024
3a3045f
align k8s version to our other repos.
guettli Dec 16, 2024
2d526f3
Merge branch 'main' into tg/hot-reload-for-secrets
guettli Dec 16, 2024
8badab8
Merge branch 'main' into tg/hot-reload-for-secrets
guettli Dec 16, 2024
0dc0140
Merge branch 'main' into tg/hot-reload-for-secrets
guettli Dec 16, 2024
790f6b0
avoid not needed changes in go.mod
guettli Dec 16, 2024
18780d0
update go.sum
guettli Dec 16, 2024
5ee018a
make fsnotify work in Kubernetes Pod.
guettli Dec 16, 2024
3a64a7e
remove NODE_NAME
guettli Dec 16, 2024
3279e78
move var at the top.
guettli Dec 16, 2024
be9e272
add comment to Reload Counter
guettli Dec 16, 2024
15237fd
comment why cache gets reset.
guettli Dec 16, 2024
c2c5f60
robot test was flaky, fixed it.
guettli Dec 16, 2024
63ed4db
fix data race in hotreload.
guettli Dec 16, 2024
e2b0421
Merge branch 'main' into tg/hot-reload-for-secrets
guettli Dec 17, 2024
3ae86ff
add hint that robotClient can be nil.
guettli Dec 17, 2024
f547528
aligned naming, and added method: hotreload.CredentialsDirectory().
guettli Dec 17, 2024
dc85252
rename to `credentials`.
guettli Dec 17, 2024
dc6223f
added comments to code.
guettli Dec 17, 2024
f5501a0
inlined an exported function, removed not needed global variable.
guettli Dec 17, 2024
e75ddbc
docs.
guettli Dec 17, 2024
12e2215
docs: add current PR to list of PR which need to get added to upstream.
guettli Dec 17, 2024
d0fed72
added "failed" to err msg.
guettli Dec 17, 2024
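
Commit 63ed4db mentions a data race in the hot-reload path. This excerpt does not show how it was fixed, so the following is only a minimal sketch of one common approach: guarding the reloadable value with a `sync.RWMutex`. The names `credentialStore` and `demoReload` are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// credentialStore is a hypothetical holder for a token that can be
// replaced at runtime while other goroutines keep reading it.
type credentialStore struct {
	mu      sync.RWMutex
	token   string
	reloads int // number of completed reloads, useful in tests
}

// Token returns the current token under a read lock.
func (s *credentialStore) Token() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.token
}

// Reload swaps in a new token under the write lock.
func (s *credentialStore) Reload(token string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.token = token
	s.reloads++
}

// Reloads reports how many reloads have completed.
func (s *credentialStore) Reloads() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.reloads
}

// demoReload runs ten writers and ten readers concurrently and returns
// the final token and the reload count.
func demoReload() (string, int) {
	store := &credentialStore{token: "old"}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			store.Reload("new")
		}()
		go func() {
			defer wg.Done()
			_ = store.Token()
		}()
	}
	wg.Wait()
	return store.Token(), store.Reloads()
}

func main() {
	token, reloads := demoReload()
	fmt.Println(token, reloads)
}
```

Running a sketch like this with `go run -race` is a quick way to check that readers and the reloader do not touch the value unsynchronized.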
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -17,7 +17,7 @@ permissions:
  jobs:
  manager-image:
  name: Build and push manager image
- runs-on: ubuntu-latest
+ runs-on: ubuntu-24.04
  steps:
  - name: Checkout code
  uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1
4 changes: 2 additions & 2 deletions .github/workflows/release.yml
@@ -16,7 +16,7 @@ permissions:
  jobs:
  manager-image:
  name: Build and push manager image
- runs-on: ubuntu-latest
+ runs-on: ubuntu-24.04
  steps:
  - name: Checkout code
  uses: actions/checkout@eef61447b9ff4aafe5dcd4e0bbf5d482be7e7871 # v4.2.1
@@ -123,7 +123,7 @@ jobs:
  release:
  name: Create draft release
- runs-on: ubuntu-latest
+ runs-on: ubuntu-24.04
  needs:
  - manager-image
  steps:
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,3 +7,5 @@ deploy/gen/
  ./hetzner-cloud-controller-manager
  *.tgz
  hack/.*
+ /*.kubeconfig
+ /etc
4 changes: 0 additions & 4 deletions deploy/ccm-bare-metal.yaml
@@ -111,10 +111,6 @@ spec:
  cpu: 100m
  memory: 50Mi
  env:
- - name: NODE_NAME
- valueFrom:
- fieldRef:
- fieldPath: spec.nodeName
  - name: HCLOUD_TOKEN
  valueFrom:
  secretKeyRef:
4 changes: 0 additions & 4 deletions deploy/ccm-networks.yaml
@@ -75,10 +75,6 @@ spec:
  secretKeyRef:
  key: token
  name: hcloud
- - name: NODE_NAME
- valueFrom:
- fieldRef:
- fieldPath: spec.nodeName
  - name: HCLOUD_NETWORK
  valueFrom:
  secretKeyRef:
21 changes: 10 additions & 11 deletions deploy/ccm.yaml
@@ -66,22 +66,21 @@ spec:
  - "--route-reconciliation-period=30s"
  - "--webhook-secure-port=0"
  - "--leader-elect=false"
- env:
- - name: HCLOUD_TOKEN
- valueFrom:
- secretKeyRef:
- key: token
- name: hcloud
- - name: NODE_NAME
- valueFrom:
- fieldRef:
- fieldPath: spec.nodeName
- image: quay.io/syself/hetzner-cloud-controller-manager:v1.18.0-0.0.1 # x-release-please-version
+ image: ccm:test # TODO
+ volumeMounts:
+ - name: hetzner-secret
+ mountPath: "/etc/hetzner-secret"
+ readOnly: true
  ports:
  - name: metrics
  containerPort: 8233
  resources:
  requests:
  cpu: 100m
  memory: 50Mi
+ volumes:
+ - name: hetzner-secret
+ secret:
+ secretName: hetzner
+
  priorityClassName: system-cluster-critical
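
The manifest above now mounts the secret as a volume instead of passing the token through an environment variable. On Kubernetes, the kubelet updates such a volume by repointing a `..data` symlink at a freshly written directory rather than rewriting the file in place, which is presumably why commit 5ee018a had to "make fsnotify work in Kubernetes Pod". The sketch below only imitates that swap with the standard library; it does not use fsnotify, and `publish`, `demoSymlinkSwap`, and the key name `token` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// publish writes token into a new version directory below root and then
// atomically repoints the "..data" symlink at it, similar to how a
// mounted Secret volume is updated.
func publish(root, version, token string) error {
	dir := filepath.Join(root, version)
	if err := os.Mkdir(dir, 0o700); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "token"), []byte(token), 0o600); err != nil {
		return err
	}
	tmp := filepath.Join(root, "..data_tmp")
	if err := os.Symlink(version, tmp); err != nil {
		return err
	}
	// Renaming over the old symlink replaces it in one step.
	return os.Rename(tmp, filepath.Join(root, "..data"))
}

// demoSymlinkSwap reads the same path before and after an update.
func demoSymlinkSwap() (string, string, error) {
	root, err := os.MkdirTemp("", "secret-volume")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)

	if err := publish(root, "v1", "old-token"); err != nil {
		return "", "", err
	}
	// The path the application reads: token -> ..data/token.
	tokenPath := filepath.Join(root, "token")
	if err := os.Symlink(filepath.Join("..data", "token"), tokenPath); err != nil {
		return "", "", err
	}
	before, err := os.ReadFile(tokenPath)
	if err != nil {
		return "", "", err
	}
	if err := publish(root, "v2", "new-token"); err != nil {
		return "", "", err
	}
	after, err := os.ReadFile(tokenPath)
	if err != nil {
		return "", "", err
	}
	return string(before), string(after), nil
}

func main() {
	before, after, err := demoSymlinkSwap()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before, "->", after)
}
```

Because the `token` entry itself never changes, a watcher attached only to that path can miss the update; watching the containing directory is the usual workaround.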
2 changes: 1 addition & 1 deletion go.mod
@@ -3,6 +3,7 @@ module github.com/syself/hetzner-cloud-controller-manager
  go 1.23.0

  require (
+ github.com/fsnotify/fsnotify v1.8.0
  github.com/hetznercloud/hcloud-go/v2 v2.17.0
  github.com/prometheus/client_golang v1.20.5
  github.com/spf13/pflag v1.0.5
@@ -32,7 +33,6 @@ require (
  github.com/emicklei/go-restful/v3 v3.12.1 // indirect
  github.com/evanphx/json-patch v4.12.0+incompatible // indirect
  github.com/felixge/httpsnoop v1.0.4 // indirect
- github.com/fsnotify/fsnotify v1.8.0 // indirect
  github.com/go-logr/logr v1.4.2 // indirect
  github.com/go-logr/stdr v1.2.2 // indirect
  github.com/go-openapi/jsonpointer v0.21.0 // indirect
111 changes: 54 additions & 57 deletions hcloud/cloud.go
@@ -23,20 +23,19 @@ import (
  "io"
  "net/http"
  "os"
+ "path/filepath"
  "regexp"
  "runtime/debug"
  "strconv"
  "strings"
- "time"

  "github.com/hetznercloud/hcloud-go/v2/hcloud"
  "github.com/hetznercloud/hcloud-go/v2/hcloud/metadata"
  "github.com/syself/hetzner-cloud-controller-manager/internal/hcops"
+ "github.com/syself/hetzner-cloud-controller-manager/internal/hotreload"
  "github.com/syself/hetzner-cloud-controller-manager/internal/metrics"
  robotclient "github.com/syself/hetzner-cloud-controller-manager/internal/robot/client"
  "github.com/syself/hetzner-cloud-controller-manager/internal/robot/client/cache"
- "github.com/syself/hetzner-cloud-controller-manager/internal/util"
- hrobot "github.com/syself/hrobot-go"
  corev1 "k8s.io/api/core/v1"
  "k8s.io/client-go/tools/record"
  cloudprovider "k8s.io/cloud-provider"
@@ -51,16 +50,10 @@ const (
  hcloudDebugENVVar = "HCLOUD_DEBUG"
  robotDebugENVVar = "ROBOT_DEBUG"

- robotUserNameENVVar = "ROBOT_USER_NAME"
- robotPasswordENVVar = "ROBOT_PASSWORD"
-
  // Only as reference - is used in hcops package.
  // Default is 5 minutes.
  RateLimitWaitTimeRobot = "RATE_LIMIT_WAIT_TIME_ROBOT"

- // default is 5 minutes.
- CacheTimeout = "CACHE_TIMEOUT"
-
  // Disable the "master/server is attached to the network" check against the metadata service.
  hcloudNetworkDisableAttachedCheckENVVar = "HCLOUD_NETWORK_DISABLE_ATTACHED_CHECK"
  hcloudNetworkRoutesEnabledENVVar = "HCLOUD_NETWORK_ROUTES_ENABLED"
@@ -73,7 +66,6 @@ const (
  hcloudLoadBalancersDisableIPv6 = "HCLOUD_LOAD_BALANCERS_DISABLE_IPV6"
  hcloudMetricsEnabledENVVar = "HCLOUD_METRICS_ENABLED"
  hcloudMetricsAddress = ":8233"
- nodeNameENVVar = "NODE_NAME"
  providerName = "hcloud"
  hostNamePrefixRobot = "bm-"
  )
@@ -84,7 +76,7 @@ var errMissingRobotCredentials = errors.New("missing robot credentials - cannot
  var providerVersion = "unknown"

  type cloud struct {
- client *hcloud.Client
+ hcloudClient *hcloud.Client
  robotClient robotclient.Client
  instances *instances
  routes *routes
@@ -111,22 +103,21 @@ func (lt *LoggingTransport) RoundTrip(req *http.Request) (resp *http.Response, e
  return resp, nil
  }

- func newCloud(_ io.Reader) (cloudprovider.Interface, error) {
- const op = "hcloud/newCloud"
- metrics.OperationCalled.WithLabelValues(op).Inc()
-
- token := os.Getenv(hcloudTokenENVVar)
- if token == "" {
- return nil, fmt.Errorf("environment variable %q is required", hcloudTokenENVVar)
+ func newHcloudClient(rootDir string) (*hcloud.Client, error) {
+ secretDir := filepath.Join(rootDir, "etc", "hetzner-secret")
+ token, err := hotreload.GetInitialHcloudCredentialsFromDirectory(secretDir)
+ if err != nil {
+ klog.V(1).Infof("reading Hetzner Cloud token from directory failed. Will try env var: %s", err.Error())
+ token = os.Getenv(hcloudTokenENVVar)
+ if token == "" {
+ return nil, fmt.Errorf("Either token from directory %q or environment variable %q is required", secretDir, hcloudTokenENVVar)
+ }
+ } else {
+ klog.V(1).Infof("reading Hetzner Cloud token from %q. The controller will reload the credentials, when the file changes", secretDir)
  }
  if len(token) != 64 {
  return nil, fmt.Errorf("entered token is invalid (must be exactly 64 characters long)")
  }
- nodeName := os.Getenv(nodeNameENVVar)
- if nodeName == "" {
- return nil, fmt.Errorf("environment variable %q is required", nodeNameENVVar)
- }
-
  opts := []hcloud.ClientOption{
  hcloud.WithToken(token),
  hcloud.WithApplication("hetzner-cloud-controller", providerVersion),
@@ -146,43 +137,39 @@ func newCloud(_ io.Reader) (cloudprovider.Interface, error) {
  opts = append(opts, hcloud.WithEndpoint(endpoint))
  }
  client := hcloud.NewClient(opts...)
- metadataClient := metadata.NewClient()
+ return client, nil
+ }

- robotUserName := os.Getenv(robotUserNameENVVar)
- robotPassword := os.Getenv(robotPasswordENVVar)
+ func newCloud(_ io.Reader) (cloudprovider.Interface, error) {
+ const op = "hcloud/newCloud"
+ metrics.OperationCalled.WithLabelValues(op).Inc()

- cacheTimeout, err := util.GetEnvDuration(CacheTimeout)
+ rootDir, err := os.Getwd()
  if err != nil {
  return nil, fmt.Errorf("%s: %w", op, err)
  }
-
- if cacheTimeout == 0 {
- cacheTimeout = 5 * time.Minute
+ hcloudClient, err := newHcloudClient(rootDir)
+ if err != nil {
+ return nil, fmt.Errorf("%s: %w", op, err)
  }
+ metadataClient := metadata.NewClient()

- var robotClient robotclient.Client
- if robotUserName != "" && robotPassword != "" {
- var c hrobot.RobotClient
- if os.Getenv(robotDebugENVVar) == "true" {
- client := &http.Client{
- Transport: &LoggingTransport{
- roundTripper: http.DefaultTransport,
- },
- }
- c = hrobot.NewBasicAuthClientWithCustomHttpClient(robotUserName, robotPassword, client)
- klog.Info("Enabled robot API debugging")
- } else {
- c = hrobot.NewBasicAuthClient(robotUserName, robotPassword)
- klog.Infof("Not enabling robot API debugging. Set env var %s=true to enable it.", robotDebugENVVar)
+ var httpClient *http.Client
+ if os.Getenv(robotDebugENVVar) == "true" {
+ httpClient = &http.Client{
+ Transport: &LoggingTransport{
+ roundTripper: http.DefaultTransport,
+ },
  }
- robotClient = cache.NewClient(c, cacheTimeout)
- } else {
- klog.Infof("Hetzner robot is not support because of insufficient credentials. Robot user name specified: %v. Robot password specified: %v", robotUserName != "", robotPassword != "")
  }
+ robotClient, err := cache.NewCachedRobotClient(rootDir, httpClient, "")
+ if err != nil {
+ return nil, fmt.Errorf("%s: %w", op, err)
+ }

  var networkID int64
  if v, ok := os.LookupEnv(hcloudNetworkENVVar); ok {
- n, _, err := client.Network.Get(context.Background(), v)
+ n, _, err := hcloudClient.Network.Get(context.Background(), v)
  if err != nil {
  return nil, fmt.Errorf("%s: %w", op, err)
  }
@@ -210,7 +197,7 @@ func newCloud(_ io.Reader) (cloudprovider.Interface, error) {
  }

  // Validate that the provided token works, and we have network connectivity to the Hetzner Cloud API
- _, _, err = client.Server.List(context.Background(), hcloud.ServerListOpts{})
+ _, _, err = hcloudClient.Server.List(context.Background(), hcloud.ServerListOpts{})
  if err != nil {
  return nil, fmt.Errorf("%s: %w", op, err)
  }
@@ -228,17 +215,17 @@ func newCloud(_ io.Reader) (cloudprovider.Interface, error) {
  lbRecorder := eventBroadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "hetzner-ccm-loadbalancer"})

  lbOps := &hcops.LoadBalancerOps{
- LBClient: &client.LoadBalancer,
- CertOps: &hcops.CertificateOps{CertClient: &client.Certificate},
- ActionClient: &client.Action,
- NetworkClient: &client.Network,
+ LBClient: &hcloudClient.LoadBalancer,
+ CertOps: &hcops.CertificateOps{CertClient: &hcloudClient.Certificate},
+ ActionClient: &hcloudClient.Action,
+ NetworkClient: &hcloudClient.Network,
  RobotClient: robotClient,
  NetworkID: networkID,
  Recorder: lbRecorder,
  Defaults: lbOpsDefaults,
  }

- loadBalancers := newLoadBalancers(lbOps, &client.Action, lbDisablePrivateIngress, lbDisableIPv6)
+ loadBalancers := newLoadBalancers(lbOps, &hcloudClient.Action, lbDisablePrivateIngress, lbDisableIPv6)
  if os.Getenv(hcloudLoadBalancersEnabledENVVar) == "false" {
  loadBalancers = nil
  }
@@ -248,10 +235,20 @@ func newCloud(_ io.Reader) (cloudprovider.Interface, error) {
  return nil, fmt.Errorf("%s: %w", op, err)
  }

+ secretsDir := filepath.Join(rootDir, "etc", "hetzner-secret")
+ _, err = os.Stat(secretsDir)
+ if err == nil {
+ // Watch for changes in the secrets directory
+ err := hotreload.Watch(secretsDir, hcloudClient, robotClient)
+ if err != nil {
+ return nil, fmt.Errorf("%s: %w", op, err)
+ }
+ }
+
  return &cloud{
- client: client,
+ hcloudClient: hcloudClient,
  robotClient: robotClient,
- instances: newInstances(client, robotClient, instancesAddressFamily, networkID),
+ instances: newInstances(hcloudClient, robotClient, instancesAddressFamily, networkID),
  loadBalancer: loadBalancers,
  routes: nil,
  networkID: networkID,
@@ -288,7 +285,7 @@ func (c *cloud) Clusters() (cloudprovider.Clusters, bool) {

  func (c *cloud) Routes() (cloudprovider.Routes, bool) {
  if c.networkID > 0 && os.Getenv(hcloudNetworkRoutesEnabledENVVar) != "false" {
- r, err := newRoutes(c.client, c.networkID)
+ r, err := newRoutes(c.hcloudClient, c.networkID)
  if err != nil {
  klog.ErrorS(err, "create routes provider", "networkID", c.networkID)
  return nil, false
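
In the `hcloud/cloud.go` changes above, `newHcloudClient` first tries to read the token from the mounted secret directory and only falls back to the `HCLOUD_TOKEN` environment variable if that fails. The lookup order can be sketched in isolation as follows. This is a minimal illustration, not the PR's code: `loadToken`, `demoLoadToken`, and the file name `token` are hypothetical, and the real `hotreload.GetInitialHcloudCredentialsFromDirectory` is not shown in this diff and may behave differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadToken returns the token stored in secretDir/fileName. If the file
// cannot be read, it falls back to the environment variable envVar.
// fileName is an assumed key name; the diff does not show the real one.
func loadToken(secretDir, fileName, envVar string) (string, error) {
	data, err := os.ReadFile(filepath.Join(secretDir, fileName))
	if err == nil {
		// Secrets written by hand often end with a newline.
		return strings.TrimSpace(string(data)), nil
	}
	token := os.Getenv(envVar)
	if token == "" {
		return "", fmt.Errorf("either a token in directory %q or environment variable %q is required", secretDir, envVar)
	}
	return token, nil
}

// demoLoadToken writes a throwaway token file and reads it back.
func demoLoadToken() (string, error) {
	dir, err := os.MkdirTemp("", "hetzner-secret")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "token"), []byte("abc123\n"), 0o600); err != nil {
		return "", err
	}
	return loadToken(dir, "token", "HCLOUD_TOKEN")
}

func main() {
	token, err := demoLoadToken()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("token length:", len(token))
}
```

Keeping the environment variable as a fallback means existing deployments that still inject `HCLOUD_TOKEN` keep working, although only the directory-based path can pick up rotated credentials without a restart.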