Merged
58 commits
c5eb5ba
Add client metrics
pappz Jan 12, 2026
e3a5c44
Add client metrics system with OpenTelemetry and VictoriaMetrics support
pappz Jan 15, 2026
bb377a2
Merge main branch into feature/client-metrics
pappz Jan 21, 2026
5f02a48
Merge branch 'main' into feature/client-metrics
pappz Jan 28, 2026
5169129
Add signaling metrics tracking for initial and reconnection attempts
pappz Jan 28, 2026
cbfde79
Reset connection stage timestamps during reconnections to exclude unn…
pappz Jan 28, 2026
08295e5
Delete otel lib from client
pappz Jan 28, 2026
e7283a8
Update unit tests
pappz Jan 28, 2026
138e728
Invoke callback on handshake success in WireGuard watcher
pappz Jan 29, 2026
80abddb
Merge branch 'main' into feature/client-metrics
pappz Feb 10, 2026
ca3e6d9
Add Netbird version tracking to client metrics
pappz Feb 11, 2026
bec58b8
Add sync duration tracking to client metrics
pappz Feb 11, 2026
3753bf7
Remove no-op metrics implementation and simplify ClientMetrics constr…
pappz Feb 11, 2026
7e276a4
Add total duration tracking for connection attempts
pappz Feb 11, 2026
cf0a1fa
Add metrics push support to VictoriaMetrics integration
pappz Feb 11, 2026
75a5955
Merge branch 'main' into feature/client-metrics-push
pappz Feb 20, 2026
8ed99ba
[client] anchor connection metrics to first signal received
pappz Feb 20, 2026
dfdff69
Merge branch 'main' into feature/client-metrics-push
pappz Mar 5, 2026
8a852d4
Remove creation_to_semaphore connection stage metric
pappz Mar 5, 2026
473f59c
[client] Add remote push config for metrics with version-based eligib…
pappz Mar 5, 2026
ee13016
[client] Add WASM-compatible NewClientMetrics implementation
pappz Mar 5, 2026
eb1a9b1
Add missing file
pappz Mar 5, 2026
f2ef0c4
Update default case in DeploymentType.String to return "unknown" inst…
pappz Mar 5, 2026
e585064
[client] Rework metrics to use timestamped samples instead of histograms
pappz Mar 5, 2026
5a018c1
[client] Add InfluxDB metrics backend alongside VictoriaMetrics
pappz Mar 5, 2026
4aeab69
[client] Fix metrics issues and update dev docker setup
pappz Mar 6, 2026
80543b5
[client] Add anonymised peer tracking to pushed metrics
pappz Mar 9, 2026
b577289
Remove unused dependencies from go.mod and go.sum
pappz Mar 9, 2026
4698490
Refactor InfluxDB ingest pipeline: extract validation logic
pappz Mar 9, 2026
1d5224b
Set non-root user in Dockerfile for Ingest service
pappz Mar 9, 2026
15f12c8
Fix Windows CI: command line too long
pappz Mar 9, 2026
ddaaa92
Remove Victoria metrics
pappz Mar 10, 2026
d4c80ef
Add hashed peer ID as Authorization header in metrics push
pappz Mar 10, 2026
ebfd984
Revert influxdb in docker compose
pappz Mar 10, 2026
da63e2f
Enable gzip compression and authorization validation for metrics push…
pappz Mar 10, 2026
5815fac
Reducate code of complexity
pappz Mar 10, 2026
1dd2b9b
Update debug documentation to include metrics.txt description
pappz Mar 10, 2026
f6353c3
Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping…
pappz Mar 10, 2026
d8118df
Refactor deployment type detection to use URL parsing for improved ac…
pappz Mar 10, 2026
d78f05d
Update readme
pappz Mar 10, 2026
3625b3b
Throttle remote config retries on fetch failure
pappz Mar 10, 2026
e24c0bb
Preserve first WG handshake timestamp, ignore rekeys
pappz Mar 10, 2026
804cd5d
Skip adding empty metrics.txt to debug bundle in debug mode
pappz Mar 10, 2026
b4cd717
Update default metrics server URL to https://ingest.netbird.io
pappz Mar 12, 2026
d272755
Atomic metrics export-and-reset to prevent sample loss between Export…
pappz Mar 13, 2026
21ffd87
Fix doc
pappz Mar 13, 2026
9df13ba
Refactor Push configuration to improve clarity and enforce minimum pu…
pappz Mar 13, 2026
1085ad0
Remove `minPushInterval` and update push interval validation logic
pappz Mar 13, 2026
44edbfd
Revert ExportAndReset, it is acceptable data loss
pappz Mar 13, 2026
20d0569
Fix metrics review issues: rename env var, remove stale infra, add tests
lixmal Mar 18, 2026
d4be42b
Add login duration metric, ingest tag validation, and duration bounds
lixmal Mar 18, 2026
672fc66
Add arch tag to all metrics
lixmal Mar 18, 2026
c44b797
Fix Grafana dashboard: add arch to drop columns, add login panels
lixmal Mar 18, 2026
efa2ec9
Merge remote-tracking branch 'origin/main' into feature/client-metric…
lixmal Mar 18, 2026
e67ae9a
Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL
lixmal Mar 18, 2026
59f8c5d
Address review comments: fix README wording, update stale comments
lixmal Mar 18, 2026
8a761cc
Clarify env var precedence does not bypass remote config eligibility
lixmal Mar 18, 2026
909445d
Remove accidentally committed pprof files
lixmal Mar 18, 2026
9 changes: 7 additions & 2 deletions .github/workflows/golang-test-windows.yml
@@ -63,10 +63,15 @@ jobs:
- run: PsExec64 -s -w ${{ github.workspace }} C:\hostedtoolcache\windows\go\${{ steps.go.outputs.go-version }}\x64\bin\go.exe env -w GOMODCACHE=${{ env.cache }}
- run: PsExec64 -s -w ${{ github.workspace }} C:\hostedtoolcache\windows\go\${{ steps.go.outputs.go-version }}\x64\bin\go.exe env -w GOCACHE=${{ env.modcache }}
- run: PsExec64 -s -w ${{ github.workspace }} C:\hostedtoolcache\windows\go\${{ steps.go.outputs.go-version }}\x64\bin\go.exe mod tidy
- - run: echo "files=$(go list ./... | ForEach-Object { $_ } | Where-Object { $_ -notmatch '/management' } | Where-Object { $_ -notmatch '/relay' } | Where-Object { $_ -notmatch '/signal' } | Where-Object { $_ -notmatch '/proxy' } | Where-Object { $_ -notmatch '/combined' })" >> $env:GITHUB_ENV
+ - name: Generate test script
+   run: |
+     $packages = go list ./... | Where-Object { $_ -notmatch '/management' } | Where-Object { $_ -notmatch '/relay' } | Where-Object { $_ -notmatch '/signal' } | Where-Object { $_ -notmatch '/proxy' } | Where-Object { $_ -notmatch '/combined' }
+     $goExe = "C:\hostedtoolcache\windows\go\${{ steps.go.outputs.go-version }}\x64\bin\go.exe"
+     $cmd = "$goExe test -tags=devcert -timeout 10m -p 1 $($packages -join ' ') > test-out.txt 2>&1"
+     Set-Content -Path "${{ github.workspace }}\run-tests.cmd" -Value $cmd
  - name: test
-   run: PsExec64 -s -w ${{ github.workspace }} cmd.exe /c "C:\hostedtoolcache\windows\go\${{ steps.go.outputs.go-version }}\x64\bin\go.exe test -tags=devcert -timeout 10m -p 1 ${{ env.files }} > test-out.txt 2>&1"
+   run: PsExec64 -s -w ${{ github.workspace }} cmd.exe /c "${{ github.workspace }}\run-tests.cmd"
- name: test output
if: ${{ always() }}
run: Get-Content test-out.txt
40 changes: 40 additions & 0 deletions client/internal/connect.go
@@ -23,6 +23,7 @@ import (
"github.com/netbirdio/netbird/client/iface/netstack"
"github.com/netbirdio/netbird/client/internal/dns"
"github.com/netbirdio/netbird/client/internal/listener"
"github.com/netbirdio/netbird/client/internal/metrics"
"github.com/netbirdio/netbird/client/internal/peer"
"github.com/netbirdio/netbird/client/internal/profilemanager"
"github.com/netbirdio/netbird/client/internal/statemanager"
@@ -50,6 +51,7 @@ type ConnectClient struct {

engine *Engine
engineMutex sync.Mutex
clientMetrics *metrics.ClientMetrics
updateManager *updater.Manager

persistSyncResponse bool
@@ -133,10 +135,34 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
}
}()

// Stop metrics push on exit
defer func() {
if c.clientMetrics != nil {
c.clientMetrics.StopPush()
}
}()

log.Infof("starting NetBird client version %s on %s/%s", version.NetbirdVersion(), runtime.GOOS, runtime.GOARCH)

nbnet.Init()

// Initialize metrics once at startup (always active for debug bundles)
if c.clientMetrics == nil {
agentInfo := metrics.AgentInfo{
DeploymentType: metrics.DeploymentTypeUnknown,
Version: version.NetbirdVersion(),
OS: runtime.GOOS,
Arch: runtime.GOARCH,
}
c.clientMetrics = metrics.NewClientMetrics(agentInfo)
log.Debugf("initialized client metrics")

// Start metrics push if enabled (uses daemon context, persists across engine restarts)
if metrics.IsMetricsPushEnabled() {
c.clientMetrics.StartPush(c.ctx, metrics.PushConfigFromEnv())
}
}

backOff := &backoff.ExponentialBackOff{
InitialInterval: time.Second,
RandomizationFactor: 1,
@@ -223,6 +249,16 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
mgmNotifier := statusRecorderToMgmConnStateNotifier(c.statusRecorder)
mgmClient.SetConnStateListener(mgmNotifier)

// Update metrics with actual deployment type after connection
deploymentType := metrics.DetermineDeploymentType(mgmClient.GetServerURL())
agentInfo := metrics.AgentInfo{
DeploymentType: deploymentType,
Version: version.NetbirdVersion(),
OS: runtime.GOOS,
Arch: runtime.GOARCH,
}
c.clientMetrics.UpdateAgentInfo(agentInfo, myPrivateKey.PublicKey().String())

log.Debugf("connected to the Management service %s", c.config.ManagementURL.Host)
defer func() {
if err = mgmClient.Close(); err != nil {
@@ -231,8 +267,10 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
}()

// connect (just a connection, no stream yet) and login to Management Service to get an initial global Netbird config
loginStarted := time.Now()
loginResp, err := loginToManagement(engineCtx, mgmClient, publicSSHKey, c.config)
if err != nil {
c.clientMetrics.RecordLoginDuration(engineCtx, time.Since(loginStarted), false)
log.Debug(err)
if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
state.Set(StatusNeedsLogin)
@@ -241,6 +279,7 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
}
return wrapErr(err)
}
c.clientMetrics.RecordLoginDuration(engineCtx, time.Since(loginStarted), true)
c.statusRecorder.MarkManagementConnected()

localPeerState := peer.LocalPeerState{
@@ -317,6 +356,7 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
Checks: checks,
StateManager: stateManager,
UpdateManager: c.updateManager,
ClientMetrics: c.clientMetrics,
}, mobileDependency)
engine.SetSyncResponsePersistence(c.persistSyncResponse)
c.engine = engine
37 changes: 37 additions & 0 deletions client/internal/debug/debug.go
@@ -53,6 +53,7 @@ resolved_domains.txt: Anonymized resolved domain IP addresses from the status re
config.txt: Anonymized configuration information of the NetBird client.
network_map.json: Anonymized sync response containing peer configurations, routes, DNS settings, and firewall rules.
state.json: Anonymized client state dump containing netbird states for the active profile.
metrics.txt: Buffered client metrics in InfluxDB line protocol format. Only present when metrics collection is enabled. Peer identifiers are anonymized.
mutex.prof: Mutex profiling information.
goroutine.prof: Goroutine profiling information.
block.prof: Block profiling information.
@@ -219,6 +220,11 @@ const (
darwinStdoutLogPath = "/var/log/netbird.err.log"
)

// MetricsExporter is an interface for exporting metrics
type MetricsExporter interface {
Export(w io.Writer) error
}

type BundleGenerator struct {
anonymizer *anonymize.Anonymizer

@@ -229,6 +235,7 @@ type BundleGenerator struct {
logPath string
cpuProfile []byte
refreshStatus func() // Optional callback to refresh status before bundle generation
clientMetrics MetricsExporter

anonymize bool
includeSystemInfo bool
@@ -250,6 +257,7 @@ type GeneratorDependencies struct {
LogPath string
CPUProfile []byte
RefreshStatus func() // Optional callback to refresh status before bundle generation
ClientMetrics MetricsExporter
}

func NewBundleGenerator(deps GeneratorDependencies, cfg BundleConfig) *BundleGenerator {
@@ -268,6 +276,7 @@ func NewBundleGenerator(deps GeneratorDependencies, cfg BundleConfig) *BundleGen
logPath: deps.LogPath,
cpuProfile: deps.CPUProfile,
refreshStatus: deps.RefreshStatus,
clientMetrics: deps.ClientMetrics,

anonymize: cfg.Anonymize,
includeSystemInfo: cfg.IncludeSystemInfo,
@@ -351,6 +360,10 @@ func (g *BundleGenerator) createArchive() error {
log.Errorf("failed to add corrupted state files to debug bundle: %v", err)
}

if err := g.addMetrics(); err != nil {
log.Errorf("failed to add metrics to debug bundle: %v", err)
}

if err := g.addWgShow(); err != nil {
log.Errorf("failed to add wg show output: %v", err)
}
@@ -744,6 +757,30 @@ func (g *BundleGenerator) addCorruptedStateFiles() error {
return nil
}

func (g *BundleGenerator) addMetrics() error {
if g.clientMetrics == nil {
log.Debugf("skipping metrics in debug bundle: no metrics collector")
return nil
}

var buf bytes.Buffer
if err := g.clientMetrics.Export(&buf); err != nil {
return fmt.Errorf("export metrics: %w", err)
}

if buf.Len() == 0 {
log.Debugf("skipping metrics.txt in debug bundle: no metrics data")
return nil
}

if err := g.addFileToZip(&buf, "metrics.txt"); err != nil {
return fmt.Errorf("add metrics file to zip: %w", err)
}

log.Debugf("added metrics to debug bundle")
return nil
}

func (g *BundleGenerator) addLogfile() error {
if g.logPath == "" {
log.Debugf("skipping empty log file in debug bundle")
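
The `MetricsExporter` interface and the skip logic in `addMetrics` can be tried in isolation. The sketch below is a self-contained approximation: `fakeMetrics` and `collect` are hypothetical names, the sample line-protocol record is made up, and the nil-receiver guard in `Export` is an assumption about the real `ClientMetrics`, which this diff does not show. It also illustrates a Go detail that matters here: a nil `*metrics.ClientMetrics` stored in a `MetricsExporter` field yields a non-nil interface value, so a `g.clientMetrics == nil` check alone would not catch it.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// MetricsExporter matches the interface added in debug.go.
type MetricsExporter interface {
	Export(w io.Writer) error
}

// fakeMetrics is a hypothetical stand-in for *metrics.ClientMetrics.
type fakeMetrics struct {
	lines string
}

// Export writes the buffered lines. The nil-receiver guard is an assumption
// about how the real type behaves, not something visible in the diff.
func (f *fakeMetrics) Export(w io.Writer) error {
	if f == nil {
		return nil
	}
	_, err := io.WriteString(w, f.lines)
	return err
}

// collect mirrors the skip logic of addMetrics: it returns nil bytes when
// there is no exporter or the exporter produced no data.
func collect(exp MetricsExporter) ([]byte, error) {
	if exp == nil {
		return nil, nil
	}
	var buf bytes.Buffer
	if err := exp.Export(&buf); err != nil {
		return nil, fmt.Errorf("export metrics: %w", err)
	}
	if buf.Len() == 0 {
		return nil, nil
	}
	return buf.Bytes(), nil
}

func main() {
	// A typed nil pointer wrapped in the interface is NOT equal to nil.
	var typedNil *fakeMetrics
	var exp MetricsExporter = typedNil
	fmt.Println("interface is nil:", exp == nil)

	out, err := collect(exp)
	fmt.Printf("typed nil exporter: %d bytes, err=%v\n", len(out), err)

	// Made-up record in InfluxDB line protocol shape, for illustration only.
	out, err = collect(&fakeMetrics{lines: "example_metric,os=linux value=1 1700000000000000000\n"})
	fmt.Printf("populated exporter: %d bytes, err=%v\n", len(out), err)
}
```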
27 changes: 21 additions & 6 deletions client/internal/engine.go
@@ -38,6 +38,7 @@ import (
"github.com/netbirdio/netbird/client/internal/dnsfwd"
"github.com/netbirdio/netbird/client/internal/expose"
"github.com/netbirdio/netbird/client/internal/ingressgw"
"github.com/netbirdio/netbird/client/internal/metrics"
"github.com/netbirdio/netbird/client/internal/netflow"
nftypes "github.com/netbirdio/netbird/client/internal/netflow/types"
"github.com/netbirdio/netbird/client/internal/networkmonitor"
@@ -149,6 +150,7 @@ type EngineServices struct {
Checks []*mgmProto.Checks
StateManager *statemanager.Manager
UpdateManager *updater.Manager
ClientMetrics *metrics.ClientMetrics
}

// Engine is a mechanism responsible for reacting on Signal and Management stream events and managing connections to the remote peers.
@@ -229,6 +231,9 @@ type Engine struct {

probeStunTurn *relay.StunTurnProbe

// clientMetrics collects and pushes metrics
clientMetrics *metrics.ClientMetrics

jobExecutor *jobexec.Executor
jobExecutorWG sync.WaitGroup

@@ -272,6 +277,7 @@ func NewEngine(
checks: services.Checks,
probeStunTurn: relay.NewStunTurnProbe(relay.DefaultCacheTTL),
jobExecutor: jobexec.NewExecutor(),
clientMetrics: services.ClientMetrics,
updateManager: services.UpdateManager,
}

@@ -813,7 +819,9 @@ func (e *Engine) handleAutoUpdateVersion(autoUpdateSettings *mgmProto.AutoUpdate
func (e *Engine) handleSync(update *mgmProto.SyncResponse) error {
started := time.Now()
defer func() {
- log.Infof("sync finished in %s", time.Since(started))
+ duration := time.Since(started)
+ log.Infof("sync finished in %s", duration)
+ e.clientMetrics.RecordSyncDuration(e.ctx, duration)
}()
e.syncMsgMux.Lock()
defer e.syncMsgMux.Unlock()
@@ -1061,6 +1069,7 @@ func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobR
StatusRecorder: e.statusRecorder,
SyncResponse: syncResponse,
LogPath: e.config.LogPath,
ClientMetrics: e.clientMetrics,
RefreshStatus: func() {
e.RunHealthProbes(true)
},
@@ -1515,11 +1524,12 @@ func (e *Engine) createPeerConn(pubKey string, allowedIPs []netip.Prefix, agentV
}

serviceDependencies := peer.ServiceDependencies{
- StatusRecorder: e.statusRecorder,
- Signaler: e.signaler,
- IFaceDiscover: e.mobileDep.IFaceDiscover,
- RelayManager: e.relayManager,
- SrWatcher: e.srWatcher,
+ StatusRecorder: e.statusRecorder,
+ Signaler: e.signaler,
+ IFaceDiscover: e.mobileDep.IFaceDiscover,
+ RelayManager: e.relayManager,
+ SrWatcher: e.srWatcher,
+ MetricsRecorder: e.clientMetrics,
}
peerConn, err := peer.NewConn(config, serviceDependencies)
if err != nil {
@@ -1816,6 +1826,11 @@ func (e *Engine) GetExposeManager() *expose.Manager {
return e.exposeManager
}

// GetClientMetrics returns the client metrics
func (e *Engine) GetClientMetrics() *metrics.ClientMetrics {
return e.clientMetrics
}

func findIPFromInterfaceName(ifaceName string) (net.IP, error) {
iface, err := net.InterfaceByName(ifaceName)
if err != nil {
6 changes: 3 additions & 3 deletions client/internal/engine_test.go
@@ -828,7 +828,7 @@ func TestEngine_UpdateNetworkMapWithRoutes(t *testing.T) {
WgPrivateKey: key,
WgPort: 33100,
MTU: iface.DefaultMTU,
- }, EngineServices{
+ }, EngineServices{
SignalClient: &signal.MockClient{},
MgmClient: &mgmt.MockClient{},
RelayManager: relayMgr,
@@ -1035,7 +1035,7 @@ func TestEngine_UpdateNetworkMapWithDNSUpdate(t *testing.T) {
WgPrivateKey: key,
WgPort: 33100,
MTU: iface.DefaultMTU,
- }, EngineServices{
+ }, EngineServices{
SignalClient: &signal.MockClient{},
MgmClient: &mgmt.MockClient{},
RelayManager: relayMgr,
@@ -1566,7 +1566,7 @@ func createEngine(ctx context.Context, cancel context.CancelFunc, setupKey strin
}

relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
- e, err := NewEngine(ctx, cancel, conf, EngineServices{
+ e, err := NewEngine(ctx, cancel, conf, EngineServices{
SignalClient: signalClient,
MgmClient: mgmtClient,
RelayManager: relayMgr,
17 changes: 17 additions & 0 deletions client/internal/metrics/connection_type.go
@@ -0,0 +1,17 @@
package metrics

// ConnectionType represents the type of peer connection
type ConnectionType string

const (
// ConnectionTypeICE represents a direct peer-to-peer connection using ICE
ConnectionTypeICE ConnectionType = "ice"

// ConnectionTypeRelay represents a relayed connection
ConnectionTypeRelay ConnectionType = "relay"
)

// String returns the string representation of the connection type
func (c ConnectionType) String() string {
return string(c)
}
51 changes: 51 additions & 0 deletions client/internal/metrics/deployment_type.go
@@ -0,0 +1,51 @@
package metrics

import (
"net/url"
"strings"
)

// DeploymentType represents the type of NetBird deployment
type DeploymentType int

const (
// DeploymentTypeUnknown represents an unknown or uninitialized deployment type
DeploymentTypeUnknown DeploymentType = iota

// DeploymentTypeCloud represents a cloud-hosted NetBird deployment
DeploymentTypeCloud

// DeploymentTypeSelfHosted represents a self-hosted NetBird deployment
DeploymentTypeSelfHosted
)

// String returns the string representation of the deployment type
func (d DeploymentType) String() string {
switch d {
case DeploymentTypeCloud:
return "cloud"
case DeploymentTypeSelfHosted:
return "selfhosted"
default:
return "unknown"
}
}

// DetermineDeploymentType determines if the deployment is cloud or self-hosted
// based on the management URL string
func DetermineDeploymentType(managementURL string) DeploymentType {
if managementURL == "" {
return DeploymentTypeUnknown
}

u, err := url.Parse(managementURL)
if err != nil {
return DeploymentTypeSelfHosted
}

if strings.ToLower(u.Hostname()) == "api.netbird.io" {
return DeploymentTypeCloud
}

return DeploymentTypeSelfHosted
}
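
To make the classification rules above concrete, the following standalone sketch copies `DeploymentType` and `DetermineDeploymentType` from this file into a `main` package and runs them on a few inputs. The sample URLs (other than the `api.netbird.io` hostname) are invented for illustration. One behaviour worth noting for review: an input without a scheme parses with an empty hostname, so it is classified as self-hosted. Whether `mgmClient.GetServerURL()` always includes a scheme is not visible in this diff.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// DeploymentType and its constants are copied from deployment_type.go.
type DeploymentType int

const (
	DeploymentTypeUnknown DeploymentType = iota
	DeploymentTypeCloud
	DeploymentTypeSelfHosted
)

func (d DeploymentType) String() string {
	switch d {
	case DeploymentTypeCloud:
		return "cloud"
	case DeploymentTypeSelfHosted:
		return "selfhosted"
	default:
		return "unknown"
	}
}

// DetermineDeploymentType is copied from the diff without changes in logic.
func DetermineDeploymentType(managementURL string) DeploymentType {
	if managementURL == "" {
		return DeploymentTypeUnknown
	}

	u, err := url.Parse(managementURL)
	if err != nil {
		return DeploymentTypeSelfHosted
	}

	// Hostname() drops the port, so ":443" does not affect the comparison.
	if strings.ToLower(u.Hostname()) == "api.netbird.io" {
		return DeploymentTypeCloud
	}

	return DeploymentTypeSelfHosted
}

func main() {
	// Illustrative inputs; the example.com host is made up.
	inputs := []string{
		"",
		"https://api.netbird.io:443",
		"https://API.NetBird.io",
		"https://netbird.example.com:33073",
		"api.netbird.io:443", // no scheme: hostname parses as empty
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %s\n", in, DetermineDeploymentType(in))
	}
}
```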