28 changes: 13 additions & 15 deletions .github/workflows/build-and-release-dc_util.yaml
@@ -13,10 +13,8 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-# os: [linux, windows, darwin]
-os: [linux]
-# goarch: [amd64, arm64]
-goarch: [amd64]
+goos: [linux]
+goarch: [amd64, arm64]

steps:
- name: Checkout Repository
@@ -31,22 +29,22 @@ jobs:
run: |
mkdir -p build
cd utils/dc_util # Navigate to the directory containing dc_util.go
-GOOS=${{ matrix.os }} GOARCH=${{ matrix.goarch }} go build -o ../../build/dc_util-${{ matrix.os }}-${{ matrix.goarch }}
+GOOS=${{ matrix.goos }} GOARCH=${{ matrix.goarch }} go build -o ../../build/dc_util-${{ matrix.goos }}-${{ matrix.goarch }}

- name: Generate SHA256 Checksum
run: |
cd build
-if [[ "${{ matrix.os }}" == "windows" ]]; then
-sha256sum dc_util-${{ matrix.os }}-${{ matrix.goarch }}.exe > dc_util-${{ matrix.os }}-${{ matrix.goarch }}.exe.sha256
+if [[ "${{ matrix.goos }}" == "windows" ]]; then
+sha256sum dc_util-${{ matrix.goos }}-${{ matrix.goarch }}.exe > dc_util-${{ matrix.goos }}-${{ matrix.goarch }}.exe.sha256
else
-sha256sum dc_util-${{ matrix.os }}-${{ matrix.goarch }} > dc_util-${{ matrix.os }}-${{ matrix.goarch }}.sha256
+sha256sum dc_util-${{ matrix.goos }}-${{ matrix.goarch }} > dc_util-${{ matrix.goos }}-${{ matrix.goarch }}.sha256
fi

- name: Upload Binary and Checksum Artifacts
-uses: actions/upload-artifact@v3
+uses: actions/upload-artifact@v4
with:
-name: dc_util-${{ matrix.os }}-${{ matrix.goarch }}
-path: build/dc_util-${{ matrix.os }}-${{ matrix.goarch }}*
+name: dc_util-${{ matrix.goos }}-${{ matrix.goarch }}
+path: build/dc_util-${{ matrix.goos }}-${{ matrix.goarch }}*

- name: Clean Up Build Directory
run: |
@@ -57,14 +55,14 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-os: [linux, windows, darwin]
+goos: [linux]
goarch: [amd64, arm64]

steps:
- name: Download Binary Artifact
-uses: actions/download-artifact@v3
+uses: actions/download-artifact@v4
with:
-name: dc_util-${{ matrix.os }}-${{ matrix.goarch }}
+name: dc_util-${{ matrix.goos }}-${{ matrix.goarch }}
path: ./release

- name: Create GitHub Release
@@ -85,7 +83,7 @@ jobs:
with:
upload_url: ${{ steps.create_release.outputs.upload_url }}
asset_path: ./release
-asset_name: dc_util-${{ matrix.os }}-${{ matrix.goarch }}-${{ github.event.inputs.release_version }}
+asset_name: dc_util-${{ matrix.goos }}-${{ matrix.goarch }}-${{ github.event.inputs.release_version }}
asset_content_type: application/octet-stream

- name: Clean Up Release Directory
2 changes: 2 additions & 0 deletions CHANGES.rst
@@ -5,6 +5,8 @@ Changelog
Unreleased
----------

+* Fix hostname parsing and add tests in dc_util.
+
2.53.0 (2025-09-25)
-------------------

144 changes: 144 additions & 0 deletions utils/dc_util/CHANGES.md
@@ -0,0 +1,144 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **Persistent Logging**: Dual logging to both STDOUT and persistent file with automatic rotation
  - New `--log-file` CLI flag (default: `/resource/heapdump/dc_util.log`)
  - Automatic file rotation when approaching 1MB to prevent disk space issues
  - Failsafe design - continues STDOUT logging even if file logging fails
  - Essential for debugging Kubernetes lifecycle hooks where container logs may not be accessible
  - Creates directory structure if it doesn't exist

- **PostStart Hook Detection**: Intelligent detection of StatefulSet PostStart hooks
  - Automatically scans StatefulSet containers for PostStart hooks with `dc_util --reset-routing`
  - Prevents routing allocation changes when no PostStart hook exists to reset them
  - Solves historical issue where `NEW_PRIMARIES` routing allocation could not be reliably reset
  - Supports both single dash (`-reset-routing`) and double dash (`--reset-routing`) flag formats
  - Precise word boundary matching prevents false positives from similar flag names
  - Logs clear messages when PostStart hooks are found or missing

- **Single Node Cluster Detection**: Automatic detection and handling of single node clusters
  - Detects when StatefulSet has exactly 1 replica and skips decommission
  - Prevents unnecessary overhead and potential failures in single node deployments
  - Clear logging explains why decommission was skipped
  - Maintains existing behavior for multi-node clusters (≥2 replicas)

- **Configurable Lock File Path**: New `--lock-file` CLI flag
  - Default: `/resource/heapdump/dc_util.lock`
  - Allows customization for different deployment scenarios
  - All lock file operations now use configurable path

- **Enhanced Flag Support**: Improved command-line flag handling
  - Both `-reset-routing` and `--reset-routing` formats now supported
  - Maintains backward compatibility with existing deployments
  - Better error handling and validation

- **Multi-Architecture Support**: Automatic CPU architecture detection in hook configurations
  - Hook examples now include automatic detection of x86_64/amd64 and aarch64/arm64 architectures
  - Downloads appropriate binary based on detected architecture (`dc_util-linux-amd64` or `dc_util-linux-arm64`)
  - Eliminates need for separate configuration files for different node architectures
  - Graceful error handling for unsupported architectures
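
The word boundary matching described under "PostStart Hook Detection" can be illustrated with a small shell sketch. This is not the Go implementation in `dc_util.go`; the function name `has_reset_routing_flag` is hypothetical, and the sketch assumes the hook command has already been flattened into a single string.

```shell
# Hypothetical helper (not part of dc_util): succeeds when a hook command
# string invokes dc_util with -reset-routing or --reset-routing as a whole word.
has_reset_routing_flag() {
  hook_cmd="$1"
  # The command must mention dc_util at all.
  case "$hook_cmd" in
    *dc_util*) ;;
    *) return 1 ;;
  esac
  # Accept one or two leading dashes, and require whitespace or a string
  # boundary on both sides so that similar flag names do not match.
  printf '%s\n' "$hook_cmd" |
    grep -Eq '(^|[[:space:]])--?reset-routing([[:space:]]|$)'
}
```

For example, `has_reset_routing_flag "./dc_util --reset-routing"` succeeds, while a made-up flag such as `--reset-routing-now` is rejected because the match must end at whitespace or the end of the string.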

### Changed

- **Routing Allocation Logic**: Enhanced PreStop process with PostStart hook detection
  - Routing allocation changes now only occur when corresponding PostStart hook exists
  - Prevents permanent cluster misconfiguration in deployments without PostStart hooks
  - More intelligent decision making based on actual StatefulSet configuration

- **Replica Count Handling**: Improved logic for different cluster sizes
  - Zero replicas (scaled down): Skips decommission with clear logging
  - Single replica: Skips decommission to prevent failures
  - Multiple replicas: Proceeds with normal decommission process
  - Better log messages explaining the decision for each scenario

- **Function Signatures**: Updated internal functions to support configurable paths
  - `createLockFile()` now accepts lock file path parameter
  - `removeLockFile()` now accepts lock file path parameter
  - `lockFileExists()` now accepts lock file path parameter
  - `handleResetRouting()` now accepts lock file path parameter
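
The three cases listed under "Replica Count Handling" amount to a small decision table. The sketch below restates it in shell for illustration only; `decommission_action` and its output strings are hypothetical and do not correspond to functions or log messages in `dc_util.go`.

```shell
# Hypothetical helper (not part of dc_util): prints the action taken for a
# given StatefulSet replica count, following the three cases listed above.
decommission_action() {
  replicas="$1"
  # Reject empty or non-numeric input.
  case "$replicas" in
    ''|*[!0-9]*) echo "invalid replica count: $replicas" >&2; return 2 ;;
  esac
  if [ "$replicas" -eq 0 ]; then
    echo "skip: scaled down to zero replicas"
  elif [ "$replicas" -eq 1 ]; then
    echo "skip: single node cluster"
  else
    echo "decommission"
  fi
}
```

Calling `decommission_action 3` prints `decommission`, while counts of 0 and 1 each print a skip reason.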

### Improved

- **Logging Experience**: Comprehensive logging improvements
  - All log messages now appear in both STDOUT and persistent file
  - Better visibility into hook execution for debugging
  - Historical logs available even after pod restarts
  - Easier troubleshooting and operations monitoring

- **Documentation**: Extensively updated README.md
  - Added "Recent Updates" section highlighting new features
  - New "Replica Count Logic" section with examples
  - Updated CLI parameter table with new flags
  - Enhanced "PostStart Hook Detection" documentation
  - Added complete "Persistent Logging" section with usage examples
  - Updated sample logs sections to reflect new capabilities
  - All hook configuration examples now include automatic architecture detection
  - Clear separation between basic (preStop only) and complete (both hooks) configurations

- **Testing**: Comprehensive test coverage for all new features
  - `TestHasPostStartHookWithResetRouting`: PostStart hook detection with various scenarios
  - `TestPostStopRoutingAllocationIntegration`: Integration tests for routing allocation logic
  - `TestLoggingIntegration`: Dual logging functionality verification
  - `TestLogRotation`: File rotation behavior validation
  - `TestSingleNodeClusterBehavior`: Single node cluster detection tests
  - `TestReplicaCountBehavior`: Comprehensive replica count handling tests
  - All existing tests updated to work with new function signatures

### Technical Details

- **New CLI Flags**:
  - `--log-file string`: Path to persistent log file (default: `/resource/heapdump/dc_util.log`)
  - `--lock-file string`: Path to lock file (default: `/resource/heapdump/dc_util.lock`)

- **New Functions**:
  - `setupLogging(logFile string)`: Configures dual logging with rotation
  - `hasPostStartHookWithResetRouting(statefulSet *appsv1.StatefulSet)`: PostStart hook detection
  - Enhanced replica count logic in main decommission flow

- **Dependencies**: Added `path/filepath` import for directory handling
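
The dual logging behavior attributed to `setupLogging` (STDOUT plus a size-limited file, with STDOUT surviving file errors) can be sketched in shell as follows. This is an illustration under assumptions, not the Go code: the `log_msg` name, the temp-dir default path, the `.1` rename strategy, and the exact 1 MiB threshold are all choices made for this sketch, and timestamps are omitted.

```shell
# Assumed values for this sketch only; dc_util itself defaults to
# /resource/heapdump/dc_util.log and rotates "when approaching 1MB".
LOG_FILE="${LOG_FILE:-${TMPDIR:-/tmp}/dc_util_sketch/dc_util.log}"
MAX_BYTES="${MAX_BYTES:-1048576}"

# Hypothetical helper: writes one message to STDOUT and to LOG_FILE.
log_msg() {
  msg="Decommissioner: $*"
  # STDOUT comes first, so the message is never lost to a file error.
  printf '%s\n' "$msg"
  # Create the directory if needed; silently skip file logging on failure.
  mkdir -p "$(dirname "$LOG_FILE")" 2>/dev/null || return 0
  # Rotate by renaming once the file has reached the size threshold.
  if [ -f "$LOG_FILE" ]; then
    size=$(wc -c < "$LOG_FILE" | tr -d '[:space:]')
    if [ "$size" -ge "$MAX_BYTES" ]; then
      mv -f "$LOG_FILE" "$LOG_FILE.1" 2>/dev/null || return 0
    fi
  fi
  { printf '%s\n' "$msg" >> "$LOG_FILE"; } 2>/dev/null || return 0
}
```

A call such as `log_msg "Single node cluster detected"` prints the prefixed line and appends the same line to `$LOG_FILE`. Keeping a single rotated copy bounds disk usage at roughly twice the threshold.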

### Log Message Examples

```bash
# PostStart hook detection
Decommissioner: No postStart hook with dc_util --reset-routing or -reset-routing found, skipping pre-stop routing allocation change

# Single node cluster detection
Decommissioner: Single node cluster detected (replicas=1) -- Skipping decommission

# Architecture detection in hooks
ARCH=$(uname -m)
case "$ARCH" in
  x86_64) BINARY_ARCH="amd64" ;;
  aarch64) BINARY_ARCH="arm64" ;;
  *) echo "Unsupported architecture: $ARCH"; exit 1 ;;
esac
curl -sLO "https://example.com/dc_util-linux-${BINARY_ARCH}"

# Persistent logging
Decommissioner: 2025/10/17 15:02:38 Using kubeconfig from /Users/walter/.kube/config
# (Same message appears in both STDOUT and /resource/heapdump/dc_util.log)
```
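
The architecture mapping in the hook snippet above can also be wrapped in a function, which makes it possible to check the mapping without downloading anything. The name `binary_arch` is hypothetical, and accepting `amd64` and `arm64` as input aliases is an assumption added for hosts whose `uname -m` reports those names.

```shell
# Hypothetical helper: maps a machine name (as printed by `uname -m`)
# to the suffix used in the dc_util-linux-<arch> binary names.
binary_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "Unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

In a hook this could be used as `BINARY_ARCH=$(binary_arch "$(uname -m)") || exit 1` before building the download URL.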

### Backward Compatibility

- All existing CLI flags and behavior remain unchanged
- Existing StatefulSet configurations continue to work without modification
- New features are opt-in via CLI flags or automatic detection
- No breaking changes to existing functionality

### Benefits

- **For Operations**: Persistent logs make debugging Kubernetes hooks significantly easier
- **For Development**: Enhanced testing capabilities with better dry-run logging
- **For Reliability**: Prevents cluster misconfigurations and single node failures
- **For Maintenance**: Clear logging and automatic file rotation reduce operational overhead
- **For Multi-Architecture**: Automatic architecture detection ensures compatibility across heterogeneous Kubernetes clusters
30 changes: 30 additions & 0 deletions utils/dc_util/Makefile
@@ -0,0 +1,30 @@
# Minimalistic Makefile for dc_util
# Builds both x86_64 and ARM64 binaries for testing

BINARY_NAME = dc_util
GO_FILE = dc_util.go

# Default target
all: build-amd64 build-arm64

# Build for Linux x86_64/amd64
build-amd64:
	GOOS=linux GOARCH=amd64 go build -o $(BINARY_NAME)-linux-amd64 $(GO_FILE)

# Build for Linux ARM64
build-arm64:
	GOOS=linux GOARCH=arm64 go build -o $(BINARY_NAME)-linux-arm64 $(GO_FILE)

# Clean built binaries
clean:
	rm -f $(BINARY_NAME)-linux-amd64 $(BINARY_NAME)-linux-arm64

# Test
test:
	go test -v

# Build for current platform (development)
dev:
	go build -o $(BINARY_NAME) $(GO_FILE)

.PHONY: all build-amd64 build-arm64 clean test dev