Commit face26c

Add ProcessMonitor and fix several minor issues.
1 parent dfe5464 commit face26c

16 files changed: +508 -13 lines changed

async-container-supervisor.gemspec

Lines changed: 1 addition & 0 deletions
@@ -28,4 +28,5 @@ Gem::Specification.new do |spec|
 	spec.add_dependency "io-endpoint"
 	spec.add_dependency "memory", "~> 0.7"
 	spec.add_dependency "memory-leak", "~> 0.5"
+	spec.add_dependency "process-metrics"
 end

context/getting-started.md

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ service "supervisor" do
 			# Restart workers that exceed 500MB of memory:
 			Async::Container::Supervisor::MemoryMonitor.new(
 				interval: 10, # Check every 10 seconds
-				limit: 1024 * 1024 * 500 # 500MB limit
+				maximum_size_limit: 1024 * 1024 * 500 # 500MB limit
 			)
 		]
 	end

examples/memory-leak/service.rb

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ def setup(container)
 
 			chunks = []
 			while true
-				Console.info(self, "Leaking memory...")
+				# Console.info(self, "Leaking memory...")
 				chunks << " " * 1024 * 1024 * rand(10)
 				sleep 1
 				instance.ready!
@@ -51,7 +51,7 @@ def setup(container)
 			# The interval at which to check for memory leaks.
 			interval: 1,
 			# The total size limit of all processes:
-			maximum_size_limit: 1024 * 1024 * 1000, # 1000 MB
+			maximum_size_limit: 1024 * 1024 * 100, # 100 MB
 		)]
 	end
 end

examples/process-monitor/readme.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Process Monitor Example

This example demonstrates how to use the `ProcessMonitor` to log process metrics periodically for all worker processes.

## Overview

The `ProcessMonitor` captures CPU and memory metrics for the entire process tree by tracking the parent process ID (ppid). This is more efficient than tracking individual processes and provides a comprehensive view of resource usage across all workers.

## Usage

Run the service:

```bash
$ ./service.rb
```

This will:
1. Start a supervisor process.
2. Spawn 4 worker processes that perform CPU and memory work.
3. Log process metrics every 10 seconds.
4. Monitor memory usage and restart workers that exceed 500MB.
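
The overview above describes collecting metrics for a whole process tree by parent process ID. The following is a minimal sketch of that idea, not the `ProcessMonitor` implementation, which this commit does not show. It assumes the metrics arrive as four-column `ps` text (pid, ppid, CPU percent, resident size in kilobytes). `ProcessSample`, `parse_ps`, `children_of` and `summarize` are hypothetical names introduced for illustration.

```ruby
# Hypothetical record for one row of `ps` output.
ProcessSample = Struct.new(:pid, :ppid, :cpu_percent, :rss_kilobytes)

# Parse four-column text (pid, ppid, %cpu, rss) into samples.
# Rows that do not have exactly four fields are skipped.
def parse_ps(output)
	output.each_line.filter_map do |line|
		fields = line.split
		next unless fields.size == 4
		
		ProcessSample.new(Integer(fields[0]), Integer(fields[1]), Float(fields[2]), Integer(fields[3]))
	end
end

# Select the direct children of the given parent process ID.
def children_of(samples, ppid)
	samples.select {|sample| sample.ppid == ppid}
end

# Aggregate CPU and memory across a set of samples.
def summarize(samples)
	{
		count: samples.size,
		cpu_percent: samples.sum(&:cpu_percent),
		rss_kilobytes: samples.sum(&:rss_kilobytes),
	}
end
```

On a POSIX system, text in this shape can usually be produced with `ps -e -o pid= -o ppid= -o pcpu= -o rss=`. Filtering by ppid means a single listing covers every worker, which is the efficiency argument the overview makes.
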
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
#!/usr/bin/env async-service
# frozen_string_literal: true

# Released under the MIT License.
# Copyright, 2025, by Samuel Williams.

require "async/container/supervisor"

class WorkerService < Async::Service::Generic
	def setup(container)
		super
		
		container.run(name: self.class.name, count: 4, restart: true, health_check_timeout: 2) do |instance|
			Async do
				if @environment.implements?(Async::Container::Supervisor::Supervised)
					@evaluator.make_supervised_worker(instance).run
				end
				
				start_time = Time.now
				
				instance.ready!
				
				# Simulate some CPU and memory activity
				counter = 0
				chunks = []
				while true
					# Do some work
					counter += 1
					if counter % 10 == 0
						chunks << " " * 1024 * 1024 * rand(5)
					end
					
					# Simulate CPU usage
					(1..1000).each {|i| Math.sqrt(i)}
					
					sleep 1
					instance.ready!
					
					uptime = Time.now - start_time
					instance.name = "Worker running for #{uptime.to_i} seconds (counter: #{counter})"
				end
			ensure
				Console.info(self, "Exiting...")
			end
		end
	end
end

service "worker" do
	service_class WorkerService
	
	include Async::Container::Supervisor::Supervised
end

service "supervisor" do
	include Async::Container::Supervisor::Environment
	
	monitors do
		[
			# Monitor process metrics every 10 seconds
			Async::Container::Supervisor::ProcessMonitor.new(interval: 10),
			
			# Also monitor memory and restart workers if they exceed 500MB
			Async::Container::Supervisor::MemoryMonitor.new(
				interval: 5,
				maximum_size_limit: 1024 * 1024 * 500
			)
		]
	end
end

examples/simple/simple.rb

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ def setup(container)
 	monitors do
 		[Async::Container::Supervisor::MemoryMonitor.new(
 			interval: 1,
-			limit: 1024 * 1024 * 400
+			maximum_size_limit: 1024 * 1024 * 400
 		)]
 	end
 end

guides/getting-started/readme.md

Lines changed: 16 additions & 5 deletions
@@ -109,25 +109,36 @@ This will start:
 
 ### Adding Health Monitors
 
-You can add monitors to detect and respond to unhealthy conditions. For example, to add a memory monitor:
+You can add monitors to observe worker health and automatically respond to issues. Monitors are useful for:
+
+- **Memory leak detection**: Automatically restart workers consuming excessive memory.
+- **Performance monitoring**: Track CPU and memory usage trends.
+- **Capacity planning**: Understand resource requirements.
+
+For example, to add monitoring:
 
 ```ruby
 service "supervisor" do
 	include Async::Container::Supervisor::Environment
 	
 	monitors do
 		[
-			# Restart workers that exceed 500MB of memory:
+			# Log process metrics for observability:
+			Async::Container::Supervisor::ProcessMonitor.new(
+				interval: 60
+			),
+			
+			# Restart workers exceeding memory limits:
 			Async::Container::Supervisor::MemoryMonitor.new(
-				interval: 10, # Check every 10 seconds
-				limit: 1024 * 1024 * 500 # 500MB limit
+				interval: 10,
+				maximum_size_limit: 1024 * 1024 * 500 # 500MB limit per process
 			)
 		]
 	end
 end
 ```
 
-The {ruby Async::Container::Supervisor::MemoryMonitor} will periodically check worker memory usage and restart any workers that exceed the configured limit.
+See the {ruby Async::Container::Supervisor::MemoryMonitor Memory Monitor} and {ruby Async::Container::Supervisor::ProcessMonitor Process Monitor} guides for detailed configuration options and best practices.
 
 ### Collecting Diagnostics
 

guides/links.yaml

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
 getting-started:
   order: 1
 
+memory-monitor:
+  order: 2
+
+process-monitor:
+  order: 3

guides/memory-monitor/readme.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
# Memory Monitor

This guide explains how to use the {ruby Async::Container::Supervisor::MemoryMonitor} to detect and restart workers that exceed memory limits or develop memory leaks.

## Overview

Long-running worker processes often accumulate memory over time, either through legitimate growth or memory leaks. Without intervention, workers can consume all available system memory, causing performance degradation or system crashes. The `MemoryMonitor` solves this by automatically detecting and restarting problematic workers before they impact system stability.

Use the `MemoryMonitor` when you need:

- **Memory leak protection**: Automatically restart workers that continuously accumulate memory.
- **Resource limits**: Enforce maximum memory usage per worker.
- **System stability**: Prevent runaway processes from exhausting system memory.
- **Leak diagnosis**: Capture memory samples when leaks are detected for debugging.

The monitor uses the `memory-leak` gem to track process memory usage over time, detecting abnormal growth patterns that indicate leaks.

## Usage

Add a memory monitor to your supervisor service to automatically restart workers that exceed 500MB:

```ruby
service "supervisor" do
	include Async::Container::Supervisor::Environment
	
	monitors do
		[
			Async::Container::Supervisor::MemoryMonitor.new(
				# Check worker memory every 10 seconds:
				interval: 10,
				
				# Restart workers exceeding 500MB:
				maximum_size_limit: 1024 * 1024 * 500
			)
		]
	end
end
```

When a worker exceeds the limit:
1. The monitor logs the leak detection.
2. It optionally captures a memory sample for debugging.
3. It sends `SIGINT` to gracefully shut down the worker.
4. The container automatically spawns a replacement worker.
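
The steps above can be sketched in plain Ruby. This is an illustration under stated assumptions, not the `MemoryMonitor` source: it assumes the per-process sizes have already been measured in bytes, and `pids_over_limit` and `interrupt` are hypothetical helper names. Only `Process.kill` and `Errno::ESRCH` are standard Ruby.

```ruby
MEGABYTE = 1024 * 1024

# Hypothetical helper: returns the pids whose measured size in bytes exceeds the limit.
def pids_over_limit(sizes_by_pid, maximum_size_limit)
	sizes_by_pid.select {|_pid, size| size > maximum_size_limit}.keys
end

# Hypothetical helper: asks each process to shut down by sending SIGINT.
def interrupt(pids)
	pids.each do |pid|
		Process.kill(:INT, pid)
	rescue Errno::ESRCH
		# The process has already exited, so there is nothing to signal.
	end
end
```

Given sizes of 420MB and 610MB and a limit of `500 * MEGABYTE`, `pids_over_limit` selects only the second process. Ruby raises `Interrupt` in a process that receives `SIGINT`, so `ensure` blocks in the worker still run before it exits. That is likely why the example worker can log "Exiting..." on shutdown.
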

## Configuration Options

The `MemoryMonitor` accepts the following options:

### `interval`

The interval (in seconds) at which to check for memory leaks. Default: `10` seconds.

```ruby
Async::Container::Supervisor::MemoryMonitor.new(interval: 30)
```

### `maximum_size_limit`

The maximum memory size (in bytes) per process. When a process exceeds this limit, it will be restarted.

```ruby
# 500MB limit
Async::Container::Supervisor::MemoryMonitor.new(maximum_size_limit: 1024 * 1024 * 500)

# 1GB limit
Async::Container::Supervisor::MemoryMonitor.new(maximum_size_limit: 1024 * 1024 * 1024)
```

### `total_size_limit`

The total size limit (in bytes) for all monitored processes combined. If not specified, only per-process limits are enforced.

```ruby
# Total limit of 2GB across all workers
Async::Container::Supervisor::MemoryMonitor.new(
	maximum_size_limit: 1024 * 1024 * 500, # 500MB per process
	total_size_limit: 1024 * 1024 * 1024 * 2 # 2GB total
)
```
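
The guide does not say which processes are restarted when the combined size exceeds `total_size_limit`. One plausible policy, shown here purely as an assumption, is to pick the largest processes first until the remaining total fits. `pids_to_trim` is a hypothetical name and does not come from the `memory-leak` gem.

```ruby
# Hypothetical policy: select the largest processes until the combined size fits the limit.
def pids_to_trim(sizes_by_pid, total_size_limit)
	total = sizes_by_pid.values.sum
	selected = []
	
	# Visit processes from largest to smallest.
	sizes_by_pid.sort_by {|_pid, size| -size}.each do |pid, size|
		break if total <= total_size_limit
		
		selected << pid
		total -= size
	end
	
	selected
end
```

Restarting the largest process first frees the most memory per restart, so fewer workers are disrupted. A real implementation may choose differently.
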

### `memory_sample`

Options for capturing memory samples when a leak is detected. If `nil`, memory sampling is disabled.

Default: `{duration: 30, timeout: 120}`

```ruby
# Customize memory sampling:
Async::Container::Supervisor::MemoryMonitor.new(
	memory_sample: {
		duration: 60, # Sample for 60 seconds
		timeout: 180 # Timeout after 180 seconds
	}
)

# Disable memory sampling:
Async::Container::Supervisor::MemoryMonitor.new(
	memory_sample: nil
)
```

## Memory Leak Detection

When a memory leak is detected, the monitor will:

1. Log the leak detection with process details.
2. If `memory_sample` is configured, capture a memory sample from the worker.
3. Send a `SIGINT` signal to gracefully restart the worker.
4. The container will automatically restart the worker process.

### Memory Sampling

When a memory leak is detected and `memory_sample` is configured, the monitor requests a lightweight memory sample from the worker. This sample:

- Tracks allocations during the sampling period.
- Forces a garbage collection.
- Returns a JSON report showing retained objects.

The report includes:
- `total_allocated`: Total allocated memory and object count.
- `total_retained`: Total retained memory and count after GC.
- `by_gem`: Breakdown by gem/library.
- `by_file`: Breakdown by source file.
- `by_location`: Breakdown by specific file:line locations.
- `by_class`: Breakdown by object class.
- `strings`: String allocation analysis.

This is much more efficient than a full heap dump using `ObjectSpace.dump_all`.
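
The sampling approach described above (trace allocations, force a garbage collection, report what is retained) can be approximated with Ruby's standard `objspace` extension. The sketch below is not the sampler the supervisor uses, and it produces only a `by_class`-style count rather than the full report. `retained_by_class` is a hypothetical name.

```ruby
require "objspace"

# Hypothetical helper: counts, per class, the objects allocated while the
# block runs that are still alive after a garbage collection.
def retained_by_class
	counts = Hash.new(0)
	
	ObjectSpace.trace_object_allocations do
		yield
		
		# Collect garbage so that mostly retained objects remain.
		GC.start
		
		ObjectSpace.each_object(Object) do |object|
			# Allocation details are only recorded for objects created while tracing.
			next unless ObjectSpace.allocation_generation(object)
			
			counts[object.class] += 1
		end
	end
	
	counts
end
```

Calling `retained_by_class { ... }` around suspect code returns a hash keyed by class. Ruby's garbage collector is conservative, so some short-lived objects may still be counted. Treat the counts as approximate.
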
