
Commit d374033

Merge branch 'develop'
2 parents c5f565e + 98d82c2 commit d374033

File tree

3 files changed (+358, -105 lines)


.github/workflows/build.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ on:
     branches:
       - master
       - develop
+      - tailscale
   workflow_dispatch:
 jobs:
   Build-Docker:

LOGGING.md

Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@

# Logging using log aggregation

There are two ways that I am playing with to perform logging to an
external logging service: Google Cloud Stackdriver logging and Grafana
Cloud Loki. I like to check all my logs in a central location for all
my various Linux servers. Logging at one central location also has the
advantage that I can centrally manage any alerting based on strings in
the logs, and it makes it much harder for an intruder to tamper with
the log files. Both Google Cloud and Grafana Cloud have generous free
quotas for their services, and I am using both in different projects.
I started with Google Cloud many years back, so I may be a bit more
familiar there.

There are two parts to capturing the logs of individual machines. The
first is redirecting the journald logs to the cloud logging service,
which is done by natively installing [fluent-bit](https://www.fluentbit.io)
on your machine. There are two ways to configure fluent-bit; I will use
the yaml based config file here. To make that happen I use the following
as /etc/systemd/system/fluent-bit.service.d/override.conf for the
fluent-bit daemon:

```
[Service]
ExecStart=
ExecStart=/usr/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.yaml
```
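
After dropping the override in place, systemd has to pick it up and the
daemon needs a restart; something along these lines should do it:

```
sudo systemctl daemon-reload
sudo systemctl restart fluent-bit
```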

The second part is instructing docker to send the collected stdout and
stderr of the managed containers to the same cloud logging provider.
There are docker plug-ins available for both Google Cloud and Grafana
Cloud to perform this task. Both work best if the docker apps are
configured to emit just plain json lines for their logging.

I am surprised how much time (lost time, in my opinion) many developers
spend building homegrown logging into their apps, complete with log file
rotation and output colorized with ANSI escape sequences.

## Logging to Google Cloud Logging (Stackdriver)

To be able to log to Google Cloud, you will need to create a service
account that includes the following roles:

* Logs Writer
* Error Reporting Writer

Save the service account key as a json file; it is used both by fluent-bit and docker below.
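
As a sketch of how such an account could be created with the gcloud CLI
(the account name log-writer and the project id are placeholders of my
choosing, not taken from this repo):

```
gcloud iam service-accounts create log-writer --project hosting-XXXXXX
gcloud projects add-iam-policy-binding hosting-XXXXXX \
  --member "serviceAccount:log-writer@hosting-XXXXXX.iam.gserviceaccount.com" \
  --role roles/logging.logWriter
gcloud projects add-iam-policy-binding hosting-XXXXXX \
  --member "serviceAccount:log-writer@hosting-XXXXXX.iam.gserviceaccount.com" \
  --role roles/errorreporting.writer
gcloud iam service-accounts keys create my-service-acct.json \
  --iam-account log-writer@hosting-XXXXXX.iam.gserviceaccount.com
```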

### fluent-bit for sending the journal

A typical fluent-bit.yaml config for logging to Google Cloud looks like
this:

```
service:
  flush: 1
  daemon: Off
  log_level: info
  http_server: Off
  http_listen: 0.0.0.0
  http_port: 2020
  storage.metrics: on

pipeline:
  inputs:
    - name: systemd
      tag: host.systemd
      DB: /var/lib/fluent-bit/journal.db
      Lowercase: on
      Strip_Underscores: on
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                prio = record["priority"]
                if (prio == "7") then
                  new_record["severity"] = "DEBUG"
                elseif (prio == "6") then
                  new_record["severity"] = "INFO"
                elseif (prio == "5") then
                  new_record["severity"] = "NOTICE"
                elseif (prio == "4") then
                  new_record["severity"] = "WARNING"
                elseif (prio == "3") then
                  new_record["severity"] = "ERROR"
                elseif (prio == "2") then
                  new_record["severity"] = "CRITICAL"
                elseif (prio == "1") then
                  new_record["severity"] = "ALERT"
                elseif (prio == "0") then
                  new_record["severity"] = "EMERGENCY"
                end
                return 1, timestamp, new_record
              end
  outputs:
    - name: stackdriver
      match: '*'
      severity_key: severity
      google_service_credentials: /etc/fluent-bit/my-service-acct.json
      resource: gce_instance
      resource_labels: instance_id=myhost,zone=myzone
```

The biggest part of the above config is mapping journald priorities to
stackdriver severities. Please note the use of the Google Cloud
credentials file and the addition of the standard instance_id and zone
resource labels. The directory /var/lib/fluent-bit for journald
synchronization needs to be created once.
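
Creating that state directory and checking that entries actually arrive
could look like this (the gcloud filter is just an illustrative sketch
using the resource labels from the config above):

```
sudo mkdir -p /var/lib/fluent-bit
sudo systemctl restart fluent-bit

# read back a few recent entries
gcloud logging read 'resource.type="gce_instance" AND resource.labels.instance_id="myhost"' \
  --project hosting-XXXXXX --freshness 1h --limit 5
```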

### Docker log plug-in

I used to use the gcplogs log driver built into docker, but I am
switching all my projects to structured json based logging and was
looking for ways to feed that directly into Google Cloud Logging. The
docker gcplogs driver does not do this (it forwards the JSON as one big
log line), but I found the excellent project
[ngcplogs](https://github.com/nanoandrew4/ngcplogs)
that modified the gcplogs driver to extract the structured log info.

This driver is a docker plugin and is installed like this (for an ARM
based host):

```
docker plugin install nanoandrew4/ngcplogs:linux-arm64-v1.3.0 --alias ngcplogs --grant-all-permissions
```

The driver is configured as usual in /etc/docker/daemon.json
like this:

```
{
  "log-driver": "ngcplogs",
  "log-opts": {
    "exclude-timestamp" : "true",
    "extract-gcp" : "true",
    "extract-caddy" : "true",
    "gcp-project": "hosting-XXXXXX",
    "gcp-meta-name": "myhost",
    "gcp-meta-zone": "myzone",
    "credentials-json" : "your_json_escaped_credentials.json_file_content"
  }
}
```
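
Note that a changed daemon.json only takes effect after the docker
daemon has been restarted, and the new default log driver only applies
to containers created afterwards:

```
sudo systemctl restart docker
# recreate the containers so they pick up the new default log driver,
# e.g. with docker compose: docker compose up -d --force-recreate
```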

The escaped json string for the Google service account with log writing
permissions can be generated with the json-escape.go program like this:

```
./json-escape.sh </path/to/my-service-acct.json
```
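
If jq is installed, an equivalent escaped string can also be produced
without the helper (a sketch; note that the output includes the
surrounding double quotes, so paste it as the whole value):

```
jq -Rs . </path/to/my-service-acct.json
```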

The extract-gcp option extracts existing Google Cloud style trace,
labels and source line information from applications that already
expect their output to be scanned by Google Cloud Logging. This works
nicely for Golang apps that use logrus together with
[stackdriver-gae-logrus-plugin](https://github.com/andyfusniak/stackdriver-gae-logrus-plugin),
or for log/slog based ones with [slogdriver](https://github.com/jussi-kalliokoski/slogdriver).

The slogdriver adapter for log/slog does not parse the traceparent HTTP
header, so I have created a small piece of middleware that I use to
inject the trace information expected by slogdriver into the request
context: [traceparent](https://github.com/jum/traceparent).

The extract-caddy option extracts fields from Caddy logs so that caddy
can act as a proper trace parent and the Google Cloud console displays
caddy access log entries as HTTP requests.

The neat effect of all this is that I get fully distributed tracing
across multiple nodes without going through the hoops of setting up a
full blown OTEL stack, plus a really nice log viewer in the Google
Cloud Console.

## Logging to Grafana Cloud (Loki)

For logging to Grafana Loki you will need to get the credentials from
the Loki section of your Grafana Cloud account. This includes the host
to send logs to, the user id and the password.

### fluent-bit for sending the journal

A typical fluent-bit.yaml config for logging to Grafana Loki looks like
this:

```
service:
  flush: 1
  daemon: Off
  log_level: info
  http_server: Off
  http_listen: 0.0.0.0
  http_port: 2020
  storage.metrics: on

pipeline:
  inputs:
    - name: systemd
      tag: host.systemd
      DB: /var/lib/fluent-bit/journal.db
      Lowercase: on
      Strip_Underscores: on
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                prio = record["priority"]
                if (prio == "7") then
                  new_record["level"] = "DEBUG"
                elseif (prio == "6") then
                  new_record["level"] = "INFO"
                elseif (prio == "5") then
                  new_record["level"] = "NOTICE"
                elseif (prio == "4") then
                  new_record["level"] = "WARN"
                elseif (prio == "3") then
                  new_record["level"] = "ERROR"
                elseif (prio == "2") then
                  new_record["level"] = "CRITICAL"
                elseif (prio == "1") then
                  new_record["level"] = "ALERT"
                elseif (prio == "0") then
                  new_record["level"] = "EMERGENCY"
                end
                return 1, timestamp, new_record
              end
  outputs:
    - name: loki
      match: '*'
      labels: job=journal, instance=myhost, zone=myzone, level=$level, $systemd_unit, tag=$TAG
      host: logs-prod-XXX.grafana.net
      port: 443
      tls: on
      tls.verify: on
      line_format: json
      http_user: my_grafana_user_id
      http_passwd: my_grafana_password
```

The biggest part of the above config is mapping journald priorities to
Loki log levels. There is a subtle difference between stackdriver and
Loki here: the WARNING label is only understood by Loki if it is written
as WARN. Please note the use of the Grafana Cloud Loki credentials and
the addition of the standard instance and zone labels. The directory
/var/lib/fluent-bit for journald synchronization needs to be created
once.
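
To check that journal entries arrive in Loki, a quick query with
Grafana's logcli, using the same host and credentials as in the config
above, could look like this (a sketch, not part of this repo):

```
export LOKI_ADDR=https://logs-prod-XXX.grafana.net
export LOKI_USERNAME=my_grafana_user_id
export LOKI_PASSWORD=my_grafana_password
logcli query --limit=10 '{job="journal", instance="myhost"}'
```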

### Docker log plug-in

The Loki docker plugin is installed like this (for an ARM based host):

```
docker plugin install grafana/loki-docker-driver:3.5.0-arm64 --alias loki --grant-all-permissions
```

The driver is configured as usual in /etc/docker/daemon.json
like this:

```
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "https://my_grafana_user_id:my_grafana_password@logs-prod-XXX.grafana.net/loki/api/v1/push",
    "loki-external-labels": "job=docker,instance=myhost,zone=myzone"
  }
}
```

The loki docker plug-in already handles log lines in json format. To
propagate traceparent information for golang apps, using a suitable
http middleware that adds the trace information to the log, see
[slog-traceparent](https://github.com/jum/slog-traceparent).
