# Logging using log aggregation

There are two approaches I am experimenting with for logging to an
external logging service: Google Cloud Stackdriver Logging and Grafana
Cloud Loki. I like to check the logs of all my various Linux servers at
a central location. Logging at one central location also has the
advantage that I can centrally manage any alerting based on strings in
the logs, and that it is not easy for intruders to tamper with the log
files. Both Google Cloud and Grafana Cloud have generous free quotas
for their services, and I am using both in different projects. I
started with Google Cloud many years back, so I may be a bit more
familiar there.

Capturing the logs of an individual machine has two parts. The first is
forwarding the journald logs to the cloud logging service, which is
done by natively installing [fluent-bit](https://www.fluentbit.io) on
the machine. There are two ways to configure fluent-bit; I will use the
yaml based config file here. To make that happen I use the following
as /etc/systemd/system/fluent-bit.service.d/override.conf for the
fluent-bit daemon:

```
[Service]
ExecStart=
ExecStart=/usr/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.yaml
```
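
With the override in place, systemd needs to reload its unit files and
the daemon needs to be restarted; this assumes the packaged unit is
named fluent-bit:

```shell
# Reload unit definitions so the drop-in override is picked up,
# then restart fluent-bit with the yaml config.
sudo systemctl daemon-reload
sudo systemctl restart fluent-bit
```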

The second part is instructing docker to send the collected stdout and
stderr of the managed containers to the same cloud logging provider.
There are docker plug-ins available for both Google Cloud and Grafana
Cloud to perform this task. Both work best if the docker apps are
configured to emit just plain json lines for their logging.

I am surprised how many developers waste time (in my opinion) building
homegrown logging into their apps, including log file rotation and
colorizing their output with ANSI escape sequences.

## Logging to Google Cloud Logging (Stackdriver)

To be able to log to Google Cloud, you will need to create a service
account that includes the following roles:

* Logs Writer
* Error Reporting Writer

Save the service account key as a json file; it is used both by fluent-bit and docker below.
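
For reference, creating such a service account from the command line
could look roughly like this; the project id, account name and key file
name are placeholders, not part of the original setup:

```shell
# Hypothetical project id and service account name; substitute your own.
PROJECT=my-project-id
SA=log-writer

# Create the service account and grant the two logging roles.
gcloud iam service-accounts create "$SA" --project "$PROJECT"
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA@$PROJECT.iam.gserviceaccount.com" \
  --role roles/logging.logWriter
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA@$PROJECT.iam.gserviceaccount.com" \
  --role roles/errorreporting.writer

# Download the key used by fluent-bit and docker below.
gcloud iam service-accounts keys create my-service-acct.json \
  --iam-account "$SA@$PROJECT.iam.gserviceaccount.com"
```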

### fluent-bit for sending the journal

A typical fluent-bit.yaml config for logging to Google Cloud looks like
this:

```
service:
  flush: 1
  daemon: Off
  log_level: info
  http_server: Off
  http_listen: 0.0.0.0
  http_port: 2020
  storage.metrics: on

pipeline:
  inputs:
    - name: systemd
      tag: host.systemd
      DB: /var/lib/fluent-bit/journal.db
      Lowercase: on
      Strip_Underscores: on
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                prio = record["priority"]
                if (prio == "7") then
                  new_record["severity"] = "DEBUG"
                elseif (prio == "6") then
                  new_record["severity"] = "INFO"
                elseif (prio == "5") then
                  new_record["severity"] = "NOTICE"
                elseif (prio == "4") then
                  new_record["severity"] = "WARNING"
                elseif (prio == "3") then
                  new_record["severity"] = "ERROR"
                elseif (prio == "2") then
                  new_record["severity"] = "CRITICAL"
                elseif (prio == "1") then
                  new_record["severity"] = "ALERT"
                elseif (prio == "0") then
                  new_record["severity"] = "EMERGENCY"
                end
                return 1, timestamp, new_record
              end
  outputs:
    - name: stackdriver
      match: '*'
      severity_key: severity
      google_service_credentials: /etc/fluent-bit/my-service-acct.json
      resource: gce_instance
      resource_labels: instance_id=myhost,zone=myzone
```

The biggest part of the above config is mapping journald priorities to
stackdriver severities. Please note the use of the Google Cloud
credentials file and the addition of the standard instance_id and zone
resource labels. The directory /var/lib/fluent-bit for journald
synchronization needs to be created once.
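
That one-time step is just:

```shell
# Create the state directory referenced by the DB option of the
# systemd input; fluent-bit keeps its journald cursor database there.
sudo mkdir -p /var/lib/fluent-bit
```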

### Docker log plug-in

I used to use the gcplogs log driver built into docker, but I am
switching all my projects to structured json based logging and was
looking for a way to feed that directly into Google Cloud Logging. The
docker gcplogs driver does not do this (it forwards the JSON as one big
log line), but I found the excellent project
[ngcplogs](https://github.com/nanoandrew4/ngcplogs)
that modifies the gcplogs driver to extract the structured log info.

This driver is a docker plugin and is installed like this (for an ARM
based host):

```
docker plugin install nanoandrew4/ngcplogs:linux-arm64-v1.3.0 --alias ngcplogs --grant-all-permissions
```

The driver is configured as usual in /etc/docker/daemon.json
like this:

```
{
  "log-driver": "ngcplogs",
  "log-opts": {
    "exclude-timestamp" : "true",
    "extract-gcp" : "true",
    "extract-caddy" : "true",
    "gcp-project": "hosting-XXXXXX",
    "gcp-meta-name": "myhost",
    "gcp-meta-zone": "myzone",
    "credentials-json" : "your_json_escaped_credentials.json_file_content"
  }
}
```
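
A broken daemon.json keeps docker from starting, so it is worth
validating the file before restarting the daemon; a small sketch (note
that the new default log driver only applies to containers created
afterwards):

```shell
# Validate daemon.json (json.tool exits non-zero on a parse error)
# and only restart docker if it parses cleanly.
python3 -m json.tool /etc/docker/daemon.json > /dev/null && \
  sudo systemctl restart docker
```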

The escaped json string for the Google service account with log writing
permissions can be generated with the json-escape.go program like this:

```
./json-escape.sh </path/to/my-service-acct.json
```
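
If you do not have that helper at hand, any JSON string encoder
produces the same result; for example with a python3 one-liner (my
illustration, not part of the original setup):

```shell
# Read the credentials file and emit it as a single JSON-escaped
# string, suitable for the credentials-json option above.
python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))' \
  < /path/to/my-service-acct.json
```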

The extract-gcp option extracts already existing Google Cloud style
trace, label and source line information from applications that expect
their output to be scanned by Google Cloud Logging. This works nicely
for Golang apps that use logrus with
[stackdriver-gae-logrus-plugin](https://github.com/andyfusniak/stackdriver-gae-logrus-plugin)
or log/slog with [slogdriver](https://github.com/jussi-kalliokoski/slogdriver).

The slogdriver adapter for log/slog does not parse the traceparent HTTP
header, so I have created a small piece of middleware that injects the
trace information expected by slogdriver into the request context:
[traceparent](https://github.com/jum/traceparent).

The extract-caddy option extracts fields from Caddy logs so that Caddy
can serve as a proper trace parent and the Google Cloud console
displays Caddy access log entries as HTTP requests.

The neat effect of all this is that I get fully distributed tracing
across multiple nodes, without going through the hoops of setting up a
full blown OTEL stack, plus a really nice log viewer in the Google
Cloud Console.

## Logging to Grafana Cloud (Loki)

For logging to Grafana Loki you will need to get the credentials from
the Loki section of your Grafana Cloud account: the host to send logs
to, the user id and the password.

### fluent-bit for sending the journal

A typical fluent-bit.yaml config for logging to Grafana Loki looks like
this:

```
service:
  flush: 1
  daemon: Off
  log_level: info
  http_server: Off
  http_listen: 0.0.0.0
  http_port: 2020
  storage.metrics: on

pipeline:
  inputs:
    - name: systemd
      tag: host.systemd
      DB: /var/lib/fluent-bit/journal.db
      Lowercase: on
      Strip_Underscores: on
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                prio = record["priority"]
                if (prio == "7") then
                  new_record["level"] = "DEBUG"
                elseif (prio == "6") then
                  new_record["level"] = "INFO"
                elseif (prio == "5") then
                  new_record["level"] = "NOTICE"
                elseif (prio == "4") then
                  new_record["level"] = "WARN"
                elseif (prio == "3") then
                  new_record["level"] = "ERROR"
                elseif (prio == "2") then
                  new_record["level"] = "CRITICAL"
                elseif (prio == "1") then
                  new_record["level"] = "ALERT"
                elseif (prio == "0") then
                  new_record["level"] = "EMERGENCY"
                end
                return 1, timestamp, new_record
              end
  outputs:
    - name: loki
      match: '*'
      labels: job=journal, instance=myhost, zone=myzone, level=$level, $systemd_unit, tag=$TAG
      host: logs-prod-XXX.grafana.net
      port: 443
      tls: on
      tls.verify: on
      line_format: json
      http_user: my_grafana_user_id
      http_passwd: my_grafana_password
```

The biggest part of the above config is mapping journald priorities to
Loki log levels. There is a subtle difference between stackdriver and
Loki here: the WARNING label is only understood by Loki if written as
WARN. Please note the use of the Grafana Cloud Loki credentials and the
addition of the standard instance and zone labels. The directory
/var/lib/fluent-bit for journald synchronization needs to be created
once.
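
Before pointing fluent-bit at Grafana Cloud, the credentials can be
checked by hand-rolling a push against the Loki HTTP API; host, user id
and password below are the placeholders from the config above:

```shell
# Push one test log line to the Loki endpoint; an HTTP 204 response
# means host and credentials are correct. Loki expects the timestamp
# in nanoseconds since the epoch.
HOST=logs-prod-XXX.grafana.net
curl -s -o /dev/null -w '%{http_code}\n' \
  -u my_grafana_user_id:my_grafana_password \
  -H 'Content-Type: application/json' \
  -X POST "https://$HOST/loki/api/v1/push" \
  -d "{\"streams\":[{\"stream\":{\"job\":\"test\"},\"values\":[[\"$(date +%s%N)\",\"test line\"]]}]}"
```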

### Docker log plug-in

The Loki docker plugin is installed like this (for an ARM based host):

```
docker plugin install grafana/loki-docker-driver:3.5.0-arm64 --alias loki --grant-all-permissions
```

The driver is configured as usual in /etc/docker/daemon.json
like this:

```
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "https://my_grafana_user_id:[email protected]/loki/api/v1/push",
    "loki-external-labels": "job=docker,instance=myhost,zone=myzone"
  }
}
```
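
As with the google driver, the daemon.json default only applies to
containers created after docker is restarted. The driver can also be
selected per container instead of globally; the image name here is a
placeholder:

```shell
# Per-container override of the log driver; uses the same loki-url
# option as the global daemon.json configuration.
docker run --log-driver loki \
  --log-opt loki-url="https://my_grafana_user_id:[email protected]/loki/api/v1/push" \
  myimage
```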

The loki docker plug-in already handles log lines in json format. For
propagating traceparent information in golang apps, using a suitable
http middleware to add trace information to the log, see
[slog-traceparent](https://github.com/jum/slog-traceparent)