# Logging using log aggregation

There are two ways that I am experimenting with to send logs to an
external logging service: Google Cloud Stackdriver logging and Grafana
Cloud Loki. I like to check the logs of all my various Linux servers at
one central location. Logging at a central location also has the
advantage that I can centrally manage any alerting based on strings in
the logs, and that intruders cannot easily tamper with the log files.
Both Google Cloud and Grafana Cloud have generous free quotas for their
services, and I am using both in different projects. I started with
Google Cloud many years back, so I may be a bit more familiar there.
 | 13 | + | 
Capturing the logs of an individual machine has two parts. The first is
redirecting the journald logs to the cloud logging service, which is
done by natively installing [fluent-bit](https://www.fluentbit.io) on
your machine. There are two ways to configure fluent-bit; I will use
the yaml based config file here. To make that happen I use the following
as /etc/systemd/system/fluent-bit.service.d/override.conf for the
fluent-bit daemon:
 | 21 | + | 
```
[Service]
ExecStart=
ExecStart=/usr/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.yaml
```
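After dropping in this override, systemd has to pick up the changed unit, and fluent-bit needs its state directory for the journald read position. On my machines that amounts to the following standard commands:

```
# create the state directory used for the journald cursor
sudo mkdir -p /var/lib/fluent-bit

# reload systemd units and restart fluent-bit with the new ExecStart
sudo systemctl daemon-reload
sudo systemctl restart fluent-bit
```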
 | 27 | + | 
The second part is instructing docker to send the collected stdout and
stderr of the managed containers to the same cloud logging provider.
There are docker plug-ins available for both Google Cloud and Grafana
Cloud to perform this task. Both work best if the docker apps are
configured to emit just plain json lines for their logging.
 | 33 | + | 
I am surprised how much time (wasted, in my opinion) many developers
spend building homegrown logging into their apps, including log file
rotation and colorizing their output with ANSI escape sequences.
 | 37 | + | 
## Logging to Google Cloud Logging (Stackdriver)

To be able to log to Google Cloud, you will need to create a service
account that includes the following roles:

* Logs Writer
* Error Reporting Writer

Save the service account key as a json file; it is used both by
fluent-bit and docker below.
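If you prefer the command line over the console, the service account and key can be created with gcloud roughly like this (the project id and account name are placeholders, pick your own):

```
gcloud iam service-accounts create my-logging-sa --project my-project

gcloud projects add-iam-policy-binding my-project \
  --member serviceAccount:[email protected] \
  --role roles/logging.logWriter

gcloud projects add-iam-policy-binding my-project \
  --member serviceAccount:[email protected] \
  --role roles/errorreporting.writer

gcloud iam service-accounts keys create my-service-acct.json \
  --iam-account [email protected]
```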
 | 47 | + | 
### fluent-bit for sending the journal

A typical fluent-bit.yaml config for logging to Google Cloud looks like
this:
 | 52 | + | 
```
service:
  flush: 1
  daemon: Off
  log_level: info
  http_server: Off
  http_listen: 0.0.0.0
  http_port: 2020
  storage.metrics: on

pipeline:
  inputs:
    - name: systemd
      tag: host.systemd
      DB: /var/lib/fluent-bit/journal.db
      Lowercase: on
      Strip_Underscores: on
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                prio = record["priority"]
                if(prio == "7")
                then
                  new_record["severity"] = "DEBUG"
                elseif(prio == "6")
                then
                  new_record["severity"] = "INFO"
                elseif(prio == "5")
                then
                  new_record["severity"] = "NOTICE"
                elseif(prio == "4")
                then
                  new_record["severity"] = "WARNING"
                elseif(prio == "3")
                then
                  new_record["severity"] = "ERROR"
                elseif(prio == "2")
                then
                  new_record["severity"] = "CRITICAL"
                elseif(prio == "1")
                then
                  new_record["severity"] = "ALERT"
                elseif(prio == "0")
                then
                  new_record["severity"] = "EMERGENCY"
                end
                return 1, timestamp, new_record
              end
  outputs:
    - name: stackdriver
      match: '*'
      severity_key: severity
      google_service_credentials: /etc/fluent-bit/my-service-acct.json
      resource: gce_instance
      resource_labels: instance_id=myhost,zone=myzone
```
 | 113 | + | 
The biggest part of the above config is mapping journald priorities to
stackdriver severities. Please note the use of the Google Cloud
credentials file and the addition of the standard instance_id and zone
resource labels. The directory /var/lib/fluent-bit for journald
synchronization needs to be created once.
 | 119 | + | 
### Docker log plug-in
 | 121 | + | 
I used to use the gcplogs log driver built into docker, but I am
switching all my projects to structured json based logging and was
looking for ways to feed that directly into Google Cloud Logging. The
docker gcplogs driver does not do this (it forwards the JSON as one big
log line), but I found the excellent project
[ngcplogs](https://github.com/nanoandrew4/ngcplogs)
that modified the gcplogs driver to extract the structured log info.
 | 129 | + | 
This driver is a docker plugin and is installed like this (for an ARM
based host):

```
docker plugin install nanoandrew4/ngcplogs:linux-arm64-v1.3.0 --alias ngcplogs --grant-all-permissions
```
 | 136 | + | 
The driver is configured as usual in /etc/docker/daemon.json
like this:

```
{
	"log-driver": "ngcplogs",
	"log-opts": {
		"exclude-timestamp" : "true",
		"extract-gcp" : "true",
		"extract-caddy" : "true",
		"gcp-project": "hosting-XXXXXX",
		"gcp-meta-name": "myhost",
		"gcp-meta-zone": "myzone",
		"credentials-json" : "your_json_escaped_credentials.json_file_content"
	}
}
```
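Changes to /etc/docker/daemon.json only take effect after the docker daemon is restarted; the active default log driver can then be verified:

```
sudo systemctl restart docker

# should print the new default log driver, e.g. ngcplogs
docker info --format '{{.LoggingDriver}}'
```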
 | 154 | + | 
The escaped json string for the Google service account with log writing
permissions can be generated with the json-escape.go program like this:

```
./json-escape.sh </path/to/my-service-acct.json
```
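The json-escape tool itself is not shown here. Escaping a json file is simply encoding its entire content as a single JSON string, which can be sketched in Go as follows (my own minimal stand-in, not the original program):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// escapeJSON encodes raw bytes as a single JSON string literal, with
// inner quotes and newlines escaped, suitable for embedding a
// credentials file as a value in daemon.json.
func escapeJSON(raw []byte) string {
	escaped, err := json.Marshal(string(raw))
	if err != nil {
		panic(err) // marshalling a plain string cannot realistically fail
	}
	return string(escaped)
}

func main() {
	// Read the service account file from stdin, like the original tool.
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(escapeJSON(raw))
}
```

Note that the encoder emits the surrounding double quotes as well, which take the place of the quotes around the value in daemon.json.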
 | 161 | + | 
The extract-gcp option extracts already existing Google Cloud style
trace, label and source line information from applications that expect
their output to be scanned by Google Cloud Logging. This works nicely
for Golang apps that use logrus with
[stackdriver-gae-logrus-plugin](https://github.com/andyfusniak/stackdriver-gae-logrus-plugin)
or log/slog with [slogdriver](https://github.com/jussi-kalliokoski/slogdriver).
 | 168 | + | 
The slogdriver adapter for log/slog does not parse the traceparent HTTP
header, so I have created a small piece of middleware that injects the
trace information expected by slogdriver into the request context:
[traceparent](https://github.com/jum/traceparent).
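For illustration: the W3C traceparent header has the form version-traceid-spanid-flags, and such a middleware essentially has to split it up along these lines (a hypothetical helper, not the actual code from the traceparent repository):

```go
package main

import (
	"fmt"
	"strings"
)

// parseTraceparent splits a W3C traceparent header value of the form
// "00-<32 hex trace-id>-<16 hex span-id>-<flags>" into its ids.
func parseTraceparent(header string) (traceID, spanID string, ok bool) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	// Example value from the W3C Trace Context specification.
	tid, sid, ok := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tid, sid, ok)
}
```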
 | 173 | + | 
The extract-caddy option extracts fields from Caddy logs so that caddy
can act as a proper trace parent, and so that the Google Cloud console
displays caddy access log entries as HTTP requests.
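For this to work Caddy has to emit its access logs as structured json on stdout, which can be configured in the Caddyfile roughly like this (a sketch; the site address and upstream are placeholders):

```
example.com {
	log {
		output stdout
		format json
	}
	reverse_proxy localhost:8080
}
```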
 | 177 | + | 
The neat effect of all this is that I get fully distributed tracing
across multiple nodes without jumping through the hoops of a full blown
OTEL setup, plus a really nice log viewer in the Google Cloud Console.
 | 181 | + | 
## Logging to Grafana Cloud (Loki)

For logging to Grafana Loki you will need to get the credentials from
the Loki section of your Grafana Cloud account. This includes the host
to send logs to, the user id and the password.
 | 187 | + | 
### fluent-bit for sending the journal

A typical fluent-bit.yaml config for logging to Grafana Loki looks like
this:
 | 192 | + | 
```
service:
  flush: 1
  daemon: Off
  log_level: info
  http_server: Off
  http_listen: 0.0.0.0
  http_port: 2020
  storage.metrics: on

pipeline:
  inputs:
    - name: systemd
      tag: host.systemd
      DB: /var/lib/fluent-bit/journal.db
      Lowercase: on
      Strip_Underscores: on
      processors:
        logs:
          - name: lua
            call: modify
            code: |
              function modify(tag, timestamp, record)
                new_record = record
                prio = record["priority"]
                if(prio == "7")
                then
                  new_record["level"] = "DEBUG"
                elseif(prio == "6")
                then
                  new_record["level"] = "INFO"
                elseif(prio == "5")
                then
                  new_record["level"] = "NOTICE"
                elseif(prio == "4")
                then
                  new_record["level"] = "WARN"
                elseif(prio == "3")
                then
                  new_record["level"] = "ERROR"
                elseif(prio == "2")
                then
                  new_record["level"] = "CRITICAL"
                elseif(prio == "1")
                then
                  new_record["level"] = "ALERT"
                elseif(prio == "0")
                then
                  new_record["level"] = "EMERGENCY"
                end
                return 1, timestamp, new_record
              end
  outputs:
    - name: loki
      match: '*'
      labels: job=journal, instance=myhost, zone=myzone, level=$level, $systemd_unit, tag=$TAG
      host: logs-prod-XXX.grafana.net
      port: 443
      tls: on
      tls.verify: on
      line_format: json
      http_user: my_grafana_user_id
      http_passwd: my_grafana_password
```
 | 257 | + | 
The biggest part of the above config is again mapping journald
priorities to Loki log levels. There is a subtle difference between
stackdriver and Loki here: Loki only understands the WARNING label if it
is written as WARN. Please note the use of the Grafana Cloud Loki
credentials and the addition of the standard instance and zone labels.
The directory /var/lib/fluent-bit for journald synchronization needs to
be created once.
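With these labels in place, journal entries can be narrowed down in Grafana Explore with LogQL queries such as (label values are the placeholders from the config above):

```
{job="journal", instance="myhost", level="ERROR"}
```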
 | 265 | + | 
### Docker log plug-in

The Loki docker plugin is installed like this (for an ARM based host):

```
docker plugin install grafana/loki-docker-driver:3.5.0-arm64 --alias loki --grant-all-permissions
```
 | 274 | + | 
The driver is configured as usual in /etc/docker/daemon.json
like this:

```
{
	"log-driver": "loki",
	"log-opts": {
		"loki-url": "https://my_grafana_user_id:my_grafana_password@logs-prod-XXX.grafana.net/loki/api/v1/push",
		"loki-external-labels": "job=docker,instance=myhost,zone=myzone"
	}
}
```
 | 287 | + | 
The loki docker plug-in already handles log lines in json format. To
propagate traceparent information for golang apps, using a suitable http
middleware that adds the trace information to the log, see:
[slog-traceparent](https://github.com/jum/slog-traceparent)