# Logging from apps
Python apps use the standard Python `logging` module, hooked up to the `JsonFormatter` from the `python-json-logger` package to produce log lines on STDOUT as JSON dictionaries. Beyond just letting us emit unstructured text-based "messages", this lets us annotate each log line with arbitrary fields of any primitive type JSON supports. This "structured" output can often be more useful than the textual "message" itself: structured fields can be explicitly filtered-by in kibana, numerical fields can be sorted-by and filtered by numerical comparisons, and kibana can show histograms of structured values. The most important remaining role of the natural-language "message" is to explain how these various fields relate to one another, or what event has occurred to produce the log entry.
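For illustration, here is a minimal sketch of that kind of wiring (the apps' real setup lives in shared utility code and differs in detail; the logger name here is illustrative):

```python
import logging
import sys

# import path used by the python-json-logger package (classic releases;
# newer releases also expose pythonjsonlogger.json.JsonFormatter)
from pythonjsonlogger import jsonlogger

handler = logging.StreamHandler(sys.stdout)  # emit log lines on STDOUT
# the format string selects which standard record attributes to include
handler.setFormatter(
    jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```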
These "structured" fields can be supplied in a logging call through a dictionary provided to the extra= kwarg. e.g.
logger.warning("Foo %s doesn't have a bar", foo_id, extra={"foo_id": foo_id})
While it may be tempting to try to dig out as much context as possible for each log message (e.g. a `service_id` for each `notification_id` that's mentioned), this would probably get out of hand: it increases code complexity, increases log volume, and can even increase database load if that information needs to be looked up. Instead, most of the time it's best to only include information that the function has readily at hand. Providing more extensive context is better left to the tracing technology we're hoping to introduce in the near future, which can achieve this more efficiently.
A very small number of structured fields are automatically annotated onto log messages emitted by the apps, including `service_id` and `current_user_id`. Coverage of these is quite patchy and quite tied to flask requests and contexts. In the future we are considering replacing these with something based on tracing context propagation technology too.
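For illustration, one common way to achieve this kind of automatic annotation (a sketch only, not necessarily how the apps implement it) is a `logging.Filter` that copies request-scoped values from flask's `g` onto each record:

```python
import logging

from flask import g, has_app_context


class ContextAnnotationFilter(logging.Filter):
    """Sketch: copy request-scoped ids onto every log record, when available."""

    def filter(self, record):
        if has_app_context():
            # attributes assumed to be set on `g` elsewhere in the request lifecycle
            record.service_id = getattr(g, "service_id", None)
            record.current_user_id = getattr(g, "current_user_id", None)
        return True


logging.getLogger("app").addFilter(ContextAnnotationFilter())
```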
Here is a list of most of the structured log field names emitted by the apps at time of writing:
- `api_key_id`
- `api_key_type`
- `attachment_id`
- `bounce_message`
- `branding_id`
- `celery_task`
- `celery_task_id`
- `client_name`
- `cloudfront_request_id`
- `complaint_id`
- `created_record_count`
- `current_user_id`
- `days_of_retention`
- `delay`
- `delete_events_before`
- `deleted_record_count`
- `detailed_status`
- `detailed_status_code`
- `document_id`
- `duration`
- `email_brand_count`
- `end_date`
- `end_time`
- `error_message`
- `exception`
- `expiry_time`
- `failed_login_count`
- `file_id`
- `file_name`
- `file_size`
- `file_size_max`
- `file_size_new`
- `file_size_orig`
- `file_size_percent_bigger`
- `free_sms_fragment_allowance`
- `func_name`
- `inbound_number`
- `inbound_number_count`
- `inbound_sms_id`
- `inbound_sms_provider_reference`
- `invalid_pages`
- `invalid_pages_count`
- `is_expired`
- `is_used`
- `issuer`
- `job_created_at`
- `job_id`
- `job_processing_finished_at`
- `job_row_number`
- `job_started_at`
- `job_status`
- `limit`
- `limit_name`
- `local_unix_time`
- `max_failed_login_count`
- `metadata`
- `migrated_row_count`
- `namespace`
- `notification_billable_units`
- `notification_count`
- `notification_created_at`
- `notification_id`
- `notification_id_token`
- `notification_reference`
- `notification_status`
- `notification_status_new`
- `notification_type`
- `organisation_name`
- `page_count`
- `page_height`
- `page_number`
- `page_rotate`
- `page_width`
- `part_count`
- `part_number`
- `pdf_creator`
- `pdf_producer`
- `permanent_failure_rate`
- `process_`
- `process_day`
- `provider_name`
- `provider_priority`
- `provider_priority_new`
- `provider_status`
- `queue_name`
- `reason`
- `redis_key`
- `redis_operation`
- `report_request_id`
- `report_request_parameter`
- `report_request_status`
- `retention_period`
- `retry_number`
- `review_since`
- `row_count`
- `s3_bucket`
- `s3_bucket_destination`
- `s3_bucket_new`
- `s3_key`
- `s3_key_destination`
- `s3_key_new`
- `sender_id`
- `sending_limit`
- `sent_count`
- `service_callback_url`
- `service_id`
- `service_ids_to_purge_count`
- `service_ids_with_data_retention_count`
- `shatter_batch_size`
- `start_date`
- `start_time`
- `status_code`
- `template_id`
- `time_threshold`
- `token`
- `token_iat`
- `unembedded_fonts`
- `updated_history_record_count`
- `updated_record_count`
- `upload_id`
- `url`
- `user_count`
- `user_id`
- `username`
- `valid_from_financial_year_start`
- `validation_status`
- `verify_code_count`
- `zendesk_operation`
- `zendesk_ticket_id`
It's likely that a new log annotation you're wishing to add will have an appropriate field name already in use here.
The above log field names loosely follow some broad patterns:

- In most places, where a parameter corresponds to a database/object field name, the parameter name tends to take the form `<objectname>_<fieldname>` (e.g. `notification_id`, `notification_created_at`), except where this would unnecessarily prevent field-name sharing (e.g. we don't have `template_service_id` because that would interfere with the usefulness of having a common `service_id` field, even though the value might have been pulled out of the `service_id` field of a `template` object).
- In some places, this may be followed by a disambiguator where we have multiple values of the same sub-type. Where this is an "old value"/"new value" pair, generally the "old" value is left un-suffixed and the "new" value gets a disambiguating suffix (e.g. `file_size` and `file_size_new`).
- If the value is an aggregation of some sort, that signifier tends to come at the end (usually this is `_count`).
`JsonFormatter` makes a good attempt at converting many Python types (supplied through `extra=` args) into types that are natively supported by JSON. `datetime.datetime` and `datetime.date` are converted into `.isoformat()` strings, which are well-handled by kibana. The standard string representation of these types (as used when string-formatting interpolates the values into the natural-language "message" field) is also (close to) ISO format, so for dates and timestamps it's probably best to just pass these to the logger as-is.
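So a single `datetime` value can serve both the "message" and the structured field, for example (field names taken from the list above; the surrounding variables are assumed to be in scope):

```python
# job_id and job_started_at (a datetime.datetime) assumed to be in scope
logger.info(
    "Job %s started at %s",  # str() of a datetime is close to ISO format
    job_id,
    job_started_at,
    extra={
        "job_id": job_id,
        # passed as-is: JsonFormatter will emit job_started_at.isoformat()
        "job_started_at": job_started_at,
    },
)
```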
`datetime.timedelta` has no such special handling, so for `extra=` fields it's probably best to call `.total_seconds()` on them and expose them as a float of seconds. However, their standard Python string representation is a slightly nicer, more human-readable `HH:MM:SS.ssss...` format, so for the natural-language-destined argument it's better to pass the `timedelta` with no further processing.
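For example (a sketch; `duration` is a field name from the list above, the value is made up):

```python
import datetime

duration = datetime.timedelta(minutes=1, seconds=23.2)

logger.info(
    "Processing took %s",  # interpolates as the human-readable "0:01:23.200000"
    duration,
    extra={"duration": duration.total_seconds()},  # structured field gets 83.2
)
```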
It's rarely a great idea to dump a whole list of ids into a single log entry - it'll never work properly with structured fields. Consider whether the log-volume saving is really worth it, given that we already produce a handful of log entries for every notification sent; it may well be worth emitting a separate log entry for every item found meeting some condition.
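For example, a per-item entry keeps `service_id` usable as a structured field (a sketch; `service_ids_to_purge` is an assumed variable):

```python
# instead of: logger.info("Purging services: %s", service_ids_to_purge)
for service_id in service_ids_to_purge:
    logger.info("Purging service %s", service_id, extra={"service_id": service_id})
```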
See https://docs.python.org/3/library/string.html#format-specification-mini-language for details of the options you have beyond just using `%s` (or `%(param_name)s`) as a parameter's placeholder in the natural-language "message". Some of these options may save you significant work in prettying up values for display. Some notable ones (see the sketch after this list):
- If you're emitting a float, consider whether you really need to display the full precision in the natural-language "message" (for one thing, the amount of precision available in the float can mislead about the amount of precision the information was actually generated from). You can use the `%g` placeholder, which allows the number of significant figures to be set, like `%.4g` (for 4 s.f.) or `%(param_name).5g` (for 5 s.f.). Reducing precision like this for very common log messages may also have a positive impact on our log volume (and therefore bills).
- Instead of naively wrapping a string placeholder in quotation marks (e.g. `"%s"` or `"%(param_name)s"`) in an attempt to delimit the boundaries of an arbitrary object or string of text, you may get better results using `%r` (or `%(param_name)r`), which uses the Python `repr(...)` of the object instead of the `str(...)`. This does some intelligent things, like automatically quote-wrapping strings, switching quote character appropriately if one quote character is contained within the string itself, and displaying other unusual characters as their `\`-escaped equivalents (e.g. `\n` for newline). It also emits non-string values without quotes, making it possible to differentiate `None` from the string `"None"` and `1.23` from the string `"1.23"`.