ImaginativeThinking.ca


A developer's blog

Logging Best Practices

By: Brad

Hey Brad, I'm writing some code and in a review someone said I should add some log messages, but I don't know what I should be logging. What purpose do log messages serve and what information should I be logging?

TL;DR See Summary section below

What is a log?

A log is a continual feed of what the application is doing; it's the stream of actions being performed. A log has no beginning or end; it is simply a time-ordered collection of events from the application. In a multi-service system, all the logs are aggregated together to provide a live representation of what is happening in the system as a whole.

In short, a log tells the story of what your system is doing.

Like any good story the log should be succinct and to the point. It should also be complete, not missing steps in the event stream.

Why log?

Your application will most likely not be running in debug mode; at the very least, it likely won't already be in debug mode when your users run into an error, nor will you necessarily be able to attach a remote debugger. Your log is really the only way to troubleshoot an issue your user is having with the live system. You should use the log to help you ensure that events are occurring in the correct order and to identify errors without having to defer to debug builds or remote debugging sessions.

Thus, when logging, ask yourself: will the information being logged be helpful if I have to investigate an issue in this area?

How to log?

Your system should not be overly concerned with how logging occurs; to keep it focused on satisfying its business requirements, it should use a logger which encapsulates away the how. At the time of writing, for Node.js projects I've been using the bunyan logger as it satisfies the requirements below.

Requirements of a logger

  1. A logger should have a way to prioritize log messages (e.g. debug, info, warn, error)
  2. A logger should support multiple destinations (e.g. console, file, aggregator)
  3. A logger should be able to be turned off
  4. A logger should not impact performance
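To make these requirements concrete, here is a minimal sketch (plain Node.js, not production code, and not how bunyan is actually implemented) of a logger satisfying all four: leveled messages, multiple destinations, an off switch, and near-zero cost for disabled levels:

```javascript
// Minimal logger sketch: numeric level priorities, fan-out to multiple
// destinations, an 'off' switch, and no-op functions for disabled levels
// so they cost almost nothing at run time.
const levels = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

function createLogger({ level = 'info', streams = [process.stdout] } = {}) {
  const threshold = level === 'off' ? Infinity : levels[level];
  const log = {};
  for (const [name, priority] of Object.entries(levels)) {
    log[name] = priority < threshold
      ? () => {} // disabled level: no work done at all
      : (msg) => {
          const entry = JSON.stringify({ level: name, msg, time: new Date().toISOString() });
          for (const s of streams) s.write(entry + '\n'); // every destination gets the entry
        };
  }
  return log;
}
```

For example, `createLogger({ level: 'warn' })` prints warn/error/fatal messages and silently drops the rest.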

Logging Levels

Logs should be able to be prioritized, allowing some messages to be of higher concern than others. These priorities are generally referred to as levels. Below is a definition of what each level should mean for a given log message.

  • FATAL – A Fatal log message is used when the application has gotten into an unusable state and is about to terminate. A fatal log message is generally a trigger for someone to triage the situation and thus should include enough information for the technician to figure out how the system got into this state. This log level should only be used when the application is about to go down and needs human intervention.

  • ERROR – An Error log message is used when a request has gotten into an unusable state and could not be completed. The application itself is fine, but the user could be blocked if subsequent requests suffer the same fate. An error log message is generally a trigger for someone to triage the situation to prevent subsequent requests from also failing, and thus should include enough information for the technician to figure out what is going on. This log level should be used when an action has failed and no further attempts to process the action will occur.

  • WARN – A Warning log message is used when something unusual happened during a request but was not critical to the processing of the request. The current request was still able to be handled but it would be useful to review the anomalous situation to decide if human intervention would be required to prevent it from happening again. These log messages should be restricted to those cases which won’t prevent the request from being fulfilled but indicate a case which was unexpected and might require further investigation to determine the cause. They should not be used for general debug messaging.

  • INFO – An Information log message is used for reporting a request's significant progress and stages through the application/system. These log messages are useful to ensure that a request is propagating through the system in the correct order and can be helpful to backtrack a request if an error occurred downstream. Information log messages should not be made too frequently as they can quickly become noise, so they should be restricted to significant steps through a given operation.

  • DEBUG – A Debug log message is used to record extended information about a request's progress through the application/system. These log messages can be used to record calls to less significant actions, key variables or parameters, and the results of critical algorithms which would help to pinpoint where an error is occurring which results in an operation failing. If, when troubleshooting a failing operation, you find yourself wanting to insert a console.log, add a log.debug() message instead. Keep in mind that debug messages and below are generally turned off in production, so if the step is significant for troubleshooting use the INFO level.

  • TRACE – A Trace log message is used to trace a request's progress through every step of an operation. These log messages can be a bit excessive but can be helpful for stepping through the code when remote debugging is not an option. They don't generally include a great deal of detail about the current state and are mostly used to follow the application's/system's logic as the request makes its way through the operation.
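Under the hood these levels are just numeric priorities; bunyan, for instance, maps trace through fatal to 10 through 60, and a message is emitted only when its priority meets the logger's configured threshold. A sketch of that comparison:

```javascript
// Level names mapped to numeric priorities (these particular numbers are
// the ones bunyan uses; other libraries may differ).
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// A message is emitted when its level is at or above the configured level.
function shouldLog(configuredLevel, messageLevel) {
  return LEVELS[messageLevel] >= LEVELS[configuredLevel];
}
```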

Best Practices

Here is a list of best practices to follow when adding logs to your application or module. These practices exist to avoid common pitfalls, but of course they are guidelines and it's expected that you will use your best judgment. They are tailored for Node.js applications but should be mostly transferable to other languages.

Don’t use console.log

You might be tempted to use console.log to log information about your application as it's readily available; however, it lacks a few key features:

  • You can’t switch off the logs. Once a log statement is encountered, it will always be printed
  • You can’t assign priority to the logs, each log will be treated the same way
  • It only logs to stdout/stderr so you’d have to pipe the logs to an aggregator via the command line
  • It logs synchronously when writing to a file or to stdout, so unless you are piping the output each log will block the Event Loop.

Set the component name and version

In loggers like bunyan you can specify the name of the application when creating the logger; this makes it easy to identify where a log message came from, especially when viewed in an aggregated log containing logs from multiple nodes in a system.

It is also a best practice to configure the logger with the application's version so that it's easy to identify what version of the application is running on the node that issued the log. This way, if a node was brought back online with an older version it would be easy for a human to identify that the issues being logged might be due to that, or for a bot to auto-detect nodes running out-of-date versions of the application.

const bunyan = require('bunyan');

const { name, version } = require('./package.json');

const log = bunyan.createLogger({ name, version });

log.info('some message');

This will include the name and version of the application in the log messages generated by this logger.

{"name":"myapp","version":"1.0.0","msg":"some message"}

If you're creating a module which will generate logs, instead of creating your own logger it is better to accept a logger object from the application which is consuming your module. That way any properties the consumer has baked into their logger object will be included in your log messages.

Extending on that idea, loggers like bunyan can create child loggers, which extend the parent logger by adding additional properties. Your module could use that feature to generate a child logger from the logger object passed to it and append your module's own name and version, making it easier to identify which version of your component is being used and that the log message came from you and not the parent application.

const bunyan = require('bunyan');

const { name, version } = require('./package.json');

// Inside your module (e.g. a method on its main class):
init(opt) {
  if (opt && opt.logger) {
    this.log = opt.logger.child({
      component: name,
      componentVersion: version,
    });
  } else {
    this.log = bunyan.createLogger({ name, version });
  }
}

The above code generates a logger which, when used, creates entries that contain both the application's and the module's name and version.

{"name":"my-application","version":"1.0.0","component":"my-component","componentVersion":"1.1.1","msg":"some message"}

If your log viewer supports a prettified view of the JSON log, as the bunyan CLI does, the log will be shown like so:

17:27:28.340Z  INFO my-application/my-component: some message (version=1.0.0, componentVersion=1.1.1)

With proper use of name and component the log stream will show a clear progression from App to Component and back to the App again.

17:27:28.340Z  INFO my-application: some message (version=1.0.0)
17:27:28.340Z  INFO my-application/my-component: some message (version=1.0.0, componentVersion=1.1.1)
17:27:28.340Z  INFO my-application: some message (version=1.0.0)

Never log Sensitive Information

Always consider that the information you log could be accessed by parties not privy to sensitive information like user names, passwords, etc. In micro-service systems, logs from each service could be streamed to an aggregator service so you can view the logs from all the nodes in one spot. The aggregator service would likely be password protected, but it's better to play it safe and not assume that it's 100% secure or that the logs won't be shared with third parties for diagnostic purposes. It's a good practice to never log information which could put your users' personal information at risk.

Things to never log:

  • The user’s real name
  • The user’s email
  • The user’s password
  • The user’s bearer token

What's the harm in logging their name or email?
It's not our information to share. Case in point: Bell Canada had to issue an alert to their users after a breach of names and emails.

Never log information you don't own. The user's name, address, email, etc. do not belong to you, so don't log them; their ID in your system, however, does belong to you, so that is safe to log.

Although it could simply be said that code reviewers should be looking for accidental information leaks via the logs, we are, as they say, only human. If you're using bunyan you can set up a serializer which filters out properties on objects known to include sensitive information.

function filterUserInfo(usr) {
  return {
    id: usr.id,
    rootId: usr.rootId,
    // ...any other properties which are safe to log
  };
}

init(opt) {
  // ...

  this.log.addSerializers({ user: filterUserInfo });
}

With the above, any consumer of the this.log object that passes in an object containing a user property will have the value of said property scrubbed per the filter function.

  someOperation(user) {
    this.log.info({ user }, 'Some message');
    ...
  }

You can implement your filter function however you see fit (whitelist/blacklist), but it should never throw and it should never mutate the passed-in object, as mutating it would mutate the original object and cause side effects in the calling function.
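A sketch of a whitelist-style filter that follows both rules: it builds a brand-new object (no mutation) and tolerates odd input rather than throwing. The id and rootId fields are just the example properties from above:

```javascript
// Whitelist serializer: copy out only the fields known to be safe.
function filterUserInfo(usr) {
  // Never throw: pass through null/undefined/non-objects untouched.
  if (!usr || typeof usr !== 'object') return usr;
  // Never mutate: return a new object rather than deleting fields on usr.
  return { id: usr.id, rootId: usr.rootId };
}
```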

Allow the log level to be set at run time

Different environments may have different logging requirements. That is, in production you might only want to see warning and error logs, but in development you may want to see information and debug messages as well. To support this, one should be able to pass in the desired log level via an environment variable at application start-up.

For consistency's sake, use the environment variable LOG_LEVEL to allow runtime configuration of the logging level.

> LOG_LEVEL=info node src/server.js

The above would start the server and tell the logger to generate log messages for any logs marked info and above.
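One way to wire this up (a sketch; the validation list and fallback are my own choices, not a bunyan requirement) is to resolve LOG_LEVEL once at start-up and fall back to info for missing or invalid values:

```javascript
const VALID_LEVELS = ['trace', 'debug', 'info', 'warn', 'error', 'fatal'];

// Resolve the desired level from the environment, defaulting to 'info'
// when LOG_LEVEL is unset or not a recognized level name.
function resolveLogLevel(env = process.env) {
  const requested = (env.LOG_LEVEL || 'info').toLowerCase();
  return VALID_LEVELS.includes(requested) ? requested : 'info';
}

// e.g. const log = bunyan.createLogger({ name, version, level: resolveLogLevel() });
```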

But what if I don’t want to restart my service to change the log level?

This depends on the logging library you're using, but if you happen to be using bunyan it supports log snooping.

If your service is set up to only log INFO messages and above, when it starts acting up you can use DTrace to snoop the debug messages.

In a terminal attached to the service you want to see the debug messages for run the command:

$ bunyan -p '*'

This will use DTrace to print out the debug messages to the terminal. They won’t go to your aggregator and will only be visible here so you’d have to have access to the service machine.

See bunyan log snooping

If you're using a different logging library, or don't have direct access to the machine, you could make a judgment call on which DEBUG messages are really needed and log them as INFO messages; just keep in mind that too many INFO messages can generate noise and increase log storage requirements. Try to find the right balance between just enough and too much information.

Log with Debugging in mind

Remember that it’s not that easy to stop a live service and attach a remote debugger nor is it guaranteed that an issue seen in production can be reproduced easily in your development environment running in debug mode. To ensure that you can respond quickly to issues which users are having in production your log messages should be sufficient to triage the situation.

If you find yourself inserting a console.log() while debugging an issue think about adding a log.debug() instead.

Log what you have, limit fetching additional data just for the logs

It's key to remember that your log messages, regardless of level, will be processed and thus will consume processing time. When you write a log message which takes a constructed JSON object, that JSON object will be created even if the application is brought up with a logging level which won't actually generate the log.

this.log.debug({ hello: 'world' }, 'some message');

In the above, even if my application comes up in a way which will only log info messages, the object literal { hello: 'world' } will still be created, as it's an input parameter to the this.log.debug() function; it's only within said function that the logic exists to decide whether the message should be generated or not.

Therefore, you should pay attention to the objects you're creating for your log messages. If you are just creating simple object literals to contain already-fetched information then the performance hit is minimal; however, if you need to go out and fetch additional information not needed for the function but only for the log message, you might want to consider putting your log message behind a check to save you the extra fetch if the message will ultimately not be generated.

if (this.log.level() <= bunyan.DEBUG) {
  const context = fetchContext(userId);
  this.log.debug({ context }, 'some message');
}

Only log what will be needed

When passing data into a log message, remember that the data will need to be serialized; a large, complex data object requires CPU time and memory to be stringified. It's important to ensure that the cost of serializing a large data object will be of value when reading the logs.

If you only need one bit of information from a large object try just adding that bit of information to improve logging performance.

this.log.info({ userId: context.id }, 'some message');

Profiling with full logging enabled will help you avoid premature optimization, but if you're logging a large amount of data within hot functions, reviewing what information will actually be of use when troubleshooting the live system might help you ensure your operation meets its SLA.

But what should I log?

At a minimum, a log message should include:

  • The date and time of the log message
  • The name of the host machine. Useful for distributed systems with multiple nodes running the same components
  • The name of the application. Useful for distributed systems which aggregate logs from each node together
  • The version of the application
  • The PID of the process running the application could be handy
  • The log level
  • The message
  • The error (err) for fatal/error/warning logs
  • context and/or user who issued the request. This could be limited to an id to avoid leaking sensitive information
  • Any information (current values of key parameters) that you will need to understand the error.
  • Context describing the operation being performed and which objects were being worked on.

When logging objects, if the object is large, instead of logging the full object try restricting it down to just the properties being used. Be careful, however, not to leave out values which would be helpful to understand the conditions that led to the error.

For network/web based services it could be handy to also log:

  • The request (req) object including the method, url, header, body, and calling client info
  • The response (res) object including response code, body, and header information

For distributed systems, the req_id is a must to correlate log messages across nodes.

Log Errors where they are handled

Remember that the logs are a continuous event stream which you can use to follow an action through the system; if an error occurs and is logged multiple times it can become confusing to understand what the issue actually is and where the sequence of events actually stopped.

For example, suppose an action kicks off a series of events down through a few layers and an error occurs at the bottom layer. If each layer on the way back up to the root logged an error, the event stream would be hit with a batch of errors. Those errors would first have to be filtered to ensure that they all came from one action rather than many actions failing. Second, each error would have to be examined to see if any action needs to be taken to resolve it. Considering all the logged errors were for the same failure, the extra filtering and iteration is a waste of time.

To save time during triaging only log the error once.

But where do you log it?

It's recommended to log the error where it is handled.

Case 1
Let's say we have a module that logs and throws an error when it runs into a problem. That sounds reasonable, but what if the consuming application is intentionally passing in invalid parameters? Let's assume that it's using the module's API of throwing an exception to validate the input parameter, catching the error and responding appropriately. In this case there is no actual error, yet we are filling the event stream with error log messages which will prompt someone to investigate. These false alerts could in fact hide real errors as they generate noise.

Case 2
Let's say the module now simply throws an error but does not log one. If the consuming application behaves the same, then we no longer have any false failures in our event stream. But what if the consumer does not catch the error? In that case its consumer should catch it, or its consumer's consumer, and so on: the error will bubble up until someone can handle it. In fact, if you cannot handle the error, why would you catch it? Catching an error just to log it and re-throw results either in a bug, if you fail to re-throw, or in a cacophony of log messages in the event stream as each layer catches, logs, and re-throws the same error. It's best to let the error bubble up until someone can handle it, and log the problem then.

You can of course catch an error to add context to it. If you are calling a downstream service and it throws an error which you cannot handle, you might want to add additional context to the error to help components upstream. This can be done by wrapping the downstream error in a new error you generate. You would re-throw your new wrapping error, but again it's probably best not to log here, allowing the upstream handler to decide what to log.

See VError for error wrapping.
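VError comes from the verror npm package; without adding a dependency, the same wrap-with-context idea can be sketched with the standard cause option on Error (Node 16.9+). The readFromDb helper here is hypothetical:

```javascript
// Hypothetical downstream call that fails.
function readFromDb(userId) {
  throw new Error('connection refused');
}

function loadProfile(userId) {
  try {
    return readFromDb(userId);
  } catch (err) {
    // Add context and re-throw; do NOT log here. The upstream handler
    // logs once, with the whole cause chain available via err.cause.
    throw new Error(`failed to load profile for user ${userId}`, { cause: err });
  }
}
```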

But what if no one handles the error? In that case you have a bigger issue than your log, as your application will crash. You could add a catch-all at the root to ensure that the error gets logged, but it's probably better to simply let the application crash, as it's easier to catch, pun intended, a missing error handler during development if the application crashes. A good compromise is to add the root-level catch-all, log the error, and then re-throw or call the shutdown routine to ensure proper attention is given to resolving the unhandled error case.

If you require additional context to troubleshoot issues where errors are thrown in the lower levels, you can either include enough context in the Error which was thrown or add log.debug() messages in the lower level, while restricting the log.error() message to the handler of the error.

Only log Errors if Humans are needed

I have seen applications log errors even when the application can deal with the error itself. This results in your infrastructure/support team setting up alert triggers and other filters not just on the Error priority but on the error message as well. In other words: ignore these errors, but respond to those errors.

As stated in the definition of the ERROR level above, error log messages should be reserved for cases where we need to perform some action to return the system to normal operation. For example, if a service disconnects from a backing service and requires a technician to perform a task to reconnect them, log an ERROR message. If the service includes retry logic to automatically reconnect to the backing service, then instead of logging an ERROR try logging a WARNING message which indicates that we might want to investigate but otherwise the service is ok. One approach is to log a WARNING for each retry attempt and, if there is a limit on the number of retries and that limit is reached, log an ERROR at that point since human intervention will now be needed.

This way we don't:

  • generate noise with a bunch of safe to ignore Error logs in our event stream
  • accidentally ignore real errors because we forgot to update the alert triggers
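The retry idea above can be sketched as follows; connect and log are placeholders for your own connection function and logger:

```javascript
// WARN on each recoverable retry; ERROR only once the retry budget is
// exhausted and human intervention is actually needed.
async function connectWithRetry(connect, log, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt < maxAttempts) {
        log.warn({ err, attempt }, 'reconnect failed, retrying'); // self-healing so far
      } else {
        log.error({ err, attempt }, 'reconnect failed, giving up'); // human needed now
        throw err;
      }
    }
  }
}
```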

Logging for Modules

A module is a unit of code which does not run on its own but is consumed by a parent application. These modules offer functionality via simplified APIs, allowing the main application to abstract away complicated interactions. Modules can be either reusable packages imported by many applications or internal services purpose-built for one parent application.

  • A module should not pollute the event stream of those who are using it; generally, if an error occurs in the module it will not be the module that handles it, thus the module should not log the error but instead should throw an exception to be caught and handled by the consumer of the module.
  • A module should, however, log information which will allow consumers to debug their use of the module without requiring them to open up the module and insert console.log's.
  • For that reason, modules should really only be logging info, debug, and trace level log messages. It is not the module's place to dictate whether an error will result in an issue which needs triaging (i.e. warn/error logs).
  • Anything you think should result in an error message should in fact be a thrown exception, relying on the consumer to log the error.
  • Anything that would be a warning message, since that implies the process can continue, should be inferable from the value returned by the module. For example, if the module does a look-up and fails to find the item based on the input, instead of logging a warning that it could not find the item simply return null or an empty array and rely on the consumer to interpret that response and choose whether to log a warning. That way it can choose not to log the warning if that is an expected result in its case.
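The look-up example in the last point might look like this sketch, with findItem standing in for the module and getItem for the consuming application:

```javascript
// Module code: stay quiet about a miss; just return null.
function findItem(items, id) {
  return items.find((item) => item.id === id) ?? null;
}

// Application code: the consumer knows whether a miss is unexpected in its
// context, so it decides whether the miss deserves a WARN.
function getItem(items, id, log) {
  const item = findItem(items, id);
  if (item === null) log.warn({ id }, 'item not found');
  return item;
}
```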

Logging for Applications

An application is a unit of code which can be executed and live as its own process. An application accepts inputs and triggers actions; it owns its own business logic and thus its own event stream.

  • An application should log its errors. When using external modules, it should capture errors thrown by it and log the error if it cannot proceed with the operation.
  • An application should use log.trace() to simulate stepping through the code. In lieu of attaching a remote debugger to the application and stepping through the code line by line use the log.trace() message to allow for the same insight into the application.
  • An application should use log.debug() to simulate setting break points. In lieu of attaching a remote debugger to the application and setting break points at key steps in the operation to inspect logical flow, variables, and parameters use the log.debug() message to expose said key information about the application.
  • An application should clearly log when it has started up. When an application is brought online it will no doubt perform some bootstrapping and won't be ready immediately. To help those bringing up the application know when it's ready, have it log a clear INFO message which stands out and clearly states that the application has come up correctly and is ready to be used. For network-based applications, including the hostname/IP address and port that the application is listening on in the startup INFO message is beneficial for those waiting for the application to come online.

    log.info({ url: server.url, port: server.port }, `----- ${name} server is online -----`);

  • An application should clearly log when it has stopped. If the application has crashed or otherwise gotten into a state where it is about to terminate, it should log a FATAL message so that triggers can be added to automatically restart the application or alert a technician to investigate the termination. If the application restart is automatic, FATAL log messages can be used to indicate how often the application gets into this state.

    log.fatal({ err, cause, url: server.url }, `----- ${name} server shut down -----`);

Logging for Distributed Systems

A Distributed System is a collection of Applications all working together to perform some operation. Actions usually propagate from one application, or node, in the system to the next until the action is completed. For distributed systems, the event stream of each application is viewed together as a single system level stream of events.

  • A Distributed system should log in the same way as an application but should ensure it includes a correlation id (a.k.a. a request id) so that logs can be matched across nodes in the system. Generally, this id is logged under the req_id property.

Summary

  • Don't use console.log
  • FATAL means the application terminated, ERROR means the request failed
  • TRACE is like stepping through the code, DEBUG is like a break point
  • Use ERROR if a human will eventually be needed to resolve the issue otherwise use WARN or INFO
  • Use log.debug() instead of temporary console.log()
  • Log errors once where the error is handled
  • Modules should only use INFO, DEBUG or TRACE
  • Logs should include the application name and version
  • Logs from Modules should include their name and version as well as the applications
  • For distributed systems, all log messages should include a req_id
  • Applications should clearly log when they startup and when they shutdown
  • Never log sensitive information like user name, password, etc.
  • Set log level with environment variable LOG_LEVEL
  • Only log what you will need to troubleshoot issues
  • Keep the log succinct

Well, I hope that these best practices were helpful and that you will now be able to generate log messages which tell a real-time story of what your system is doing; that you can use your story to ensure that operations are executing correctly, and that when they are not, you're logging enough information to diagnose the issue.

Until next time think imaginatively and design creatively

Brad

My interest in computer programming started back in high school and Software Development has remained a hobby of mine ever since. I graduated as a Computer Engineering Technologist and have been working as a Software Developer for many years. I believe that software is crafted; understanding that how it is done is as important as getting it done. I enjoy the aesthetics in crafting elegant solutions to complex problems and revel in the knowledge that my code is maintainable and thus, will have longevity. I hold the designation Certified Technician (C.Tech.) with the Ontario Association of Computer Engineering Technicians and Technologists (OACETT), have been certified as a Professional Scrum Master level 1 (PSM I) and as a Professional Scrum Developer level 1 (PSD I) by Scrum.org as well as designated as an Officially Certified Qt Developer by the Qt Company. For more on my story check out the about page here

Feel free to write a reply or comment.