Logging Best Practices
By: Brad
Hey Brad I am writing some code and in a review someone said I should add some log messages but I don’t know what I should be logging. What purpose do the log messages serve and what information should I be logging?
TL;DR See Summary section below
What is a log?
A log is a continual feed of what the application is doing; it’s the stream of actions being performed. A log has no beginning or end, it is simply a time-ordered collection of events from the application. In a multi-service system, all the logs are aggregated together to provide a live representation as to what is happening in the system as a whole.
In short, a log tells the story of what your system is doing.
Like any good story the log should be succinct and to the point. It should also be complete, not missing steps in the event stream.
Why log?
Your application will most likely not be running in debug mode, at the very least it likely won’t already be in debug mode when your users run into an error, nor will you necessarily be able to attach a remote debugger. Your log is really the only way to troubleshoot an issue your user is having with the live system. You should use the log to help you ensure that events are occurring in the correct order and to identify errors without having to defer to debug builds or remote debugging sessions.
Thus, when logging ask yourself will the information being logged be helpful if you have to investigate an issue in this area.
How to log?
Your system should not be overly concerned with how logging will occur so as to keep your system focused on satisfying its business requirements thus it should use a logger which will encapsulate away the how. At the time of writing for Node.js projects I’ve been using the bunyan logger as it satisfies the below requirements.
Requirements of a logger
- A logger should have a way to priority log messages (i.e. debug, info, warn, error)
- A logger should support multiple destinations (i.e. console, file, aggregator)
- A logger should be able to be turned off
- A logger should not impact performance
Logging Levels
Logs should be able to be prioritized allowing some messages to be of higher concern then others. These priorities are generally referred to as levels. Below is a definition as to what each level should mean for a given log message.
-
FATAL – A Fatal log message is used when the application has gotten into an unusable state and is about to terminate. A fatal log message is generally a trigger for someone to triage the situation and thus should include enough information for the technician to figure out how the system got into this state. These log levels should only be used when the application is about to go down and needs human intervention.
-
ERROR – An Error log message is used when a request has gotten into an unusable state and could not be completed. The application itself is fine but the user could be blocked if subsequent requests suffer the same fate. An error log message is generally a trigger for someone to triage the situation to prevent subsequent requests from also failing and thus should include enough information for the technician to figure out what is going on. These log levels should be used when an action has failed and no further attempts to process the action will occur.
-
WARN – A Warning log message is used when something unusual happened during a request but was not critical to the processing of the request. The current request was still able to be handled but it would be useful to review the anomalous situation to decide if human intervention would be required to prevent it from happening again. These log messages should be restricted to those cases which won’t prevent the request from being fulfilled but indicate a case which was unexpected and might require further investigation to determine the cause. They should not be used for general debug messaging.
-
INFO – An Information log message is used for reporting a requests significant progress and stages through the application/system. These log messages are useful to ensure that a request is propagating through the system in the correct order and can be helpful to back track a request if an error occurred downstream. Information log messages should not be made to frequently as they can quickly become noise so they should be restricted to significant steps through a given operation.
-
DEBUG – A Debug log message is used to record extended information about a requests progress through the application/system. These log messages can be used to record calls to less significant actions, key variables or parameters, and the results of critical algorithms which would help to pin point where an error is occurring which results in an operation failing. If when troubleshooting a failing operation you find yourself wanting to insert a
console.log
add alog.debug()
message instead. Keep in mind that debug messages and below are generally turned off in production so if the step is significant for troubleshooting use theINFO
level. -
TRACE – A Trace log message is used to trace a requests progress through every step of an operation. These log messages can be a bit excessive but can be helpful to step through the code when remote debugging is not an option. They don’t generally include a great deal of detail about the current state and are mostly used to follow the application/systems logic as the request makes its way through the operation.
Best Practices
Here is a list of best practices to follow when adding logs to your application or module. These practices are to avoid common pitfalls but of course they are guidelines and its expected that you will use your best judgment. These practices are tailored for Node.js applications but should be mostly transferable to other languages.
Don’t use console.log
You might be tempted to use console.log
to log information about your application as it’s readily available however it lacks a few key features:
- You can’t switch off the logs. Once a log statement is encountered, it will always be printed
- You can’t assign priority to the logs, each log will be treated the same way
- It only logs to stdout/stderr so you’d have to pipe the logs to an aggregator via the command line
- It logs synchronously when logging to a file or the stdout so unless you are piping the output each log will block the Event Loop.
Set the component name and version
In loggers like bunyan you can specify the name of the application when creating the logger, this will make it easy to identify where the log message came from especially when viewed from an aggregated log containing logs from multiple nodes in a system.
It is also a best practice to configure the logger with the applications version so that it’s easy to identify what version of the application is running on the node who issued the log. This way if a node was brought back online with an older version it would be easy for a human to identify that issues being logged might be due to that or for a bot to auto detect nodes running out of date versions of the application.
const bunyan = require('bunyan'); const { name, version } = require('./package.json'); const log = bunyan.createLogger({ name, version }); log.info('some message');
This will include the name and version of the application in the log messages generated by this logger.
{"name":"myapp","version":"1.0.0","msg":"some message"}
If you’re creating a module which will generate logs instead of creating your own logger it would be better to accept a logger object from the application which is consuming your module. That way any properties the consumer has baked into their logger object will be included in your log messages.
Extending on that idea loggers like bunyan can create child loggers which can be used to extend the parent logger by adding additional properties; your module could use that feature to generate a child logger from the logger object passed to it and append your modules own name and version to make it easier to identify which version of your component is being used and that the log message came from you and not the parent application.
const { name, version } = require('./package.json'); init(opt) { if (opt.logger) { this.log = opt.logger.child({ component: name, componentVersion: version, }); } else { this.log = bunyan.createLogger({ name, version }); } }
The above code would generate a logger which when used would create entries that contains both the application and the modules’ name and version.
{"name":"my-application","version":"1.0.0","component":"my-component,"componentVersion":"1.1.1","msg":"some message"}
If your log viewer supports prettify view of the json log like the bunyan-cli
does the log will be shows as so:
17:27:28.340Z INFO my-application/my-component: some message (version=1.0.0, componentVersion=1.1.1)
With proper use of name
and componentName
the log stream will show a clear progression from App to Component and back to the App again.
17:27:28.340Z INFO my-application: some message (version=1.0.0) 17:27:28.340Z INFO my-application/my-component: some message (version=1.0.0, componentVersion=1.1.1) 17:27:28.340Z INFO my-application: some message (version=1.0.0)
Never log Sensitive Information
Always consider that the information you log could be accessed by parties not privy to sensitive information like user names, passwords, etc. In micro-service systems logs from each service could be streamed to an aggregator service so you can view the logs from all the nodes in one spot. The aggregator service would likely be password protected but it’s better to play it safe and not assume that its 100% secure or that the logs wouldn’t be shared with third parties for diagnostic purposes. It’s a good practice to never log information which could put your user’s personal information at risk.
Things to never log:
- The user’s real name
- The user’s email
- The user’s password
- The user’s bearer token
What’s the harm in logging their name or email?
It’s not our information to share. Case in point Bell Canada had to issue an alert to their users for a breach of names and emails.
Never log information you don’t own. The users name, address, email, etc do not belong to you so don’t log them but their ID in your system does thus that is safe to log.
Although it could be simply said that code reviewers should be looking for accidental information leaks via the logs we are as they say only human. If your using bunyan you could setup a serializer which filters out properties on different objects known to include sensitive information.
function filterUserInfo(usr) { return { id: usr.id, rootId: usr.rootId, ... }; } init(opt) { ... this.log.addSerializers({ user: filterUserInfo }); }
With the above any consumer of the this.log
object who passes in an object containing a user
property will have the value of said property scrubbed per the filter function.
someOperation(user) { this.log.info({ user }, 'Some message'); ... }
You can implement your filter function however you see fit (whitelist/blacklist) but it should never throw and it should never mutate the passed in object as that will mutate the original object causing side effects in the calling function.
Allow the log level to be set at run time
Different environments may have different logging requirements. That is in production you might only want to see warning and error logs but in development you may want to see information and debug messages as well. To support this one should be able to pass in the desired log level via an environment variable at application start up.
For consistency sake use the environment variable LOG_LEVEL
to allow the runtime configuration of the logging level.
> LOG_LEVEL=info node src/server.js
The above would start the server and tell the logger to generate log messages for any logs marked info
and above.
But what if I don’t want to restart my service to change the log level?
This would depend on the logging library your using but if you happen to be using bunyan
it supports log snooping.
If your service is setup to only log INFO messages and above when it starts acting up you can use DTrace to snoop the debug messages.
In a terminal attached to the service you want to see the debug messages for run the command:
$ bunyan -p *
This will use DTrace to print out the debug messages to the terminal. They won’t go to your aggregator and will only be visible here so you’d have to have access to the service machine.
If your using a different logging library or don’t have direct access to the machine then you could make a judgment call on which DEBUG messages are really needed and log them as INFO messages; just keep in mind that too much INFO messages can generate noise and increase log storage requirements. Try to find the right balance between just enough and too much information.
Log with Debugging in mind
Remember that it’s not that easy to stop a live service and attach a remote debugger nor is it guaranteed that an issue seen in production can be reproduced easily in your development environment running in debug mode. To ensure that you can respond quickly to issues which users are having in production your log messages should be sufficient to triage the situation.
If you find yourself inserting a
console.log()
while debugging an issue think about adding alog.debug()
instead.
Log what you have, limit fetching additional data just for the logs
It’s key to remember that your log messages regardless of the level will be processed and thus will consume processing time. When you write a log message which takes a constructed json object that json object will be created even if the application is brought up with a logging level which won’t actually generate the log.
this.log.debug({ hello: 'world' }, 'some message');
In the above even if my application comes up in a way which will only log info
messages the object literal { hello: 'world' }
will still be created as its an input parameter to the this.log.debug()
function; Its only within said function that the logic exists to tell if it should generate the message or not.
Therefore, you should pay attention to the objects your creating for your log messages. If you are just creating simple object literals to contain already fetched information than the performance hit is minimal; however, if you need to go out and fetch additional information not needed for the function but only for the log message you might want to consider putting your log message in a check to save you the extra fetch if the message will ultimately not be generated.
if (this.log.level() <= bunyan.DEBUG) { const context = featchContext(userId); this.log.debug({ context }, 'some message'); }
Only log what will be needed
When passing data into a log message remember that the data will need to be serialized; a large complex data object will require cpu time and memory to be stringified. It's important to ensure that the cost of serializing a large data object will be of value when reading the logs.
If you only need one bit of information from a large object try just adding that bit of information to improve logging performance.
this.log.info({ userId: context.id }, 'some message');
Profiling with full logging enabled would help to avoid premature optimization but if your logging a large amount of data within hot functions reviewing what information will actually be of use when troubleshooting the live system might help you ensure your operation meets its SLA.
But what should I log?
At a minimum, a log message should include:
- The date time of the log message
- The name of the host machine. Useful for distributed systems with multiple nodes running the same components
- The name of the application. Useful for distributed systems which aggregate logs from each node together
- The version of the application
- The PID of the process running the application could be handy
- The log level
- The message
- The error (
err
) for fatal/error/warning logs - context and/or user who issued the request. This could be limited to an id to avoid leaking sensitive information
- Any information (current values of key parameters) that you will need to understand the error.
- Context describing the operation being performed and which objects were being worked on.
When logging objects if the object is large instead of logging the full object try restricting it down to just the properties being used. Be keen however not to leave out values which would be helpful to understand what the conditions were that led to the error.
For network/web based services it could be handy to also log:
- The request (
req
) object including the method, url, header, body, and calling client info - The response (
res
) object including response code, body, and header information
For distributed systems, the req_id
is a must to correlate log messages across nodes.
Log Errors where they are handled
Remember that the logs are a continuous event stream which you can use to follow an action through the system; if an error occurs and is logged multiple times it can become confusing to understand what the issue actually is and where the sequence of events actually stopped.
For example, if an action kicked off a series of events down through a few layers and an error occurred at the bottom layer if each layer on the way back up to the root logged an error the event stream will be hit with a batch of errors. Those errors would first have to be filtered to ensure that they were all from one action vs. many actions failing. Secondly each error would have to be looked at to see if any actions need to be taking to resolve the errors. Considering all the logged errors were for the same error the extra filtering and iterations seem to be a bit of a time waster.
To save time during triaging only log the error once.
But where do you log it?
Its recommended to log the error where it is being handled.
Case 1
Let's say we have a module who logs and throws an error when it runs into a problem. That sounds reasonable but what if the consuming application is intentionally passing in invalid parameters? Let's assuming that its using the modules API of throwing an exception to validate the input parameter by catching the error and responding appropriately. In this case there is no actual error yet we are filling the event stream with error log messages which will prompt someone to investigate. These false alerts could in fact hide real errors as they are generating noise.
Case 2
Let's say the module now simply throws an error but does not log one. If the consuming application is the same then we no longer have any false failures in our event stream. But what if the consumer is not catching the error? In that case its consumer should be catching errors or its consumer or its. The Error will bubble up until someone can handle the error, in fact if you cannot handle the error why would you catch it? Catching an error just to log it then re-throw the error results in either a bug if you fail to re-throw or a cacophony of log messages in the event stream as each layer is catching, logging, and re-throwing the same error. Its best to let the error bubble up until someone can handle the error and log the problem then.
You can of course catch an error to add context to it. If you are calling a downstream service and it throws an error which you cannot handle you might want to add additional context to the error to help components upstream. This can be done by wrapping the downstream error in a new error you generate. You would re-throw your new wrapping error but again probably best not to log here and allow the upstream handler to decide what to log.
See VError for error wrapping.
But what if no one handles the error? In that case you have a bigger issue then your log as your application will crash. You could add a catch all at the root to ensure that the error gets logged but it's probably better to simply let the application crash as its easier to catch, pun intended, a missing error handler during development if the application is crashing. A good compromise would be to add the root level catch all and log the error but then re-throw or call the shutdown routine to ensure proper attention is given to resolve the unhandled error case.
If you require additional context to trouble shoot issues were errors are being thrown in the lower levels you can either include enough context in the Error which was throw or add log.debug()
messages in the lower level while restricting the log.error()
message to the handler of the error.
Only log Errors if Humans are needed
I have seen applications log errors even when the application can deal with the error itself. This results in your infrastructure/support team setting up alert triggers and other filters not just on the Error priority but on the error message as well. In other words, ignore these errors but response to those errors.
As stated by the definition of the ERROR level above error log messages should be for those cases where we need to perform some action to return the system to normal operation. For example, if a service disconnects from a backing service and requires a technician to perform a task to reconnect the service to the backing system log an ERROR message. If the service includes retry logic to automatically reconnect to the backing service then instead of logging an ERROR try logging a WARNING message which indicates that we might want to investigate but otherwise the service is ok. One idea would be to log a WARNING for each retry attempt and if we have a limit on how many retries before stopping and that limit is reached log an ERROR then since human intervention will now be needed.
This way we don't:
- generate noise with a bunch of safe to ignore Error logs in our event stream
- accidentally ignore real errors because we forgot to update the alert triggers
Logging for Modules
A module would be a unit of code which does not run on its own but is consumed by a parent application. These modules offer up functionality via simplified APIs to allow the main application to abstract away complicated interactions. Modules can either be reusable packages which can be imported to many applications or internal services purpose built for its parent application.
- A module should not pollute the event stream of those who are using them; generally, if an error occurs in the module it will not be the module who handles it, thus the module should not log the error but instead it should throw and exception to be caught and handled by the consumer of the module.
- A module should log however information which will allow consumers to debug there use of the module without requiring them to open up the module and insert
console.log
's. - For that reason, modules should really only be logging
info
,debug
, andtrace
level log messages. It is not the modules place to dictate if an error will result in an issue which will need triaging (i.e.warn
/error
logs). - Anything you think should result in an error message should in fact be a thrown exception relaying on the consumer to log the error.
- Anything that should be a warning message since that would imply the process can continue should be inferable from the returned value from the module. For example, if the module does a look up and fails to find the item based on the input instead of logging a warning that it could not find the item simply return
null
or an empty array and relay on the consumer to interpret that response and let it choose to log a warning. That way it might choose not to log the warning if that is an expected result in its case.
Logging for Applications
An application is a unit of code which can be executed and live as its own process. An application accepts inputs and triggers actions, it owns its own business logic and thus its own event stream.
- An application should log its errors. When using external modules, it should capture errors thrown by it and log the error if it cannot proceed with the operation.
- An application should use
log.trace()
to simulate stepping through the code. In lieu of attaching a remote debugger to the application and stepping through the code line by line use thelog.trace()
message to allow for the same insight into the application. - An application should use
log.debug()
to simulate setting break points. In lieu of attaching a remote debugger to the application and setting break points at key steps in the operation to inspect logical flow, variables, and parameters use thelog.debug()
message to expose said key information about the application. -
An application should clearly log when the application has started up. When an application is brought online is will no doubt perform some bootstrapping and won't be ready imminently. To help those bringing up the application know when it's ready have it log a clear INFO message which stands out and clearly states that the application has come up correctly and is ready to be used. For network based application including the hostname/ip address and port that the application is listening in the startup INFO message would be beneficial for those waiting for the application to come online.
log.info({ url: server.url, port: server.port }, `----- ${name} server is online -----`);
-
An application should clearly log when it has stopped. If the application crashed or otherwise gotten into a state where it is about to terminate it should log a FATAL message so that triggers can be added to automatically restart the application or alert a technician to investigate the termination. If the application restart is automatic a FATAL log message can be used to indicate how often the application gets into this state.
log.fatal({ err, cause, url: server.url }, `----- ${name} server shut down -----`);
Logging for Distributed Systems
A Distributed System is a collection of Applications all working together to perform some operation. Actions usually propagate from one application, or node, in the system to the next until the action is completed. For distributed systems, the event stream of each application is viewed together as a single system level stream of events.
- A Distributed system should log in the same was as an application but should ensure to include a correlation id (a.k.a a request id) so that logs can be matched across nodes in the system. Generally, this id is logged under the
req_id
property.
Summary
- Don't use
console.log
- FATAL means the application terminated, ERROR means the request failed
- TRACE is like stepping through the code, DEBUG is like break point
- Use ERROR if a human will eventually be needed to resolve the issue otherwise use WARN or INFO
- Use
log.debug()
instead of temporaryconsole.log()
- Log errors once where the error is handled
- Modules should only use INFO, DEBUG or TRACE
- Logs should include the application name and version
- Logs from Modules should include their name and version as well as the applications
- For distributed systems, all log messages should include a
req_id
- Applications should clearly log when they startup and when they shutdown
- Never log sensitive information like user name, password, etc.
- Set log level with environment variable
LOG_LEVEL
- Only log what you will need to troubleshoot issues
- Keep the log succinct
Well I hope that these best practices were helpful and that you will now be able to generate log messages which will allow you to tell a real time story of what your system is doing; That you can use your story to ensure that operations are executing correctly and when they are not you're logging enough information to allow you to diagnose the issue.
Until next time think imaginatively and design creatively