An Intro to Logging Basics: Tips, Tricks and In-depth Resources
What is Log Management?
Log management is the process of handling log events generated by software applications and the infrastructure they run on. It involves log collection, aggregation, parsing, storage, analysis, search, archiving, and disposal, with the ultimate goal of using the data for troubleshooting and gaining business insights, while also ensuring the compliance and security of applications and infrastructure.
Logs are typically recorded in one or more log files. Log management lets you gather that data in one place and look at it as a whole instead of as separate entities. You can then analyze the collected log data and identify issues and patterns, giving you a clear, visual picture of how all your systems perform at any given moment.
Log management has become an integral part of any DevOps team’s work. Log management solutions range from the popular open source ELK stack, typically deployed on your own infrastructure, to fully managed log management solutions, such as Sematext Cloud.
Getting Started with Logs: What Is a Log File
A log file is a text file where applications, including the operating system, write events. Logs show you what happened behind the scenes and when it happened so that if something should go wrong with your systems you have a detailed record of every action prior to the anomaly.
Therefore, log files make it easier for developers, DevOps, SysAdmins or SecOps to get insights and identify the root cause of issues with applications and infrastructure.
Logs are also useful when systems behave normally. You can get insights about how your application reacts and performs, in order to improve it.
There are many different sources of logs, as well as log types. Here are some of the top log sources we see today, starting from the bottom of the stack.
As we interact with mobile apps, web apps, websites, etc., we generate a lot of network traffic. Network gear (routers, switches and so on) can generate logs about this traffic. Unlike server and application logs, which tend to use more modern and increasingly structured formats, network gear still uses various kinds of Syslog.
Also Read: What is Syslog: Daemons, Message Formats for more on Syslog.
Server and Application Logs
Traditional sources of log events are servers and the applications running on them. The kernel emits log messages about, for example, which drivers it loads or whether the OOM killer was invoked. System services log events as well, such as when a user logged in. This information helps you diagnose stability and security issues, as well as system-level performance bottlenecks. Is the kernel sending SYN cookies? It could be an attack or the network may be overloaded.
As for applications, you may have Nginx web server logs, a Java web application running in Apache Tomcat, or a PHP application running in the Apache web server. They will emit various informational, error, or debug log events.
Some of these logs use standardized formats, like Common Log Format, while others use various custom formats, including various structured logging formats, like key=value or even JSON logs.
If you write your own application, we strongly suggest a structured logging format. It’s much easier to parse down the pipeline.
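As an illustration of what structured logging can look like, here is a minimal sketch that uses Python's standard `logging` module to emit one JSON object per line. The `JsonFormatter` class, the `build_logger` helper, the `context` key and the field names are assumptions made for this example, not a format required by any particular log management tool.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Anything passed via extra={"context": {...}} becomes an attribute
        # on the record; "context" is a name chosen for this sketch.
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


def build_logger(name: str = "checkout") -> logging.Logger:
    """Return a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("payment accepted", extra={"context": {"user_id": 42, "plan": "pro"}})
```

Because every event is already a JSON object with named fields, a pipeline further down can index it without writing custom parsing rules.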
Nowadays more and more applications are deployed in containers. As such, containers and applications running inside them are another big source of logs. Unlike traditional apps and servers, and certainly network gear, containers are very “promiscuous”. Container orchestration frameworks like Kubernetes move containers from host to host, adapting to demand and resource availability. An average container’s lifespan is shorter than that of a firefly or a bee.
On top of that, the practice of “ssh-ing in”, poking around, and tailing and grepping logs to troubleshoot is considered bad practice in the cloud-native world. Hence, the various Docker monitoring and log management challenges require new approaches and new Docker log management tools.
Read more about container logs and how log management works in Kubernetes in our Kubernetes Logging Guide.
Mobile Devices and App Logs
Mobile apps and devices are ubiquitous. You may not think of them as sources of logs because you can’t (easily) access system or application logs on an iOS or Android device. Limited disk space and unreliable network mean you can’t log verbose messages locally, and you can’t assume you’ll ship logs to a centralized location in real time.
In spite of those challenges, it’s important to know if a mobile app crashes and why, and beyond that, how the app behaves and performs. Typically, you’d buffer up to N messages locally and ship them to a centralized logging service when the network allows. This is what Sematext Cloud libraries for Android and iOS do.
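The "buffer up to N messages locally" idea can be sketched as follows. This is a generic illustration, not the code of the Sematext libraries: the `BufferedShipper` class, its `send` callback and the default sizes are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class BufferedShipper:
    """Keep at most `capacity` events locally and ship them in batches."""

    def __init__(self, send: Callable[[List[dict]], bool],
                 capacity: int = 100, batch_size: int = 20):
        self._send = send              # returns True if the batch was delivered
        self._batch_size = batch_size
        # deque(maxlen=...) drops the oldest event once the buffer is full,
        # which bounds memory and disk usage on the device.
        self._buffer: Deque[dict] = deque(maxlen=capacity)

    def log(self, event: dict) -> None:
        self._buffer.append(event)

    def flush(self) -> int:
        """Try to ship everything; return the number of events shipped."""
        shipped = 0
        while self._buffer:
            size = min(self._batch_size, len(self._buffer))
            batch = [self._buffer[i] for i in range(size)]
            if not self._send(batch):
                break                  # network is down: keep events for later
            for _ in batch:
                self._buffer.popleft()
            shipped += len(batch)
        return shipped

    def __len__(self) -> int:
        return len(self._buffer)
```

An app would call `log()` freely and call `flush()` on a timer or when connectivity returns. Dropping the oldest events first is one possible policy; keeping crash reports at the expense of verbose messages is another.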
Sensors, IoT, Industrial IoT
In the consumer space we have sensors in cars, smart thermostats, internet-connected fridges, and other smart-home devices and, on a bigger scale, smart cities.
The Industrial Internet of Things, or IIoT, connects machines and devices in industries such as transportation, power generation, manufacturing, and healthcare.
Typically, we’re interested in metrics generated by these devices. For example, we collect some air pollution levels (PM2.5, PM10) and send them to Sematext Cloud. But logs emitted from these devices are important as well: did this sensor start correctly? Does it need recalibration? How many times did sensors fail in the last 6 months? Based on this information, what is the most reliable manufacturer? These are just some examples of metadata that can be extracted from IoT logs.
Why is Log Management Important?
Log management provides insight into the health and compliance of your systems and applications.
Without it, you’d be stumbling around in the dark hoping to pinpoint sources of performance issues, bugs, unexpected behavior, and other similar issues. You’d be forced to manually inspect multiple log files while trying to troubleshoot production issues. This is painfully slow, error-prone, expensive, and not scalable.
Log management is especially important for cloud-native applications because of their dynamic, distributed and ephemeral nature. Unlike traditional applications, cloud-native applications often run in containers and emit logs to standard output instead of writing them to log files, which means you don’t have the “default option” of manually grepping logs. Typically, you’d capture the logs and ship them to a centralized log management solution.
In a nutshell, log management enables application and infrastructure operators (developers, DevOps, SysAdmins, etc.) to troubleshoot problems and allows business stakeholders (product managers, marketing, BizOps, etc.) to derive insights from data embedded in log events. Logs are also one of the key sources of data for security analytics – threat detection, intrusion detection, compliance, network security, etc., collectively known as SIEM (Security Information and Event Management).
To fully understand the importance of log management, we’ve gathered some of the main benefits below:
Monitoring and Troubleshooting
The core and most common log management use case is troubleshooting software applications and infrastructure. Log events go hand in hand with application monitoring and server monitoring. Developers, DevOps, SysAdmins, and SecOps use both metrics and logs to get alerted about application and infrastructure performance and health issues, and then to find the root cause of those issues. Good log management tools help reduce MTTR (Mean Time To Recovery), which in turn improves user experience. Long downtimes, or even applications and infrastructure that perform poorly, can also cause profit loss.
Logs provide value beyond troubleshooting, though. If you have your logs structured – either from the source, or parsed in the pipeline – you can extract interesting metadata. For example, we often look at slow query logs during Solr or Elasticsearch consulting. Then we can answer lots of questions, like which kinds of queries happen more often, which queries are slow, what’s the breakdown per client, or do we have “noisy” clients? All this helps us optimize the setup, from architecture to queries. If all goes well, we end up with a more stable, faster and more cost-effective system. And we make our own production support job easier!
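To make the slow query example more concrete, here is a small sketch of the kind of breakdown you can compute once query logs are structured. The field names (`client`, `took_ms`), the 500 ms threshold and the sample events are assumptions for illustration; real Solr or Elasticsearch slow logs need to be parsed into this shape first.

```python
from collections import defaultdict


def breakdown_by_client(events, slow_ms=500):
    """Count total and slow queries per client."""
    stats = defaultdict(lambda: {"total": 0, "slow": 0})
    for event in events:
        entry = stats[event["client"]]
        entry["total"] += 1
        if event["took_ms"] >= slow_ms:
            entry["slow"] += 1
    return dict(stats)


def noisiest_client(stats):
    """Return the client that issued the most queries."""
    return max(stats, key=lambda client: stats[client]["total"])


# Hypothetical, already-parsed query log events
SAMPLE_EVENTS = [
    {"client": "web", "query": "title:logs", "took_ms": 12},
    {"client": "web", "query": "body:error*", "took_ms": 840},
    {"client": "batch", "query": "*:*", "took_ms": 2300},
    {"client": "web", "query": "host:db1", "took_ms": 30},
]

if __name__ == "__main__":
    stats = breakdown_by_client(SAMPLE_EVENTS)
    print(stats)
    print("noisiest client:", noisiest_client(stats))
```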
A less “technical” source of logs can be a sales channel. If we log what clients do at every step – along with client metadata – we can optimize. How many of those creating an account end up logging in? Can they successfully use our service, or should we improve our onboarding? Are there specific categories of clients (e.g. from a region of the world) that seem to have trouble? We can derive these insights if we centralize logs.
Related Monitoring and Troubleshooting Articles:
- APM vs. Log Management: How Logging and Monitoring are Different & Why You Need Both
- How to Detect Malicious Traffic in Your Server Logs
- Monitoring Linux Audit Logs with auditd and Auditbeat
As applications and systems become more and more complex, so do the size and difficulty of your operations. SecOps, SysAdmins, and DevOps would have a harder time monitoring everything “manually”, which would require more time and financial resources.
With centralized log management, you can identify trends across your whole company’s infrastructure, allowing you to adapt early and come up with solutions that prevent “fires” vs having to “put them out”.
Better Resource Usage
When it comes to system performance problems, system overload always looms like a dark cloud. Keep in mind, however, that the fault doesn’t always lie with your software; it may lie with the requests your server receives. Whether there are too many of them or they are too complex, your system can have difficulty dealing with them.
In this case, what log management does is help track resource usage. You can then see when your system is close to being overloaded so you can better allocate your resources.
Performance monitoring can let you know if there are performance issues, for example, that 90th percentile queries are slow. It may also reveal bottlenecks. To stick with the example, you may find that IO is overloaded when queries are slow. That said, you’ll need query logs to get more actionable insight, such as the content of the more expensive queries, how much data those queries touch, and how many of them run in parallel. Unlike metrics, logs give you more metadata to filter and visualize.
As with the previous example, one of the biggest headaches people report with applications is long response times to queries or not getting a response at all. Log management allows you to monitor requests at any level (API, database, etc.) and see which are underperforming. This enables you to step in and understand why such issues occurred, thus keeping you in control of your users’ experience.
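The difference between the metric ("the 90th percentile is slow") and the log detail behind it can be sketched like this. The nearest-rank percentile definition and the field names `took_ms` and `docs_scanned` are assumptions of this example; monitoring tools may compute percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty collection of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def expensive_queries(events, pct=90):
    """Return the percentile threshold and the query events at or above it,
    ordered by how much data they touched."""
    threshold = percentile([e["took_ms"] for e in events], pct)
    slow = [e for e in events if e["took_ms"] >= threshold]
    return threshold, sorted(slow, key=lambda e: e["docs_scanned"], reverse=True)


# Hypothetical, already-parsed query log events
QUERY_LOG = [
    {"query": "title:logs", "took_ms": 12, "docs_scanned": 1_000},
    {"query": "host:db1", "took_ms": 15, "docs_scanned": 2_500},
    {"query": "status:500", "took_ms": 18, "docs_scanned": 4_000},
    {"query": "body:error*", "took_ms": 40, "docs_scanned": 90_000},
    {"query": "*:*", "took_ms": 950, "docs_scanned": 5_000_000},
]

if __name__ == "__main__":
    threshold, slow = expensive_queries(QUERY_LOG)
    print("p90 latency (ms):", threshold)
    for event in slow:
        print(event["query"], event["docs_scanned"])
```

The percentile alone tells you that something is slow; the log events at or above it show which queries are responsible and how much data they scan.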
Understand Site Visitor Behavior
Log management and real user monitoring (RUM) complement each other: together, they can help track your users’ journey through your site or platform so that you can gain insight into their behavior and improve their experience.
RUM has access to the user’s perspective, such as the number of visitors you’ve had on your site, which pages they spent the most time on, if there are changes in the number of visitors and much more.
From logs, you have access to metadata closer to your business logic: how many users ended up paying, what backend requests looked like, etc. By correlating these two sources of data, you can spot opportunities such as when to launch a new product, when to close your site for maintenance or when to offer discounts.
There’s no such thing as too much protection when it comes to IT security. Log data analysis is at the heart of any SIEM solution: from network, system and audit logs to application logs. Anomalies here may signal an attack. Logs help security administrators diagnose anomalies in real-time by providing a live stream of log events.
So whenever someone attempts to breach your walls, whether from the inside or as an external threat, you’ll have more insight into what actually happened. You can also get alerted as soon as anomalies appear, so you can react before issues escalate.
Ensure Compliance with Regulations
Seeing as cyberattacks are becoming more and more difficult to detect and resolve, it’s critical that your company meets the compliance requirements of security policies, audits, regulations, and forensics.
Some of the most important regulations are HIPAA (Health Insurance Portability and Accountability Act), PCI DSS (Payment Card Industry Data Security Standard) and GDPR (General Data Protection Regulation). Furthermore, a growing number of regulations require that you collect log data, store it, and protect it against threats while keeping it available for audit. Otherwise, if a data breach happens, your company could face profit loss as well as hefty fines under these regulations.
Log management will help alert the right people of any suspicious activity concerning user data.
How Does Log Management Work?
Log management has 5 key elements that, together, keep your logging and monitoring running smoothly. Let’s review what those 5 elements are:
As mentioned above, all your systems and applications generate log files at any given moment, which may be stored in various locations on your software stack, operating system, containers, cloud infrastructure, and network devices.
By log collection, we mean either pulling data from a source (e.g. a log file) or accepting data sent from that source (through a UNIX or network socket). Then, you can send it over to the next hop in your pipeline.
The log collector (or shipper) would have to at least be able to buffer the data somehow, in case it can’t talk to the destination. Sometimes it’s a good idea to do some parsing and enriching close to the source as well. But we’ll talk more about these in the next section: log aggregation.
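As a rough sketch of the "pulling data from a source" side of log collection, the function below reads whatever has been appended to a log file since the last poll. It is a simplification: the `read_new_lines` name is made up for this example, and real shippers also handle log rotation, persist offsets across restarts, and buffer output when the destination is unreachable.

```python
def read_new_lines(path: str, offset: int = 0):
    """Return (complete lines appended since `offset`, new offset).

    The offset is a byte position in the file, so the caller can store it
    and pass it back on the next poll.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Only consume complete lines; a partially written last line is left
    # in place and picked up on the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A collector would call this in a loop, hand the returned lines to the next hop in the pipeline, and only then record the new offset, so that a crash in between leads to re-reading lines rather than losing them.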
Related Log Shipping Articles:
- Top 5 Most Popular Log Shippers
- How to ship Kibana Server Logs to Elasticsearch
- Using Filebeat to Send Elasticsearch Logs to Logsene
- Android SDK for Log Shipping & Analytics
- iOS SDK for Log Shipping & Analytics
- Logging Libraries vs Log Shippers
- How to Ship Heroku Logs to Logsene / Managed ELK Stack
Log Aggregation to Centralized Log Storage
The next key element in log management is log aggregation. A typical log aggregation pipeline has the ability to:
- collect logs from the needed sources, as we described above
- buffer logs, in case there are network or throughput issues
- parse logs to transform them into a format that can be indexed. For example, Elasticsearch consumes JSON, so you need to transform your logs into JSON
- optionally, enrich them with various metadata. For example, by knowing the IP of a source you can tag the company department of that host or its geo-location.
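The parse and enrich steps from the list above can be sketched as follows: a Common Log Format line is turned into a JSON-ready dictionary, then tagged with a department based on the source IP. The regular expression covers only the basic Common Log Format fields, and the `DEPARTMENTS` subnet mapping is entirely hypothetical.

```python
import ipaddress
import json
import re

# Basic Common Log Format: host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical mapping from subnets to company departments
DEPARTMENTS = {
    ipaddress.ip_network("10.1.0.0/16"): "engineering",
    ipaddress.ip_network("10.2.0.0/16"): "sales",
}


def parse(line: str):
    """Turn one log line into a dict, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["size"] = 0 if event["size"] == "-" else int(event["size"])
    return event


def enrich(event: dict) -> dict:
    """Add a department field when the source IP is in a known subnet."""
    try:
        ip = ipaddress.ip_address(event["host"])
    except ValueError:
        return event  # host is a name, not an IP address
    for network, department in DEPARTMENTS.items():
        if ip in network:
            return {**event, "department": department}
    return event


if __name__ == "__main__":
    line = '10.2.3.4 - alice [10/Oct/2023:13:55:36 +0000] "GET /cart HTTP/1.1" 200 2326'
    print(json.dumps(enrich(parse(line))))
```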
You may or may not want to separate these roles. Here are some examples of architectures:
- do everything close to the source. This will automatically scale with the number of sources but may become problematic if you have limited resources (e.g. network or mobile devices). For example, you can install a lightweight log shipper, such as rsyslog or Logagent, on every host that generates logs
- have dedicated server(s) that do buffering, parsing and enriching. Preferably in this order, so data can be buffered if processing is too expensive. An example of this design is a centralized Logstash receiving data from a lightweight log shipper, such as Filebeat
- have dedicated buffering, typically a Kafka cluster. A (lightweight) log shipper will push data to Kafka. On the other end, you have a Consumer (e.g. Logstash or Logagent) in charge of parsing, enriching, and shipping data to the final storage
That final storage can be deployed in-house, like your own Elasticsearch or Solr. Or it can be a managed service, like Sematext Cloud. A managed service may take care of parts of the pipeline for you. For example, you can send syslog directly from your devices to Sematext Cloud, where it gets buffered, parsed, and indexed. It can also get automatically backed up to your AWS S3 bucket, for archiving/compliance reasons.
Read more about log aggregation, what it is, how it works, and the tools you can use to do it in our Log Aggregation Guide: Tools & Tutorials.
Log Search and Analysis
Once stored and indexed, your aggregated logs are searchable, typically through a structured query language such as the Lucene Query Syntax used in Sematext Cloud. This makes it easier for you to dive into root cause analysis.
Log analysis can be more than just search. Even while troubleshooting, it’s often useful to be able to visualize the breakdown of data: does the overall volume spike at some point? How about the traffic volume? Or the number of errors per host? If your logs are structured at the time of indexing, you can get all this information and more.
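To show why structure at indexing time matters, here is a minimal sketch of two such breakdowns computed over structured events. In practice a log management tool runs these aggregations for you; the field names (`timestamp`, `host`, `level`) and the sample events are assumptions for this example.

```python
from collections import Counter


def errors_per_host(events):
    """Number of ERROR events for each host."""
    return Counter(e["host"] for e in events if e["level"] == "ERROR")


def volume_per_minute(events):
    """Number of events per minute, assuming ISO 8601 timestamps.

    The first 16 characters of such a timestamp are YYYY-MM-DDTHH:MM.
    """
    return Counter(e["timestamp"][:16] for e in events)


# Hypothetical structured log events
EVENTS = [
    {"timestamp": "2023-10-10T13:55:01", "host": "web1", "level": "INFO"},
    {"timestamp": "2023-10-10T13:55:40", "host": "web2", "level": "ERROR"},
    {"timestamp": "2023-10-10T13:56:02", "host": "web2", "level": "ERROR"},
    {"timestamp": "2023-10-10T13:56:30", "host": "web1", "level": "ERROR"},
]

if __name__ == "__main__":
    print(errors_per_host(EVENTS).most_common())
    print(sorted(volume_per_minute(EVENTS).items()))
```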
Log Monitoring and Alerting
Log management helps keep you on your toes, constantly providing data about how your systems and applications are performing. It also keeps you informed whether your infrastructure is working normally or if there are activity anomalies or security breaches.
A key part of this process is that it allows you to set up rules and alerts so that the right teams or people are notified in real-time to take measures before users are affected.
For example, a rule could be to alert your security team whenever a certain number of logins fail or the sales staff when too many people abandon their shopping carts.
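The failed login rule could be sketched as a sliding-window counter like the one below. The `FailedLoginAlert` class, the event fields (`action`, `outcome`, `ts`) and the default threshold are hypothetical; log management tools usually let you express such rules as a saved query plus a threshold instead of code.

```python
from collections import deque


class FailedLoginAlert:
    """Fire when `threshold` failed logins occur within `window_seconds`."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60):
        self.threshold = threshold
        self.window = window_seconds
        self._times = deque()  # timestamps of recent failed logins

    def observe(self, event: dict) -> bool:
        """Feed one log event; return True if the alert should fire."""
        if event.get("action") != "login" or event.get("outcome") != "failure":
            return False
        now = event["ts"]  # seconds, e.g. a UNIX timestamp
        self._times.append(now)
        # Forget failures that fell out of the time window
        while self._times and now - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times) >= self.threshold
```

A real rule would typically also group failures by user or source IP and suppress repeated notifications while the condition persists.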
- 5-Minute Recipe: Log Alerting and Anomaly Detection
- Log Alerting, Anomaly Detection and Scheduled Reports
Log Visualization and Reporting
All team members – and other cross-functional team members – should have access to the same information so everyone is on the same page. Reports and visualization make everything that happens behind the scenes accessible for everyone, including people outside the IT department.
When building reports for business stakeholders, you’ll be able to show data trends as time series line charts, group data, and draw tasty pie charts. Not to mention that graphs, dashboards, and other visual representations of trends, such as a huge spike, have a much higher impact on decision-makers.
Having a clear picture of how large volumes of data perform over time makes it easier to spot trends or anomalies in behavior. You can then just skip to the log line that’s at fault for the spike.
Logging and Monitoring Best Practices
Log management can indeed help you manage the massive amount of log events that your infrastructure and applications generate, and gain actionable insights from them. However, that’s only the case if you do it right. Otherwise, you’d be tempted to say that you’re losing time and money and that you’re better off without it.
There are a couple of best practices for log management and monitoring that can help ensure you reap the full benefits of having a log management solution. You should start, though, by understanding the difference between logging and monitoring and why you need both. Check out our post on Logging vs Monitoring.
Log Management Tools
As we’ve mentioned before, you can go through all the steps of log management on your own. However, unless you use log management tools, you will need to invest a lot of time and energy.
Instead, log management solutions can handle the entire log management process, while giving you the option to customize each step depending on your needs. Furthermore, they allow you to visualize and enrich logs, making them easily searchable for both troubleshooting and business analytics. They also feature real-time anomaly detection and alerting so that you can pinpoint issues before they affect the end user.
There are many good log management solutions available today, both open source and paid. We tested some of them over the years and made a list with the best open source log management tools and software out there. Check it out!
Sematext as a Log Management Solution
Sematext Cloud is an all-in-one solution that improves efficiency and gives you actionable insights faster.
You can easily identify and troubleshoot issues before they affect your users and spot opportunities to drive business growth – we encourage you to explore it in more depth here.