Introduction
Effective web server log management is crucial for maintaining your website’s performance, troubleshooting issues, and gaining insights into user behavior. Apache is one of the most popular web servers. It generates access and error logs that contain valuable information. To efficiently manage and analyze these logs, you can use Logstash to process and forward them to DigitalOcean’s Managed OpenSearch for indexing and visualization.
In this tutorial, we will guide you through installing Logstash on a Droplet, configuring it to collect your Apache logs, and sending them to Managed OpenSearch for analysis.
Prerequisites
- One or more Droplets with the Apache web server installed.
- A Managed OpenSearch cluster.
Step 1 – Installing Logstash
Logstash can be installed from binary files or via the package repositories. For easier management and updates, the package repositories are generally recommended.
In this section, we’ll guide you through installing Logstash on your Droplet using both APT and YUM package managers.
Let’s identify the OS:
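One way to do this is to read /etc/os-release (cat /etc/os-release prints it in full). The sketch below goes one step further and maps the ID and ID_LIKE fields of that file to the package manager family used in the rest of this step. The function name pkg_family and its optional path argument are illustrative and not part of any tool.

```shell
# Sketch: report "apt", "yum" or "unknown" for a host, based on the ID and
# ID_LIKE fields of an os-release file. The optional argument only exists so
# the function can be tried against a sample file.
pkg_family() {
  os_file="${1:-/etc/os-release}"
  # Work in a subshell so the sourced variables do not leak into this shell.
  (
    unset ID ID_LIKE
    . "$os_file" || exit 1
    case " ${ID:-} ${ID_LIKE:-} " in
      *" debian "*|*" ubuntu "*) echo apt ;;
      *" rhel "*|*" centos "*|*" fedora "*) echo yum ;;
      *) echo unknown ;;
    esac
  )
}
```

Running pkg_family with no argument on the Droplet should tell you which of the two sections below to follow.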
For APT-Based Systems (Ubuntu/Debian)
Download and install the Public Signing Key:
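The command for this step is not shown here. A likely form, assuming the key URL and keyring path from Elastic's published APT instructions, is sketched below. It is wrapped in a function (the name is hypothetical) so that nothing runs until you call it.

```shell
# Assumed values, taken from Elastic's APT instructions.
ELASTIC_KEY_URL="https://artifacts.elastic.co/GPG-KEY-elasticsearch"
ELASTIC_KEYRING="/usr/share/keyrings/elastic-keyring.gpg"

# Download the ASCII-armored key and store it in binary form for APT.
install_elastic_key() {
  wget -qO - "$ELASTIC_KEY_URL" | sudo gpg --dearmor -o "$ELASTIC_KEYRING"
}
```

Call install_elastic_key on the Droplet to perform the step.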
You may need to install the apt-transport-https package on Debian before proceeding, for example with sudo apt-get install apt-transport-https.
Save the repository definition to /etc/apt/sources.list.d/elastic-8.x.list:
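The repository line itself is not shown here. A minimal sketch, assuming the 8.x APT repository URL and the keyring path from Elastic's instructions, could look like this. The function name is hypothetical; it only prints the line, and the commented command writes it to the file.

```shell
# Assumed keyring path; it must match wherever the signing key was stored.
ELASTIC_KEYRING="/usr/share/keyrings/elastic-keyring.gpg"

# Print the one-line repository definition for the Elastic 8.x packages.
elastic_apt_source() {
  echo "deb [signed-by=${ELASTIC_KEYRING}] https://artifacts.elastic.co/packages/8.x/apt stable main"
}

# On the Droplet (needs root):
#   elastic_apt_source | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
```

Piping through sudo tee is used because a plain shell redirect would run without root privileges and fail to write under /etc.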
Use the echo method described above to add the Logstash repository. Do not use add-apt-repository, as it will also add a deb-src entry, and Elastic does not provide a source package. If you have added the deb-src entry, you will see an error like the following:
Unable to find expected entry 'main/source/Sources' in Release file (Wrong sources.list entry or malformed file)
Just delete the deb-src entry from the /etc/apt/sources.list file and the installation should work as expected.
Run sudo apt-get update and the repository is ready for use. You can then install Logstash with sudo apt-get install logstash.
For YUM-Based Systems (CentOS/RHEL)
Download and install the public signing key, for example with sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch.
Add the following to your /etc/yum.repos.d/logstash.repo file. You can use tee to create or update the file.
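The repository definition is not shown here. The sketch below assumes the 8.x YUM repository settings from Elastic's installation instructions. The function name is hypothetical; it prints the file contents so they can be piped to tee.

```shell
# Print an assumed repository definition for the Elastic 8.x YUM packages.
logstash_yum_repo() {
  cat <<'EOF'
[logstash-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
}

# On the Droplet (needs root):
#   logstash_yum_repo | sudo tee /etc/yum.repos.d/logstash.repo
```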
Your repository is ready for use. You can install Logstash with sudo yum install logstash.
For further information, please refer to the Installing Logstash guide.
Step 2 – Configuring Logstash to Send Logs to OpenSearch
A Logstash pipeline consists of three main stages: input, filter, and output. Each stage is implemented by plugins. You can use community plugins or create your own.
- Input: This stage collects data from various sources. Logstash supports numerous input plugins to handle data sources like log files, databases, message queues, and cloud services.
- Filter: This stage processes and transforms the data collected in the input stage. Filters can modify, enrich, and structure the data to make it more useful and easier to analyze.
- Output: This stage sends the processed data to a destination. Destinations can include databases, files, and data stores like OpenSearch.
Step 3 – Installing the OpenSearch Output Plugin
The OpenSearch output plugin can be installed by running the following command:
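The command is not shown here. A likely form is sketched below, assuming Logstash was installed from the package repositories, which place logstash-plugin under /usr/share/logstash/bin. The function name is hypothetical.

```shell
# Assumed location of the plugin manager for package-based installs.
LOGSTASH_PLUGIN_BIN="/usr/share/logstash/bin/logstash-plugin"
OPENSEARCH_PLUGIN="logstash-output-opensearch"

install_opensearch_plugin() {
  sudo "$LOGSTASH_PLUGIN_BIN" install "$OPENSEARCH_PLUGIN" || return 1
  # Confirm the plugin now appears in the list of installed plugins.
  sudo "$LOGSTASH_PLUGIN_BIN" list | grep "$OPENSEARCH_PLUGIN"
}
```

Call install_opensearch_plugin on the Droplet. The final grep prints the plugin name only if the installation succeeded.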
More information can be found in the logstash-output-opensearch-plugin repository.
Now let’s create a pipeline:
Create a new file in /etc/logstash/conf.d/ called apache_pipeline.conf, and copy the following contents.
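The original listing is not reproduced here, so the following is a sketch reconstructed from the breakdown later in this step. Two parts are assumptions: the second file input (error.log, tagged apache_error) and the field removed by the mutate filter. The function name apache_pipeline is hypothetical; it prints the configuration so it can be piped to tee.

```shell
# Print a pipeline that reads Apache access and error logs, parses them with
# grok, and routes them to two OpenSearch indexes.
apache_pipeline() {
  cat <<'EOF'
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => "apache_access"
  }
  # Assumption: error logs are read from a second input with its own tag.
  file {
    path => "/var/log/apache2/error.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => "apache_error"
  }
}

filter {
  if "apache_access" in [tags] {
    grok {
      match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    }
    # Assumption: the raw line is dropped once it has been parsed.
    mutate {
      remove_field => [ "message" ]
    }
  } else {
    grok {
      match => { "message" => "%{HTTPD24_ERRORLOG}" }
    }
  }
}

output {
  if "apache_access" in [tags] {
    opensearch {
      hosts => "https://<OpenSearch_Host>:25060"
      user => "doadmin"
      password => "<OpenSearch_Password>"
      index => "apache_access"
      ssl_certificate_verification => true
    }
  } else {
    opensearch {
      hosts => "https://<OpenSearch_Host>:25060"
      user => "doadmin"
      password => "<OpenSearch_Password>"
      index => "apache_error"
      ssl_certificate_verification => true
    }
  }
}
EOF
}

# On the Droplet (needs root):
#   apache_pipeline | sudo tee /etc/logstash/conf.d/apache_pipeline.conf
```

The heredoc delimiter is quoted so that the shell leaves the %{...} grok patterns and the placeholders untouched.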
Replace <OpenSearch_Host> with your OpenSearch server’s hostname and <OpenSearch_Password> with your OpenSearch password.
Let’s break down the above configuration.
- INPUT: Configures a source for the events. The file input plugin is used here.
  - path => "/var/log/apache2/access.log": Specifies the path to the Apache access log file that Logstash will read from. Make sure that the Logstash service has read access to this path.
  - start_position => "beginning": Defines where Logstash should start reading the log file. "beginning" indicates that Logstash should process the file from the beginning rather than from the end.
  - sincedb_path => "/dev/null": Specifies the path to a sincedb file. Logstash uses sincedb files to keep track of its current position in log files, so that it can resume where it left off after a restart or failure. Pointing this at /dev/null means no position is saved, so the file is read again from the start each time Logstash restarts.
  - tags => "apache_access": Assigns a tag to events read from this input. Tags are useful for identifying and filtering events in the filter and output stages. This pipeline uses them in both.
- FILTER: Processes the events.
  - Conditional: if "apache_access" in [tags] checks whether the tag apache_access exists in the [tags] field of the incoming event. This conditional selects the appropriate grok filter for Apache access or error logs.
  - Grok filter (for Apache access logs): %{HTTPD_COMBINEDLOG} is a predefined pattern in Logstash used to parse the Apache combined access log format. It extracts fields like IP address, timestamp, HTTP method, URI, and status code from the message field of incoming events.
  - Mutate filter (optional): After the Apache logs are parsed, the mutate filter’s remove_field option removes fields that are no longer needed.
  - Else condition: The else block is executed if the apache_access tag is not present in [tags]. It contains another grok filter for Apache error logs. The pattern %{HTTPD24_ERRORLOG} parses messages that match the Apache error log format and extracts fields like timestamp, log level, and error message.
  Grok patterns can be found at: https://github.com/logstash-plugins/logstash-patterns-core/tree/main/patterns.
- OUTPUT: Sends events to a particular destination.
  The output block begins with an if conditional that routes logs to two separate OpenSearch indexes, apache_error and apache_access.
  Let’s explore the OpenSearch output plugin settings:
  - hosts => "https://XXX:25060": Your OpenSearch hostname and port
  - user => "doadmin": Your OpenSearch username
  - password => "XXXXX": Your OpenSearch password
  - index => "apache_error": Index name in OpenSearch
  - ssl_certificate_verification => true: Enables SSL certificate verification
Step 4 – Start Logstash
Once the pipeline is configured, start the Logstash service:
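The commands are not shown here. A minimal sketch, assuming a systemd-based Droplet and a package-based install, follows. The configuration check before the start is an optional addition that is not part of the original steps, and the function name is hypothetical.

```shell
PIPELINE_CONF="/etc/logstash/conf.d/apache_pipeline.conf"

start_logstash() {
  # Optional: parse the pipeline and exit, so a syntax error is caught
  # before the service is started.
  sudo -u logstash /usr/share/logstash/bin/logstash \
    --path.settings /etc/logstash \
    --config.test_and_exit -f "$PIPELINE_CONF" || return 1
  # Start the service now and on every boot, then show its state.
  sudo systemctl enable --now logstash
  sudo systemctl status logstash --no-pager
}
```

Call start_logstash on the Droplet. If the configuration check fails, the service is left untouched.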
Step 5 – Troubleshooting
Check Connectivity
You can verify that Logstash can connect to OpenSearch by testing connectivity:
Replace <your-opensearch-server> with your OpenSearch server’s hostname, and <your_username> and <your_password> with your OpenSearch credentials.
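The test command is not shown here. One possible check, sketched with curl against the cluster's root endpoint on port 25060, is below. The variable and function names are illustrative.

```shell
# Placeholders from the text; replace them before running.
OS_HOST='<your-opensearch-server>'
OS_USER='<your_username>'
OS_PASS='<your_password>'

# Build a URL for the given path on the cluster.
opensearch_url() {
  printf 'https://%s:25060%s\n' "$OS_HOST" "$1"
}

# Request the root endpoint; a JSON reply with cluster details means the
# hostname, port and credentials all work from this machine.
check_connectivity() {
  curl -sS -u "${OS_USER}:${OS_PASS}" "$(opensearch_url /)"
}
```

Run check_connectivity from the Droplet that hosts Logstash, since that is the network path the pipeline will use.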
Data Ingestion
Ensure that data is properly indexed in OpenSearch:
Replace <your-opensearch-server> with your OpenSearch server’s hostname, <your_username> and <your_password> with your OpenSearch credentials, and <your-index-name> with the index name.
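The query is not shown here. A sketch that uses the _count and _search endpoints to confirm documents are arriving is below. The variable and function names are illustrative.

```shell
# Placeholders from the text; replace them before running.
OS_HOST='<your-opensearch-server>'
OS_USER='<your_username>'
OS_PASS='<your_password>'
OS_INDEX='<your-index-name>'

# Build a URL for an endpoint under the chosen index.
index_url() {
  printf 'https://%s:25060/%s/%s\n' "$OS_HOST" "$OS_INDEX" "$1"
}

# Number of documents currently in the index.
count_docs() {
  curl -sS -u "${OS_USER}:${OS_PASS}" "$(index_url '_count?pretty')"
}

# One sample document, to check that the parsed fields look right.
sample_doc() {
  curl -sS -u "${OS_USER}:${OS_PASS}" "$(index_url '_search?size=1&pretty')"
}
```

A count that grows as Apache serves requests suggests the whole pipeline is working.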
Firewall and Network Configuration
Ensure firewall rules and network settings allow traffic between Logstash and OpenSearch on port 25060.
Logs
The Logstash logs can be found at /var/log/logstash/logstash-plain.log.
For details, refer to Troubleshooting.
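When the log is long, it can help to show only the warnings and errors. The sketch below assumes the default Logstash line layout, in which the level appears in square brackets after the timestamp. The function name and its optional path argument are illustrative.

```shell
LOGSTASH_LOG="/var/log/logstash/logstash-plain.log"

# Print only WARN and ERROR lines. The optional argument overrides the path.
logstash_problems() {
  grep -E '\]\[(WARN|ERROR) *\]' "${1:-$LOGSTASH_LOG}"
}
```

To watch new entries as they arrive instead, sudo tail -f on the same file works as well.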
Conclusion
In this guide, we walked through setting up Logstash to collect and forward Apache logs to OpenSearch. Here’s a quick recap of what we covered:
Installing Logstash: We covered how to use either APT or YUM package managers, depending on your Linux distribution, to install Logstash on your Droplet.
Configuring Logstash: We created and adjusted the Logstash configuration file to ensure that Apache logs are correctly parsed and sent to OpenSearch.
Verifying in OpenSearch: We queried the cluster to confirm that your logs are being indexed properly and are available for analysis.
With these steps completed, you should now have a functional setup where Logstash collects Apache logs and sends them to OpenSearch.
Source:
https://www.digitalocean.com/community/tutorials/forward-apache-logs-to-opensearch-via-logstash