Setting up Fluentd Logs Ingestion

Last updated on 04 January, 2023

Fluentd is an open-source data collector that provides a unifying layer between different types of log inputs and outputs. Fluentd can collect logs from multiple sources and structure the data in JSON format. This enables unified log processing, including collecting, filtering, buffering, and outputting logs across multiple sources and destinations. For more information, see the Fluentd documentation.

If you are already using Fluentd to collect application and system logs, you can use the LM Logs Fluentd plugin to forward the logs to LogicMonitor.

Prerequisites

  • A LogicMonitor API token to authenticate all requests to the log ingestion API.
  • Logs sent to the log ingestion API must include a “message” field. Requests sent without a “message” are not accepted.

Installing the Plugin

You have the following options:

  • With gem, if you have td-agent/fluentd installed along with native Ruby: gem install fluent-plugin-lm-logs
  • With td-agent's bundled Ruby (native plugin handling): td-agent-gem install fluent-plugin-lm-logs
  • Alternatively, copy out_lm.rb into your Fluentd plugins directory.

Configuring the Plugin

In this step, you specify which logs should be forwarded to LogicMonitor.

Create a custom fluent.conf file, or edit your existing one, and add the following to the Fluentd configuration. The properties are described in Configuration Properties.

# Match events tagged with "lm.**" and
# send them to LogicMonitor
<match lm.**>
    @type lm
    resource_mapping {"<event_key>": "<lm_property>"}
    company_name <lm_company_name>
    access_id <lm_access_id>
    access_key <lm_access_key>
    <buffer>
        @type memory
        flush_interval 1s
        chunk_limit_size 5m
    </buffer>
    debug false
    compression gzip
</match>

Configuration Properties

  • company_name – Your LogicMonitor company or account name in the target URL: https://<account>.logicmonitor.com
  • resource_mapping – The mapping that defines the source of the log event to the LogicMonitor resource. The <event_key> in the incoming event is mapped to the value of <lm_property>.
  • access_id – The LogicMonitor API token's access ID. It is recommended to create an API-only user. See API Tokens.
  • access_key – The LogicMonitor API token's access key. See API Tokens.
  • flush_interval – The time in seconds to wait before sending a batch of logs to LogicMonitor. Default is 60s.
  • chunk_limit_size – The size limit in megabytes for a collected log chunk before the batch is sent to LogicMonitor. Default is 8m.
  • flush_thread_count – The number of batches of logs sent to LogicMonitor in parallel. Default is 1. Using multiple threads can hide IO/network latency, but does not improve processing performance.
  • debug – When true, logs more information to the Fluentd console.
  • force_encoding – Specifies a charset to use when logs contain invalid UTF-8 characters.
  • include_metadata – When true, appends additional metadata to the log. Default is false.
  • compression – Enables compression for the log payload sent to LogicMonitor. Currently gzip is supported.
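The plugin signs requests for you, but for reference, LogicMonitor's LMv1 API token authentication combines the access ID and access key roughly as follows. This is a hedged sketch based on LogicMonitor's public REST API authentication scheme; the plugin's internal implementation may differ:

```python
import base64
import hashlib
import hmac
import time

def lmv1_token(access_id: str, access_key: str, http_verb: str,
               resource_path: str, data: str = "") -> str:
    """Build an LMv1 Authorization header value for a LogicMonitor API call."""
    epoch_ms = str(int(time.time() * 1000))
    # The signature is HMAC-SHA256 over verb + timestamp + body + path,
    # hex-encoded, then base64-encoded.
    msg = http_verb + epoch_ms + data + resource_path
    hex_digest = hmac.new(access_key.encode(), msg.encode(),
                          hashlib.sha256).hexdigest()
    signature = base64.b64encode(hex_digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"

# Example with placeholder credentials:
token = lmv1_token("myAccessId", "myAccessKey", "POST", "/log/ingest",
                   data='[{"message":"hello"}]')
```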

Request Example

Example of a request sent to Fluentd:

curl -X POST -d 'json={"message":"hello LogicMonitor from fluentd", "event_key":"lm_property_value"}' http://localhost:8888/lm.test

Event returned:

{
    "message": "hello LogicMonitor from fluentd"
}
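For the curl test above to work, Fluentd needs an HTTP input listening on the port used in the request. A minimal sketch, assuming the standard in_http input plugin on its default port 8888 (the request path, lm.test, becomes the event tag):

```
# Accept events over HTTP, e.g. POST http://localhost:8888/lm.test
<source>
  @type http
  port 8888
</source>
```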

Mapping Resources

It is important that the sources generating the log data are mapped using the right format, so that logs are parsed correctly when sent to LogicMonitor Logs.

When defining the source mapping for the Fluentd event, the <event_key> in the incoming event is mapped to the LogicMonitor resource, which is the value of <lm_property>.

For example, you can map a "hostname" field in the log event to the LogicMonitor property "system.hostname" using:

resource_mapping {"hostname": "system.hostname"}

If the LogicMonitor resource mapping is known, the event_key property can be overridden by specifying _lm.resourceId in each record.

Configuration Examples

The following are examples of resource mapping.

Mapping with _lm.resourceId

In this example, all incoming records that match lm.** go through the filter. The specified _lm.resourceId mapping is added to each record before it is sent to LogicMonitor.

<filter lm.**>
    @type record_transformer
    <record>
        _lm.resourceId { "system.aws.arn": "arn:aws:ec2:us-west-1:xxx:instance/i-xxx"}
        tag ${tag}
    </record>
</filter>

Mapping Kubernetes Logs

For Kubernetes logs in Fluentd, the resource mapping can be defined with this statement:

resource_mapping {"kubernetes.pod_name": "auto.name"}

Mapping a Log File Resource

If you want to send only logs with a specific tag to LogicMonitor, change the match pattern from ** to that tag. In this example, the tag is "lm.service".

# Tail one or more log files
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /path/to/file
  tag lm.service
</source>

# send only logs tagged lm.service to LogicMonitor
<match lm.service>
  @type lm
  resource_mapping {"Computer": "system.hostname"}
  company_name LM_COMPANY_NAME
  access_id LM_ACCESS_ID
  access_key LM_ACCESS_KEY
  <buffer>
    @type memory
    flush_interval 1s
    chunk_limit_size 5m
  </buffer> 
  debug false
</match>

Parsing a Log File

In some cases you might be tailing a file for logs. It is important to parse the log lines so that fields such as timestamp, host, and log message are correctly populated. The following example shows how to configure the source for this.

# Tail one or more log files
<source>
  @type tail
  <parse>
    @type none # this sends log lines as-is, without parsing
  </parse>
  path /path/to/file
  tag lm.service
</source>

Many parsers are available for Fluentd. Several are built in (for example regexp, json, csv, and syslog), and additional parser plugins can be installed with gem. For more information, see the Fluentd documentation.
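For example, if each line of the tailed file is a JSON object, the built-in json parser can populate the fields directly. A sketch; adjust the path and tag to your setup:

```
# Tail a file of JSON log lines and parse each line into structured fields
<source>
  @type tail
  <parse>
    @type json
  </parse>
  path /path/to/file.json
  tag lm.service
</source>
```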

Transforming a Log Record

Logs that are read by a source might not have all the metadata needed. Through a filter plugin, you can modify the logs before writing them to LogicMonitor. Add the following block to the configuration file.

# records are filtered against a tag
<filter lm.filter>
  @type record_transformer
  <record>
    system.hostname "#{Socket.gethostname}"
    service "lm.demo.service"
  </record>
</filter>

You can add more advanced filtering by writing a Ruby code block with record_transformer. For better performance, you can use record_modifier instead. See the Fluentd documentation.
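As a sketch, the same filter written with record_modifier (provided by the separate fluent-plugin-record-modifier gem) could look like this:

```
# record_modifier is faster than record_transformer for simple additions
<filter lm.filter>
  @type record_modifier
  <record>
    system.hostname "#{Socket.gethostname}"
    service "lm.demo.service"
  </record>
</filter>
```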

More Examples

Fluentd provides a unified logging layer that can collect many types of logs for forwarding to LogicMonitor for analysis. For additional configuration samples, see the Fluentd documentation.

Tuning Performance

In some cases you might need to fine-tune the configuration to optimize the Fluentd performance and resource usage. For example, if the log input speed is faster than the log forwarding, the batches will accumulate. You can prevent this by adjusting the buffer configuration.

Buffer plugins are used to store the incoming stream temporarily before transmitting it. See the Fluentd documentation.

There are these types of buffer plugins:

  • memory (buf_memory). Uses memory to store buffer chunks.
  • file (buf_file). Uses files to store buffer chunks on disk.

In the following configuration example, Fluentd creates chunks of logs of 5 MB (chunk_limit_size) and sends them to LogicMonitor every second (flush_interval). Note that although 5m is the upper limit, the chunks are sent every second even if they are smaller than 5m.

<match lm.**>
    @type lm
    company_name LM_COMPANY_NAME
    access_id LM_ACCESS_ID
    access_key LM_ACCESS_KEY
    <buffer>
        @type memory   
        flush_interval 1s
        chunk_limit_size 5m
    </buffer> 
    debug false
</match>

Adjusting Rate of Incoming Logs

If the log input speed is faster than the log forwarding, the batches will accumulate. If you use the memory-based buffer, the log chunks are kept in memory, and memory usage will increase. To prevent this, you can flush using multiple parallel threads.

Update the buffer configuration as described in the following. Adding flush_thread_count 8 can increase the output rate up to 8 times.

<match lm.**>
    @type lm
    company_name LM_COMPANY_NAME
    access_id LM_ACCESS_ID
    access_key LM_ACCESS_KEY
    <buffer>
        @type memory
        flush_interval 1s
        chunk_limit_size 5m
        flush_thread_count 8
    </buffer>
    debug false
</match>

Using File-Based Buffering

If you have an upper limit for parallel thread processing, and have a spike in the incoming log rate, you can use file-based buffering instead. To use this, change @type memory to @type file in the buffer configuration block. Note that this may result in increased I/O operations.
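A buffer block using file-based buffering might look like this. A sketch; the file buffer requires a path where chunks are persisted, and the path shown here is an assumption you should adapt:

```
<buffer>
    @type file
    path /var/log/td-agent/buffer/lm   # chunks are persisted here across restarts
    flush_interval 1s
    chunk_limit_size 5m
</buffer>
```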

Troubleshooting

Enable debug logging by setting the debug property to “true” in fluent.conf to see additional information in the Fluentd console. The following describes some common troubleshooting scenarios when using Fluentd. For more information on logs troubleshooting, see Troubleshooting.

Delayed Ingestion for Multi-Line Events

For multi-line events, log ingestion might be delayed until the next log entry is created. This delay occurs because Fluentd will only parse the last line when a line break is appended at the end of the line. To fix this, add or increase the configuration property multiline_flush_interval (in seconds) in fluent.conf.
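As a sketch, a tail source for multi-line events (for example, stack traces following a dated first line) might set multiline_flush_interval as follows. The regular expressions are assumptions you must adapt to your log format:

```
<source>
  @type tail
  path /path/to/file
  tag lm.service
  # flush a pending multi-line event after 5s even if no new first line arrives
  multiline_flush_interval 5s
  <parse>
    @type multiline
    format_firstline /^\d{4}-\d{2}-\d{2}/
    format1 /^(?<message>.*)/
  </parse>
</source>
```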

Resource Mapping Failures

In these cases, Fluentd appears to be working, but logs do not appear in LogicMonitor. This is most likely caused by incorrect or missing resource mappings.

By default, fluent-plugin-lm looks for "host" and "hostname" in a log record coming from Fluentd. The plugin tries to map the record to a device with the same value for the "system.hostname" property in LogicMonitor. The resource to be mapped must be uniquely identifiable by "system.hostname" having the same value as "host"/"hostname" in the log.

The following are examples of host/hostname mappings.

Example 1

Configuration:

resource_mapping {"hostname": "system.sysname"}

Result:

If the log is: { "message": "message", "host": "55.10.10.1", "timestamp": .......... }

The log will be mapped to the resource in LogicMonitor that is uniquely identifiable by the property system.sysname = 55.10.10.1.

Example 2

Mapping with multiple properties where devices are uniquely identifiable.

Configuration:

resource_mapping {"hostname": "system.sysname", "key_in_log": "Property_key_in_lm"}

Result:

If the log is: { "message": "message", "host": "55.10.10.1", "key_in_log": "value", "timestamp": .......... }

The log will be mapped to the device in LogicMonitor that is uniquely identifiable by the properties system.sysname = 55.10.10.1 and Property_key_in_lm = value.

Example 3

Hard-coded resource mapping of all logs to one resource. The resource must be uniquely identifiable by the properties used.

Configuration:

# Tail one or more log files
<source>
.....
</source>

# force resource mapping to a single device with record_transformer
<filter lm.**>
    @type record_transformer
    <record>
        _lm.resourceId {"system.hostname" : "11.2.3.4", "system.region" : "north-east-1",  "system.devicetype" : "8"}
        tag lm.filter
    </record>
</filter>
# send all logs to LogicMonitor
<match lm.**> 
  @type lm
  company_name LM_COMPANY_NAME
  access_id LM_ACCESS_ID
  access_key LM_ACCESS_KEY
  <buffer>
    @type memory
    flush_interval 1s
    chunk_limit_size 5m
  </buffer> 
  debug false
</match>

Result:

Logs with tag lm.** will be mapped to the resource uniquely identified by the properties:
system.hostname = 11.2.3.4
system.region = north-east-1
system.devicetype = 8
