LogicMonitor’s collectors are configured to work well in most environments, but some environments require tuning.

Performance Overview

There is a trade-off between the collector’s resource consumption (CPU and memory) and its performance. By default, the collector does not consume many resources, so it may need tuning in large environments, in environments where a collector is not doing a variety of work (for example, a collector doing almost all JMX collection instead of a mix of SNMP, JMX, and JDBC), or in environments where many devices are not responding. Tuning may involve adjusting the collector’s configuration, redistributing workloads, or both.

A common reason a collector can no longer keep up with the devices it has been monitoring is that some of those devices have stopped responding. For example, if a collector is monitoring 100 devices with no task queuing, but then starts showing queued tasks or is unable to schedule tasks, this may well be because it can no longer collect data from some of the devices. If it was talking to all of those devices via JMX, and each device normally responded to a JMX query in 200 ms, it could cycle through all the devices easily. However, if the JMX credentials now mismatch on 10 of the hosts so that they do not respond to LogicMonitor queries, the collector keeps a thread open for each of those queries until the configured JMX timeout expires. It is now holding several threads open, waiting for responses that will never come, and those threads are unavailable for other work. Tuning can help in this situation.
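The arithmetic behind this example can be sketched as follows. It assumes each task holds one thread for the device's response time if the device answers, or for the full timeout if it does not. The 30-second timeout is an illustrative value, and busy_thread_seconds is a hypothetical helper, not part of the collector.

```python
def busy_thread_seconds(responsive: int, unresponsive: int,
                        response_s: float, timeout_s: float) -> float:
    """Total thread-seconds one polling cycle occupies.

    Assumes a responsive device holds a thread for response_s and an
    unresponsive device holds it for the full timeout_s.
    """
    return responsive * response_s + unresponsive * timeout_s


# 100 healthy JMX devices, each answering in 200 ms
healthy = busy_thread_seconds(responsive=100, unresponsive=0,
                              response_s=0.2, timeout_s=30)
# the same devices after credentials break on 10 hosts (30 s timeout assumed)
degraded = busy_thread_seconds(responsive=90, unresponsive=10,
                               response_s=0.2, timeout_s=30)
print(f"healthy: {healthy:.1f} thread-s, degraded: {degraded:.1f} thread-s")
```

Under these assumptions, the 10 silent hosts raise the thread time per cycle from about 20 to about 318 thread-seconds, roughly 16 times the healthy load, even though 90% of the devices behave exactly as before.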

How do I know if a Collector needs tuning?

Assuming you have set up Collector monitoring, the Collector Data Collecting Task datasource alerts you if the collector is unable to schedule tasks. This is a clear indication that the collector’s workload needs tuning: data is not being collected according to the datasource schedule, which may result in gaps in graphs. Another metric to watch is the number of elements in the Task Queue. Queued tasks indicate that the collector has to wait before scheduling tasks, but that the tasks still complete in the appropriate time, so a non-empty queue is a leading indicator that a collector is approaching its configured capacity.
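The decision described above can be expressed as a small function. This is a hypothetical sketch: the datapoint names UnavailableScheduleTaskRate and TasksCountInQueue come from the Data Collecting Task datasource discussed in this article, while treating any value above zero as significant is an assumption. In practice you would likely act only on sustained values.

```python
from enum import Enum


class CollectorState(Enum):
    HEALTHY = "healthy"
    # tasks wait in the queue but still complete on schedule (leading indicator)
    APPROACHING_CAPACITY = "approaching capacity"
    # tasks cannot be scheduled at all, so graphs may show gaps
    NEEDS_TUNING = "needs tuning"


def classify(unavailable_schedule_task_rate: float,
             tasks_count_in_queue: float) -> CollectorState:
    """Map the two Data Collecting Task datapoints to a rough health state."""
    if unavailable_schedule_task_rate > 0:
        return CollectorState.NEEDS_TUNING
    if tasks_count_in_queue > 0:
        return CollectorState.APPROACHING_CAPACITY
    return CollectorState.HEALTHY


print(classify(0.0, 12).value)  # approaching capacity
```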

The graphs below show how the Collector datasources reveal an overloaded collector: many tasks cannot be scheduled, and the task queue is very high. After tuning (Aug 26), the number of successful tasks increases, and both the number of unscheduled tasks and the task queue drop to zero.

 

How can I proactively identify Collectors that need tuning?

A good proactive practice is to create a Collector dashboard with two Custom Graphs: one showing the top 10 collectors by the UnavailableScheduleTaskRate datapoint for all instances of the Data Collecting Task datasource on all devices, and another showing the top 10 collectors by TasksCountInQueue. Because each collector has many instances of this datasource (one for each collection method), you may need to list specific collection methods (snmp, jmx, etc.) as instances so that you do not exceed the instance limit on a custom graph. Otherwise, set the instances to a star (*) to see all methods on one graph.
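If you export these datapoint values for offline analysis, the same "top 10" selection can be reproduced as below. The (collector, method, value) tuple layout and the sample collector names are assumptions for illustration. The methods filter plays the role of listing specific instances, and None plays the role of the star.

```python
import heapq
from typing import Iterable, Optional, Set, Tuple

Sample = Tuple[str, str, float]  # (collector, collection method, datapoint value)


def top_instances(samples: Iterable[Sample],
                  methods: Optional[Set[str]] = None,
                  n: int = 10) -> list:
    """Return the n samples with the highest value, largest first.

    methods restricts the result to specific collection methods
    (e.g. {"snmp", "jmx"}); None keeps every method, like a "*" filter.
    """
    selected = [s for s in samples if methods is None or s[1] in methods]
    return heapq.nlargest(n, selected, key=lambda s: s[2])


queue_depths = [
    ("collector-a", "jmx", 140.0),
    ("collector-a", "snmp", 3.0),
    ("collector-b", "jmx", 0.0),
    ("collector-c", "wmi", 57.0),
]
print(top_instances(queue_depths, n=2))
print(top_instances(queue_depths, methods={"snmp", "wmi"}, n=2))
```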

Collector Tuning

The easiest way to tune your Collector is to increase its size. The small Collector uses only 2 GB of memory, but it can perform more work if upgraded to a larger size (provided the server running the Collector has the memory available). You can also modify the Collector’s configuration manually, as discussed in Editing the Collector Config Files.

In general, there are two cases that could require Collector tuning:

  • when devices are not responding
  • when the Collector cannot keep up with the workload

Both are often addressed by increasing the Collector size, which should be your first step. However, if you have already increased the size and still see performance issues, some fine tuning may help.

Devices not responding

Devices may fail to respond to queries from the collector for several reasons: their credentials have changed, the device is offline, the LogicMonitor credentials were set incorrectly, and so on. In these cases, you should get alerts about the protocol not responding on the device. The best approach is to correct the underlying issue (for example, by setting the correct credentials) so that monitoring can resume on the devices. However, this is not always possible. You can check whether this issue is affecting your collectors from the Collector debug window (Settings > Collectors > Manage Collector > Support > Run Debug Command). Running the command !tlist c=METHOD, where METHOD is the data collection method at issue (jmx, snmp, WMI, etc.), lists all the tasks of that type the collector has scheduled.

If you see many tasks that failed due to a timeout or non-response, each of those tasks kept a thread busy for the full timeout period of that protocol. In this situation, it may be appropriate to reduce the configured timeout so that threads do not block for so long. The default JMX timeout was once 30 seconds, which is a very long time to wait for a computer to respond. Setting it to 5 seconds (the current default) means that, for a non-responsive device, each thread can process 6 times as many tasks in the same time. Take care to set timeouts that are reasonable for your environment. While it may be appropriate to set the JMX timeout to 5 seconds, you might leave the webpage collection timeout at 30 seconds, because some of your web pages may take that long to render. Setting a timeout shorter than the time your devices take to respond will adversely affect monitoring.
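The "6 times" figure follows from simple division. The sketch below is a hypothetical helper that assumes the worst case, in which every task waits out the full timeout, and counts how many tasks one thread can finish per minute at each timeout.

```python
def tasks_per_thread(window_s: float, timeout_s: float) -> float:
    """Tasks one thread completes in window_s when every task waits out
    the full timeout (the non-responsive-device worst case)."""
    return window_s / timeout_s


old = tasks_per_thread(60, 30)  # 2 tasks per minute with a 30 s timeout
new = tasks_per_thread(60, 5)   # 12 tasks per minute with a 5 s timeout
print(new / old)                # 6.0
```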

To change the timeout for a protocol, edit the collector configuration manually from the Collector Configuration window. Change the collector.*.timeout parameter for the protocol you want (for example, change collector.jmx.timeout=30 to collector.jmx.timeout=5).
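LogicMonitor's documented route for this change is the Collector Configuration window. Purely to illustrate the edit itself, the hypothetical helper below rewrites one key=value line in a copy of the configuration text. It assumes unindented key=value lines with no spaces around the equals sign, and it appends the key if it is missing.

```python
import re


def set_property(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with the line `key=...` replaced by `key=value`.

    Matches only unindented lines written exactly as key=value; if the key
    is absent, the setting is appended as a new final line.
    """
    line = f"{key}={value}"
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        # a function replacement keeps any backslashes in value literal
        return pattern.sub(lambda _match: line, conf_text)
    separator = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return f"{conf_text}{separator}{line}\n"


before = "collector.jmx.threadpool=50\ncollector.jmx.timeout=30\n"
print(set_property(before, "collector.jmx.timeout", "5"), end="")
```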

In addition to reducing the timeout period, you may also need to increase the number of threads; see the next section.

Collector cannot keep up with workload

If the Collector still reports that tasks cannot be scheduled, it may be appropriate to increase the number of threads for a collection method. This allows the collector to perform more work simultaneously (especially if some threads are just waiting for timeouts), but it also increases the collector’s CPU usage.
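To estimate how many threads a method actually needs, one option (a general queueing rule, Little's law, not a LogicMonitor formula) is to multiply the rate at which tasks arrive by the average time each task holds a thread. The hypothetical sketch below assumes tasks are spread evenly across the polling interval, so treat the result as a lower bound rather than a recommended setting.

```python
import math


def min_threads(tasks_per_interval: int, interval_s: float,
                mean_hold_s: float) -> int:
    """Little's law: average busy threads = arrival rate * mean hold time.

    Returns the rounded-up average, a floor that ignores bursts.
    """
    arrival_rate = tasks_per_interval / interval_s  # tasks per second
    return math.ceil(arrival_rate * mean_hold_s)


# 1,000 JMX tasks per minute, each answered in 200 ms
print(min_threads(1000, 60, 0.2))
# same load when 10% of tasks wait out a 5 s timeout:
# mean hold time = 0.9 * 0.2 + 0.1 * 5 = 0.68 s
print(min_threads(1000, 60, 0.68))
```

The second case needs three times as many threads as the first (12 versus 4), which is why slow or silent devices are the usual reason to raise a threadpool.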

To increase the threads available to a collection method, edit the collector configuration manually from the Collector Configuration window. Change the collector.*.threadpool parameter for the protocol you want (for example, change collector.jmx.threadpool=50 to collector.jmx.threadpool=150).

Increase the threadpool setting gradually: try doubling the current setting, then observe the collector’s behavior. Watch for changes in the collector’s CPU and heap utilization, because more threads use more CPU and place more demands on the JVM heap. If the collector’s heap usage (shown under the Collector datasource Collector JVM Status) is approaching its limit, you may need to increase the heap as well.
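The "double, then observe" procedure can be written as a single decision step. In this hypothetical sketch, the 80% CPU and heap ceilings are placeholder assumptions, not LogicMonitor guidance; substitute limits that fit your servers.

```python
from typing import Optional


def next_threadpool(current: int, cpu_pct: float, heap_pct: float,
                    cpu_limit: float = 80.0,
                    heap_limit: float = 80.0) -> Optional[int]:
    """Suggest the next threadpool value after observing the collector.

    Returns double the current setting while CPU and heap stay below their
    limits; returns None when either limit is reached, meaning: stop
    adding threads and look at the heap size or an additional collector.
    """
    if cpu_pct >= cpu_limit or heap_pct >= heap_limit:
        return None
    return current * 2


print(next_threadpool(50, cpu_pct=35.0, heap_pct=60.0))  # 100
print(next_threadpool(100, cpu_pct=85.0, heap_pct=60.0))  # None
```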

If a collector has had both its threads and its heap increased and still cannot keep up with the workload (or is hitting CPU capacity), it is time to add another collector and split the workload among the collectors.
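One way to plan the split, sketched here under assumptions, is a greedy longest-processing-time assignment (a general scheduling heuristic, not a LogicMonitor feature): sort devices by an estimated load, such as measured task time or instance count, and give each one to the currently least-loaded collector. The device names, loads, and collector names below are hypothetical.

```python
import heapq
from typing import Dict, List


def split_workload(device_load: Dict[str, float],
                   collectors: List[str]) -> Dict[str, List[str]]:
    """Assign devices, heaviest first, to the least-loaded collector so far."""
    heap = [(0.0, name) for name in collectors]  # (assigned load, collector)
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {name: [] for name in collectors}
    for device, load in sorted(device_load.items(),
                               key=lambda item: item[1], reverse=True):
        total, name = heapq.heappop(heap)
        assignment[name].append(device)
        heapq.heappush(heap, (total + load, name))
    return assignment


loads = {"db1": 40.0, "app1": 30.0, "app2": 20.0, "web1": 10.0}
print(split_workload(loads, ["collector-a", "collector-b"]))
```

With these sample loads, each collector ends up with an estimated load of 50.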

The LogicMonitor Collector is an application that runs on a Linux or Windows server within your infrastructure and uses standard monitoring protocols to intelligently monitor the devices in that infrastructure.

LogicMonitor Collectors are not agents and do not have to be installed on every resource within your infrastructure that you would like monitored. Rather, you should install a Collector on a host in each location of your infrastructure. For more information, see Installing Collectors.

The Collector retrieves data from all the devices assigned to it, then encrypts the data and sends it back to the LogicMonitor servers over an outgoing SSL connection.

One Collector can typically monitor hundreds of devices; however, this capacity depends on how many metrics are being monitored for each device, as well as the available resources of the server on which the Collector is installed. For more information on capacity, see Collector Capacity.

How Collectors Determine What Metrics to Monitor for Devices

When you add a device into monitoring, LogicMonitor applies built-in intelligence to recognize what kind of device it is. Based on the information discovered about the device, LogicMonitor DataSources are applied.

DataSources are templates that tell the Collector how to monitor the device, what metrics to collect for it, how to display those metrics as graphs, and what values indicate issues that need attention. LogicMonitor includes hundreds of pre-built DataSources that are applied automatically when you add devices to your account.

Collector Data Storage

All of the data from your Collectors is consolidated in a LogicMonitor data center and is accessible in your LogicMonitor portal from anywhere with an internet connection. This requires that the server your Collector is installed on can make an outgoing HTTPS connection to LogicMonitor’s data centers (note, however, that Collectors can be installed on proxy servers).

Ports Used by Collectors

The server on which a Collector is installed must be able to make an outgoing HTTPS connection to the LogicMonitor servers (proxies are supported). In addition, the ports for the monitoring protocols you intend to use (e.g. SNMP, WMI, JDBC, etc.) must be unrestricted between your Collector machine and the resources you want to monitor.
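A quick way to confirm that the Collector host can open an outgoing connection on port 443 is a plain TCP connection test. The sketch below checks only TCP reachability, not TLS or proxy behavior, so a failure is expected if the Collector is meant to go through a proxy.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

For example, can_connect("yourcompany.logicmonitor.com", 443) tests the path to a portal; the hostname here is a placeholder for your own account.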

The following tables document the ports the Collector uses for inbound, outbound, and internal communication so that firewall rules can be configured accordingly. They also identify where the Collector listens for inbound traffic and, when applicable, the configuration settings that change those inbound ports.

Inbound communication

Port | Protocol | Use Case | Configuration Setting
162 | UDP | SNMP traps received from target devices | eventcollector.snmptrap.address
514 | UDP | Syslog messages received from target devices | eventcollector.syslog.port
2055 | UDP | NetFlow data received from target devices | netflow.ports
6343 | UDP | sFlow data received from target devices | netflow.sflow.ports
7214 | HTTP/Proprietary | Communication from custom JobMonitors to Collector service | httpd.port

Outbound communication

Port | Protocol | Use Case | Configuration Setting
135 | TCP | The RPC Endpoint Mapper uses port 135 to support WMI and PerfMon DataSources to help the Collector communicate with monitored devices. It enables the Collector to locate a temporary port which the device can use to send performance information. | N/A
443 | HTTP/TLS | Communication between the Collector and the LogicMonitor data center. Port 443 must be permitted to access LogicMonitor’s public IP addresses; if your environment does not allow the Collector to connect directly to the LogicMonitor data centers, you can configure the Collector to communicate through a proxy. | N/A
445 | TCP | For PerfMon DataSources, the Collector connects to Windows systems over port 445 using the SMB protocol. The PerfMon DataSource uses the special IPC$ share to initiate communication and interact with system services to collect performance data such as CPU, memory, and disk usage. | N/A
Other non-privileged | SNMP, WMI, HTTP, SSH, JMX, etc. | Communication between the Collector and the target resources assigned for monitoring | N/A

Internal communication

Port | Protocol | Use Case | Configuration Setting
7211 | Proprietary | Communication from the Watchdog and Collector services to the OS Proxy service (sbwinproxy/sblinuxproxy) | sbproxy.port
7212 | Proprietary | Communication from the Watchdog service to the Collector service | agent.status.port
7213 | Proprietary | Communication from the Collector service to the Watchdog service | watchdog.status.port
15003 | Proprietary | Communication between the Collector service and its service wrapper | N/A
15004 | Proprietary | Communication between the Collector service and its service wrapper | N/A

For instructions on editing a Collector’s configurations, see Editing the Collector Config Files.

Collector Security

The LogicMonitor Collector has been carefully designed and developed with high security in mind. For details on Collector security measures and recommended best practices, see LogicMonitor Security Best Practices.

Note: Windows Defender Credential Guard is not supported and should not be enabled on Windows Collectors. The security platform has application requirements, such as blocking specific authentication capabilities, that may interfere with Collector operation.

Anti-malware Considerations

LogicMonitor Collector undergoes rigorous security testing and is digitally signed with a DigiCert code signing certificate to ensure the authenticity and integrity of each release. The signature confirms that the code has not been altered or tampered with after publication. Even so, the Collector’s network traffic patterns may look suspicious to anti-malware tools such as heuristic antivirus or intelligent endpoint detection and response services. If you choose to run such software on Collector systems, be aware that it may interfere with the Collector’s operations. Frequent Collector service restarts and process crashes are common indicators of anti-malware interference.

LogicMonitor recommends following a targeted and balanced approach that addresses potential threats without compromising the system’s overall protection. Follow these guidelines to tune anti-malware alerts:

For more information on setting exclusions in common anti-malware packages, see the following resources:

Open Source Software (OSS) List in Collector Installer

LogicMonitor has automated the generation of the OSS license report. With every Collector release, including Early Access (EA), Optional General Releases (GD), Required General Releases (MGD), and patch releases, a report of the OSS licenses used by the Collector is generated and bundled with the Collector installer. You can access the report file at the following locations:

Note: AGENT_ROOT is the install path. The default value is /usr/local/logicmonitor/agent on Linux and C:\Program Files\LogicMonitor\agent on Windows.
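Scripts that need files under the install path can derive them from these defaults. In the hypothetical helper below, the two default paths come from the note above; pass agent_root explicitly for custom or older install locations. The "pure" pathlib classes build Windows paths on Linux (and the reverse) without touching the filesystem.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional

# default AGENT_ROOT values from the note above
DEFAULT_AGENT_ROOT = {
    "linux": "/usr/local/logicmonitor/agent",
    "windows": r"C:\Program Files\LogicMonitor\agent",
}
_PATH_TYPE = {"linux": PurePosixPath, "windows": PureWindowsPath}


def agent_path(os_name: str, *parts: str, agent_root: Optional[str] = None):
    """Join parts onto AGENT_ROOT, using the default root unless one is given."""
    key = os_name.lower()
    root = _PATH_TYPE[key](agent_root or DEFAULT_AGENT_ROOT[key])
    return root.joinpath(*parts)


print(agent_path("windows", "conf", "agent.conf"))
print(agent_path("linux", "conf", "agent.conf"))
```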

Windows Collector Installation Directory Components

AGENT_ROOT is the collector install path. The default AGENT_ROOT value is /usr/local/logicmonitor/agent on Linux and C:\Program Files\LogicMonitor\agent on Windows.

A summary of the components used in the Windows collector installation directory is given in the following table:

Windows Collector Directory | Description
<AGENT_ROOT>\SNMP-MIB-Copyrights.txt | This file contains copyrights of the out-of-the-box MIB files used for translating SNMP traps that are ingested as LM logs.
<AGENT_ROOT>\bin | The bin folder contains the executables and DLL files required to start, stop, and uninstall the Agent and Watchdog services.
<AGENT_ROOT>\bin\queues | This folder contains persistent queues for data reporting, and files for converting collector users to non-root or non-admin users.
<AGENT_ROOT>\conf\agent.conf | This configuration file controls the business behavior of the collector. It contains all data collection, active discovery, auto property, and other business logic configurations.
<AGENT_ROOT>\conf\sbproxy.conf | This configuration file controls the internal behavior of the collector sbwinproxy process. It is recommended that you do not change this configuration.
<AGENT_ROOT>\conf\watchdog.conf | This configuration file controls the internal behavior of the collector Watchdog service. It is recommended that you do not change this configuration.
<AGENT_ROOT>\conf\wrapper.conf | This configuration file controls the internal behavior of the collector Wrapper service. It is recommended that you do not change this configuration. However, in exceptional cases, to enlarge the memory that collector can use or the Java Classpath, you must additionally load a collector.
<AGENT_ROOT>\diagnosetool | This utility contains a number of predefined checks related to configurations, memory, network, processes, systems, and more. It also contains SNMP commands such as snmpbulkget, snmpbulkwalk, snmpget, and snmpwalk.
<AGENT_ROOT>\lib | The lib folder contains libraries created by the collector and the third-party libraries that the collector code depends on.
<AGENT_ROOT>\logs | This folder contains multiple logs, such as collector installation logs, diagnose utility logs, agent logs, sbProxy logs, watchdog logs, and more.
<AGENT_ROOT>\tmp | This folder contains downloaded files used for upgrading and downgrading collectors. It also stores temporary files for monitoring.
<AGENT_ROOT>/configure.sh | (Linux only) When a collector is installed using install.sh, configure.sh is run to configure the collector settings.

Note: If your environment does not allow the Collector to connect directly to the LogicMonitor data centers, you can configure the Collector to communicate through an HTTP proxy.

Updating SSL and Proxy Settings

By default, collectors are not configured to use proxies. To communicate through an HTTP proxy, update several proxy settings in the collector’s agent.conf file. For detailed instructions on editing the agent.conf file, see Editing the Collector Config Files.


Once updated, the new settings should look similar to these:

# SSL & Proxy settings
ssl.enable=true
proxy.enable=true
proxy.host=10.0.0.54
proxy.port=8080
proxy.user=domain\username
proxy.pass=password
proxy.exclude=
proxy.global=false
proxy.pass.isencrypted=false

These new settings designate the following:

Note: The settings above reflect a Windows-based proxy requiring authentication. Linux collectors support only basic authentication; Windows collectors support NTLM and other native Windows authentication methods.
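Before saving, it can help to sanity-check the proxy block. The hypothetical sketch below parses simple key=value lines and flags a few obvious gaps. It reads values literally, so the backslash in domain\username is kept as typed. The checks are illustrative only; they are not the collector's own validation.

```python
from typing import Dict, List


def parse_settings(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments.

    Values are kept exactly as written, with no escape processing.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proxy_problems(settings: Dict[str, str]) -> List[str]:
    """Flag obvious gaps in the SSL & Proxy block (illustrative checks only)."""
    if settings.get("proxy.enable") != "true":
        return []  # proxy disabled: nothing further to check
    problems = []
    if not settings.get("proxy.host"):
        problems.append("proxy.host is empty")
    port = settings.get("proxy.port", "")
    if not port.isdigit() or not 0 < int(port) < 65536:
        problems.append("proxy.port is not a valid port number")
    if settings.get("proxy.user") and not settings.get("proxy.pass"):
        problems.append("proxy.user is set but proxy.pass is empty")
    return problems


sample = "proxy.enable=true\nproxy.host=10.0.0.54\nproxy.port=8080\n"
print(proxy_problems(parse_settings(sample)))
```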

Changing Proxy Password

If a proxy server uses password-based authentication, its credentials are stored in the proxy.user and proxy.pass fields. The proxy password is stored encrypted, and proxy.pass.isencrypted is set to true to indicate this. To change the proxy password, enter the new password and set proxy.pass.isencrypted=false.

Note: This setting is available in collector version 30.104 or later.

  1. Navigate to Settings > Collectors.
  2. Under the Collectors tab, select the collector you want to configure.  
  3. Select the More option and then select Collector Configuration.
    On the Collector Configuration page, settings under the Agent Config tab are displayed.
  4. Scroll to locate the SSL and Proxy settings.
  5. Enter a new password in plain text in the proxy.pass field.
  6. Set the proxy.pass.isencrypted value to false.
  7. Select Save and Restart.
    After the restart, observe that the password is encrypted and the proxy.pass.isencrypted field is set to true.

Troubleshooting Collector Proxy Configuration

The following are common issues encountered when configuring collectors to use HTTP proxies, and how to resolve them.

Issue: Proxy Authentication Required

When the collector is configured to use a proxy that requires basic authentication, the collector may throw the following exception:

[MSG] [WARN] [main::controller:main] [Controller2._initConfiguration:461] Unexpected status encountered from server. Will retry., CONTEXT=retry=30s, statusCode= 500, errMsg=Unable to tunnel through proxy. Proxy returns "HTTP/1.1 407 Proxy Authentication Required"

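Since JDK 8u111, Java disables Basic authentication for HTTPS tunneling through a proxy by default (the jdk.http.auth.tunneling.disabledSchemes property defaults to Basic). Setting the property to an empty value re-enables it. To do so, add the following configuration to the collector’s wrapper.conf: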

wrapper.java.additional.16=-Djdk.http.auth.tunneling.disabledSchemes=
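The 16 in this line is the sequence number of the wrapper.java.additional entry, and each entry is identified by its number, so reusing a number already present in your wrapper.conf would clash with the existing entry. The hypothetical helper below picks the next free number, assuming active entries are uncommented lines of the form wrapper.java.additional.<n>=...

```python
import re

# uncommented lines such as "wrapper.java.additional.3=-Dsomething"
_ADDITIONAL = re.compile(r"^[ \t]*wrapper\.java\.additional\.(\d+)\s*=", re.MULTILINE)


def next_additional_index(wrapper_conf: str) -> int:
    """Return one more than the highest wrapper.java.additional.<n> in use."""
    used = [int(n) for n in _ADDITIONAL.findall(wrapper_conf)]
    return max(used, default=0) + 1


def tunneling_line(wrapper_conf: str) -> str:
    """Build the tunneling-schemes line with a number not already in use."""
    n = next_additional_index(wrapper_conf)
    return f"wrapper.java.additional.{n}=-Djdk.http.auth.tunneling.disabledSchemes="


print(tunneling_line("wrapper.java.additional.1=-Xss256k\n"))
```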

Issue: Invalid SSL Certificate

If a collector does not receive a valid SSL certificate issued directly by LogicMonitor, it fails to start properly. In the example below, all SSL certificates in the client environment were being intercepted and reissued by special security software (for example, Blue Coat Proxy).

[03-26 15:53:03.222 EDT] [MSG] [INFO] [statusmonitor:::] [StatusListener$1.run:106] Receive peer request, CONTEXT=command=keepalive, charset=windows-1252, peer=/***.***.***.***:******
[03-26 15:53:03.268 EDT] [MSG] [WARN] [statusmonitor::scheduler:] [PropertyFilePersistentHandler._load:94] task file not found, CONTEXT=filename=C:\Program Files (x86)\LogicMonitor\Agent\conf\persistent_task.conf, EXCEPTION=C:\Program Files (x86)\LogicMonitor\Agent\conf\persistent_task.conf (The system cannot find the file specified)
java.io.FileNotFoundException: C:\Program Files (x86)\LogicMonitor\Agent\conf\persistent_task.conf (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at com.santaba.common.util.scheduler.impl.PropertyFilePersistentHandler._load(PropertyFilePersistentHandler.java:88)
at com.santaba.common.util.scheduler.impl.PropertyFilePersistentHandler.<init>(PropertyFilePersistentHandler.java:30)
at com.santaba.common.util.scheduler.Schedulers.newPersistentScheduler(Schedulers.java:17)
at com.santaba.agent.collector3.CollectorDb._newScheduler(CollectorDb.java:172)
at com.santaba.agent.collector3.CollectorDb.<init>(CollectorDb.java:68)
at com.santaba.agent.collector3.CollectorDb.<clinit>(CollectorDb.java:65)
at com.santaba.agent.agentmonitor.StatusListener._getAgentStatusResponse(StatusListener.java:279)
at com.santaba.agent.agentmonitor.StatusListener$1.run(StatusListener.java:117)
[03-26 15:53:03.947 EDT] [INFO] [1] [default] [controller] [Controller2._initHttpService:469] Agent starting with ID - 00baae57-3971-4239-9610-b512aae9c21csbagent
[03-26 15:53:04.232 EDT] [MSG] [INFO] [main::controller:main] [SSLUtilities.checkCertificates:160] Invalid or wrong SSL Certificates found, CONTEXT=info=Found total 2 certificates:
Subject: CN=*.logicmonitor.com, OU=Domain Control Validated
Issuer: CN=SSLInterception87
Type: X.509
SHA1: 9a:a6:ff:33:85:cc:13:4c:3a:13:11:77:5c:ef:5e:a7:74:65:6b:de
MD5: 61:35:08:b5:ec:71:a2:ae:05:c4:7f:54:f1:aa:6f:ad
Valid from: 2017-04-19 10:02:01 -0400
Valid to: 2020-06-18 17:33:09 -0400
Subject: CN=SSLInterception3
Issuer: CN=BillyBob's-CA, DC=slhn, DC=org
Type: X.509
SHA1: 6b:a8:1f:61:7b:5d:f0:e4:ee:7e:6a:1b:bb:18:de:67:be:5c:44:1d
MD5: d0:fc:64:da:6f:9b:1f:8d:1a:52:64:dc:41:da:e7:1c
Valid from: 2017-08-09 15:08:18 -0400
Valid to: 2021-10-03 08:53:12 -0400
[03-26 15:53:04.232 EDT] [MSG] [WARN] [main::controller:main] [Controller2._initConfiguration:322] SANTABA SERVER ceriticates not trusted, CONTEXT=Host=generic-customer.logicmonitor.com, port=443
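When reading a log like this, the key detail is who issued each certificate. The hypothetical helper below pulls Subject/Issuer CN pairs out of the checkCertificates text. An issuer that belongs to an interception appliance or internal CA (here SSLInterception87 and BillyBob's-CA) rather than a public certificate authority points to SSL interception.

```python
import re
from typing import List, Tuple

# "Subject: CN=..." or "Issuer: CN=...", capturing the CN up to a comma or line end
_CERT_FIELD = re.compile(r"(Subject|Issuer): (CN=[^,\n]+)")


def certificate_pairs(log_text: str) -> List[Tuple[str, str]]:
    """Return (subject CN, issuer CN) pairs in the order they appear."""
    pairs: List[Tuple[str, str]] = []
    subject = None
    for field, cn in _CERT_FIELD.findall(log_text):
        if field == "Subject":
            subject = cn
        elif subject is not None:
            pairs.append((subject, cn))
            subject = None
    return pairs


excerpt = """Subject: CN=*.logicmonitor.com, OU=Domain Control Validated
Issuer: CN=SSLInterception87
Subject: CN=SSLInterception3
Issuer: CN=BillyBob's-CA, DC=slhn, DC=org"""
for subject, issuer in certificate_pairs(excerpt):
    print(f"{subject} issued by {issuer}")
```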

Solution A (Preferred)

Have your local administrator add LogicMonitor’s SSL traffic to the allow list of the proxy or firewall so that the certificate reaches the collector unmodified. This is the preferred option because it preserves security.

Solution B

Change the collector configuration setting from:

EnforceLogicMonitorSSL=true

to:

EnforceLogicMonitorSSL=false

Removing SSL enforcement lowers the security of the connection between your collector and LogicMonitor and, for this reason, should be carefully considered before implementing.