DataSource Style Guidelines
Our crack team of monitoring engineers has come up with a number of best practices and other recommendations for the effective creation of DataSources. When creating or modifying DataSources we strongly recommend you follow these guidelines.
- The DataSource Name serves as the unique "key" by which a datasource is identified within your account. As such, we recommend a naming standard along the lines of: Vendor_Product_Monitor. For example: PaloAlto_FW_Sessions, EMC_VNX_LUNs, Cisco_Nexus_Temperature, etc.
- Because the Name serves as a "key" we recommend that it not include spaces.
- Some legacy datasources included a trailing dash in their name to differentiate multi-instance from single-instance datasources, but this is neither necessary nor recommended any longer.
- The Display Name is used solely for displaying the DataSource within the device tree. As such, it needs to contain only enough detail to identify the DataSource within the device itself. For instance, a datasource named EMC_VNX_LUNs could have a display name of LUNs.
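As a rough illustration of the naming guidance above, the following Python sketch checks a candidate name against the Vendor_Product_Monitor pattern. The function name `check_datasource_name` and the exact regular expression are assumptions made for this example; they are not part of any LogicMonitor tooling, and the pattern should be adjusted to match your own standard.

```python
import re

# Assumed pattern: three or more underscore-separated alphanumeric segments,
# e.g. Vendor_Product_Monitor. This is an illustrative convention only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+){2,}")


def check_datasource_name(name: str) -> list[str]:
    """Return a list of style problems found in a DataSource name."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if name.endswith("-"):
        problems.append("has a legacy trailing dash")
    # Strip any trailing dash first so it is reported only once.
    if not NAME_PATTERN.fullmatch(name.rstrip("-")):
        problems.append("does not follow Vendor_Product_Monitor")
    return problems


if __name__ == "__main__":
    for candidate in ("PaloAlto_FW_Sessions", "EMC VNX LUNs", "EMC_VNX_LUNs-"):
        print(candidate, check_datasource_name(candidate))
```

A check like this could run as part of a review step before a new DataSource is imported, so that naming problems are caught before the name becomes the module's key.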
Description & Technical Notes
- The Description field is used to briefly describe what is monitored by this DataSource. This information is included in the device tree, so should be sufficiently succinct to describe only what the DataSource does.
- DataSource Technical Notes should contain any implementation details required to use this module. For example, if specific device properties are required to be set on the parent device, or if this datasource is relevant only to specific OS or firmware versions, these details should be noted here.
- Because DataSource groups provide an additional tier within a device's DataSource tree, grouping is only useful when you'd like to "hide" ancillary datasources for a particular device. For instance, the default DataSources of DNS, Host Status, and Uptime are grouped because they aren't the primary instrumentation for a device. Grouping should not be used for a set of datasources that provide primary instrumentation for a system. For instance, a suite of DataSources for monitoring various NetApp metrics should not be grouped, because you want the NetApp-specific monitoring to have primary visibility in the device tree.
AppliesTo & Collection Schedule
- Construction of the AppliesTo field should be carefully considered so that it's neither overly broad (e.g. using the isLinux() method) nor overly narrow (e.g. to a single host). Typically it's best that -- where possible -- a DataSource is applied to a system.category (via the hasCategory() method), and that a corresponding PropertySource or SNMP SysOID Map be built to automatically set the system.category.
- The Collection Schedule should be set to an interval appropriate to the data being checked. Instrumentation that changes quickly (CPU) or is important to alert on quickly (ping loss) should have a very short polling cycle, such as 1 minute. Items that can tolerate a longer period between alerts, or that change less frequently (drive loss in a RAID system, disk utilization), should have a longer poll cycle -- maybe 5 minutes. Longer poll cycles impose less load on the server being monitored, the Collector, and the LogicMonitor platform.
- ActiveDiscovery should be employed only when the datasource is designed to monitor one or more instances of a given type, or a single instance that may frequently disappear and reappear. If the system being monitored has only a single static instance, it is better to use a PropertySource to discover the instance and set the device property accordingly.
- Care should be taken with the "Do not automatically delete instances" setting. This flag should be set only if a failure in the system being monitored would prevent ActiveDiscovery from detecting the instance. Without the flag, if ActiveDiscovery runs shortly after such a failure, the failing instance would be removed from monitoring and its alerts cleared. For example, if an application instance is detected simply by ActiveDiscovery checking for a TCP response on port 8888, and the application fails so that port 8888 stops responding, you would not want ActiveDiscovery to remove the instance.
- Ensure any ActiveDiscovery filters have an appropriate description in the comment field -- this is eminently helpful for diagnosing problems.
- When defining any parameters, attempt to use device property substitution (e.g. ##jmx.port##) rather than hard coding values.
- The Datapoint Name field should -- where possible -- reflect the name used by the entity that defined the object to be collected. For instance, an SNMP datapoint should be named using the object name corresponding to the SNMP OID. Likewise, for a WMI datapoint you'd want to use the corresponding WMI property name.
- Every datapoint should include a human-readable description. In addition to describing the specific metric, the description field should contain the units used in the measurement.
- A Valid Value range should be set to normalize incoming data, as this will prevent spurious data from triggering alerts.
- The Alert Transition Interval for each datapoint with an alert threshold should be considered carefully. The transition interval should maintain a balance of immediate notification of an alert state (a value of 0 or 1) vs quelling potential alerts on transitory conditions (a value of 5 or more).
- Any datapoint with an alert threshold defined should have a custom alert message. The custom alert message should format the relevant information in the message text, as well as provide context and recommended remediations where possible.
- Ensure that every datapoint you create is used either in a graph, in a complex datapoint, or to trigger an alert. A datapoint that serves none of these purposes is collected without being put to any use.
- Create Complex Datapoints only when you need to calculate a value for use in an alert or a report. If you need to transform a datapoint only for display purposes, it is better to use a virtual datapoint to perform such calculations on-the-fly.
- Graph Names should use a Proper Name that explains what the graph displays. Overview Graph names should include a title keyword to indicate that the graph is an overview (e.g. Top Interfaces by Throughput or LUN Throughput Overview).
- The vertical label on a graph should include either a unit name (°C, %), a description (bps, connections/sec, count), or a status code (0=ok, 1=transitioning, 2=failed). When displaying throughput or other scaled units, it's best to convert from kilobytes/megabytes/gigabytes to bytes and let graph scaling handle the unit multiplier.
- Set minimum and maximum values as appropriate for the graph to ensure it's scaled properly. In most cases, you'll want to specify a minimum value of 0.
- Display Priority should be used to sort graphs based on order of relevance, so that the most important visualizations are presented first.
- Line legends should use Proper Names with correct capitalization and spaces.
- When selecting graph colors, use red for negative/bad conditions and green for positive/good conditions. For example, use a green "area" to represent total disk space and a red "area" in front to represent used disk space.
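The Alert Transition Interval trade-off described earlier in this list can be illustrated with a small simulation. This is only a sketch: the function `first_alert_poll` and its counting rule (an alert fires once the threshold has been breached on more than `transition_interval` consecutive polls) are assumptions chosen for illustration, not a description of how the LogicMonitor platform actually evaluates alerts.

```python
def first_alert_poll(breaches, transition_interval):
    """Return the 0-based index of the poll that raises an alert, or None.

    Assumption for this sketch: an alert fires once the threshold has been
    breached on more than `transition_interval` consecutive polls, so a
    value of 0 alerts on the very first breach.
    """
    consecutive = 0
    for index, breached in enumerate(breaches):
        # Any poll back inside the threshold resets the streak.
        consecutive = consecutive + 1 if breached else 0
        if consecutive > transition_interval:
            return index
    return None


# A transitory spike: two bad polls, then recovery.
spike = [False, True, True, False, False]
# A sustained failure: bad from the second poll onward.
outage = [False] + [True] * 10

if __name__ == "__main__":
    for interval in (0, 5):
        print(
            "interval", interval,
            "spike ->", first_alert_poll(spike, interval),
            "outage ->", first_alert_poll(outage, interval),
        )
```

Under these assumptions, an interval of 0 alerts on the spike immediately, while an interval of 5 ignores the spike but delays notification of the sustained outage by five polls. Multiplying that delay by the collection interval gives a rough feel for how long a real problem could go unreported.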