Apache Hadoop Monitoring

Last updated on 17 March, 2023

Overview

Apache Hadoop is a collection of software allowing distributed processing of large data sets across clusters of commodity hardware. The LogicMonitor Hadoop package monitors metrics for the following components:

  • HDFS NameNode
  • HDFS DataNode
  • Yarn
  • MapReduce

Compatibility

As of February 2020, we have confirmed that our Hadoop package is compatible with version 3.2.1. It may be possible to monitor older versions of Hadoop, but data will not be returned for all datapoints.

Setup Requirements

Enable JMX on Hadoop Host

LogicMonitor collects Hadoop metrics via the REST API rather than directly via JMX. However, Hadoop originally collects and stores these metrics using JMX, so JMX must be enabled on the Hadoop host. For more information on enabling JMX, see the “Enabling JMX” section of the Java Applications (via JMX) Monitoring support article.

Add Hosts Into Monitoring

Add your Hadoop host(s) into monitoring. For more information on adding resources into monitoring, see Adding Devices.

Assign Properties to Hadoop Resources

The following custom properties must be set on the Hadoop resource(s) within LogicMonitor. For more information on setting properties, see Resource and Instance Properties.

| Property | Value |
| --- | --- |
| hadoop.namenode.port | The TCP port on which Hadoop exposes HDFS NameNode metrics via the REST API. This port can be customized; the default is 9870 in Hadoop 3.x (50070 in Hadoop 2.x). |
| hadoop.datanode.port | The TCP port on which Hadoop exposes HDFS DataNode metrics via the REST API. This port can be customized; the default is 9864 in Hadoop 3.x (50075 in Hadoop 2.x). |
| hadoop.secondarynamenode.port | The TCP port on which Hadoop exposes secondary NameNode metrics via the REST API. This port can be customized; the default is 9868 in Hadoop 3.x (50090 in Hadoop 2.x). |
| hadoop.yarn.port | The TCP port on which Hadoop exposes Yarn metrics via the REST API. This port can be customized; the default is 8088. |
| hadoop.mrhistory.port | The TCP port on which Hadoop exposes MapReduce history metrics via the REST API. This port can be customized; the default is 19888. |

Note: These ports must be open to the Collector.
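As a quick pre-check that the ports are open, the following minimal Python sketch attempts a TCP connection to each configured port. It assumes it is run from the Collector machine (or another machine on the same network path); the function names `port_is_open` and `unreachable_ports` are hypothetical helpers for illustration, not part of LogicMonitor or Hadoop.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_ports(host, ports):
    """Given {property_name: port}, return the entries that could not be reached."""
    return {name: port for name, port in ports.items() if not port_is_open(host, port)}
```

For example, `unreachable_ports("hadoop01.example.com", {"hadoop.yarn.port": 8088, "hadoop.mrhistory.port": 19888})` returns an empty dictionary when both ports accept connections. The hostname here is a placeholder.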

Note: To verify that the correct port is being used, confirm that you can access http://<HOST>:<HTTP_PORT>/jmx and view metrics for each component.
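The same check can be scripted. The sketch below builds the /jmx URL for a component, fetches it, and lists the names of the beans it returns. It assumes the endpoint is reachable over plain HTTP without authentication and returns JSON with a top-level "beans" array; the helper names and the optional `qry` bean filter usage are illustrative choices, not LogicMonitor's collection method.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def jmx_url(host, port, query=None):
    """Build the /jmx URL for one Hadoop daemon; `query` is an optional bean filter."""
    url = f"http://{host}:{port}/jmx"
    if query:
        url += "?" + urlencode({"qry": query})
    return url


def bean_names(payload):
    """Return the sorted names of all beans in a /jmx JSON response body."""
    return sorted(bean["name"] for bean in json.loads(payload).get("beans", []))


def fetch_bean_names(host, port, query=None, timeout=5):
    """Fetch /jmx from a live daemon and list the beans it exposes."""
    with urlopen(jmx_url(host, port, query), timeout=timeout) as resp:
        return bean_names(resp.read().decode("utf-8"))
```

Calling `fetch_bean_names("hadoop01.example.com", 8088)` (placeholder hostname) against a running Yarn ResourceManager should return a non-empty list; an empty list or a connection error suggests the wrong port or that JMX metrics are not exposed.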

Import LogicModules

From the LogicMonitor repository, import all Hadoop LogicModules listed in the LogicModules in Package section of this support article. Upon import, these LogicModules are automatically associated with your Hadoop resources, provided the properties listed in the previous section are assigned.

LogicModules in Package

LogicMonitor’s package for Apache Hadoop consists of the following LogicModules. For full coverage, ensure that all of these LogicModules are imported into your LogicMonitor platform.

| Display Name | Type | Description |
| --- | --- | --- |
| addCategory_Hadoop_ResourceManager | PropertySource | Discovers Hadoop ResourceManagers. |
| addCategory_Hadoop_NameNodePrimary | PropertySource | Discovers Hadoop NameNodes. |
| addCategory_Hadoop_NameNodeSecondary | PropertySource | Discovers Hadoop secondary NameNodes. |
| addCategory_Hadoop_DataNode | PropertySource | Discovers Hadoop DataNodes. |
| Apache_Hadoop_Info | PropertySource | Generates host properties for Apache Hadoop devices. |
| Hadoop MapReduce Jobs | DataSource | Metrics for MapReduce jobs running on Hadoop. |
| Hadoop Yarn Cluster Status | DataSource | Cluster metrics reported by the Hadoop Yarn ResourceManager about service statuses. |
| Hadoop Yarn JVM Performance | DataSource | Monitors JVM performance on Hadoop Yarn ResourceManager nodes. |
| Hadoop Yarn Applications | DataSource | Metrics on Yarn applications running on Hadoop. |
| Hadoop Yarn Capacity Scheduler | DataSource | Capacity scheduler metrics for Hadoop Yarn ResourceManager nodes. |
| Hadoop Yarn RPC Activity | DataSource | Monitors RPC activity on various ports for the Hadoop Yarn ResourceManager. |
| Hadoop Yarn Queue Metrics | DataSource | Aggregate metrics for Hadoop Yarn queues. |
| Hadoop HDFS NameNode Status | DataSource | Status of HDFS NameNode servers. |
| Hadoop HDFS DataNode RPC Activity | DataSource | RPC metrics for Hadoop HDFS DataNodes. |
| Hadoop HDFS DataNode Volumes | DataSource | Volume metrics for Hadoop HDFS. |
| Hadoop HDFS DataNode Activity | DataSource | Metrics on DataNode activity for Hadoop HDFS. |
| Hadoop HDFS NameNode RPC Activity | DataSource | RPC metrics for Hadoop HDFS NameNodes. |
| Hadoop HDFS NameNode Info | DataSource | Metrics on the amount of space used across the cluster. |
| Hadoop HDFS DataNode FS State | DataSource | Metrics on the FS state in Hadoop HDFS. |
| Hadoop HDFS NameNode Activity | DataSource | Overview of operations on the primary NameNode for Hadoop HDFS, including file generation, snapshot information, and edit log synchronization operations. |
| Hadoop HDFS StartUp | DataSource | Status of HDFS NameNode startup tasks. |
| Hadoop HDFS JVM DataNode Performance | DataSource | Monitors JVM performance for Hadoop HDFS DataNodes. |
| Hadoop HDFS NameNode JVM Performance | DataSource | Monitors JVM performance for Hadoop HDFS NameNodes. |
| Hadoop HDFS Secondary Namenode JVM Performance | DataSource | Monitors JVM performance for Hadoop HDFS secondary NameNodes. |
| Hadoop HDFS NameNode FSNamesystem | DataSource | Metrics on the Hadoop HDFS NameNode FSNamesystem. |
| Hadoop HDFS NameNode Retry Cache | DataSource | Monitors the Hadoop HDFS NameNode retry cache. |

Configuring Datapoint Thresholds

The Hadoop package does not include predefined datapoint thresholds (in other words, no alerts will trigger based on collected data). This is because the technology owner has not provided KPIs that can be reliably extended to the majority of users. To receive alerts for collected data, you’ll need to manually create custom thresholds, as discussed in Tuning Static Thresholds for Datapoints.

The following are some datapoints for which you may want to consider setting thresholds:

  • DataSource: Hadoop HDFS DataNode FS State
    • NumFailedVolumes. Datapoint that reports total number of failed volumes.
    • Remaining. Datapoint that reports remaining capacity on the datanode.
  • DataSource: Hadoop HDFS NameNode Info
    • NumberOfMissingBlocksWithReplicationFactorOne. Datapoint that reports the number of missing blocks that had a replication factor of 1 (that is, blocks whose only copy is missing).
    • PercentUsed. Datapoint that reports the percentage of used space across the cluster (DFS and non-DFS).
  • DataSource: Hadoop HDFS NameNode Status
    • ServiceRestart. Datapoint that returns a value greater than 0 when the service state changes.
    • State. Datapoint that returns a status code indicating the status of the Hadoop namenode service.
  • DataSource: Hadoop HDFS NameNode FSNamesystem
    • CorruptBlocks. Datapoint that reports the current number of blocks with corrupt replicas.
    • CorruptReplicatedBlocks. Datapoint that reports the number of corrupt blocks that have been replicated.
    • FSState. Datapoint that returns a status code indicating whether the FS is operational or in safe mode.
    • MissingBlocks. Datapoint that reports the current number of missing blocks.
    • MissingReplicationOneBlocks. Datapoint that reports the number of missing blocks with replication factor of 1.
    • NumDeadDataNodes. Datapoint that reports the number of datanodes currently dead.
    • UnderReplicatedBlocks. Datapoint that reports the current number of under-replicated blocks.
    • VolumeFailuresTotal. Datapoint that reports the total number of volume failures across all datanodes.
  • DataSource: Hadoop Yarn Queue Metrics
    • AppsFailed. Datapoint that reports the number of applications that failed to complete.
  • DataSource: Hadoop Yarn Cluster Status
    • NumLostNMs. Datapoint that reports the current number of NodeManagers marked as lost because they stopped sending heartbeats.
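To reason about candidate thresholds before creating them in LogicMonitor, it can help to test them against sample values. The sketch below flags datapoints whose value exceeds a limit. The limits and sample values are hypothetical, chosen only for illustration, and the `breaches` helper is not part of LogicMonitor; appropriate limits depend on your cluster.

```python
def breaches(values, limits):
    """Return {datapoint: value} for each datapoint whose value exceeds its limit."""
    return {
        name: values[name]
        for name, limit in limits.items()
        if name in values and values[name] > limit
    }


# Hypothetical limits: flag any missing or corrupt block, any dead datanode,
# and more than 100 under-replicated blocks.
LIMITS = {
    "MissingBlocks": 0,
    "CorruptBlocks": 0,
    "NumDeadDataNodes": 0,
    "UnderReplicatedBlocks": 100,
}

# Made-up values standing in for one poll of the FSNamesystem datapoints.
sample = {
    "MissingBlocks": 0,
    "CorruptBlocks": 2,
    "NumDeadDataNodes": 1,
    "UnderReplicatedBlocks": 40,
}

print(breaches(sample, LIMITS))  # {'CorruptBlocks': 2, 'NumDeadDataNodes': 1}
```

Datapoints that are expected to be zero in a healthy cluster (missing blocks, corrupt blocks, dead datanodes) suit a "greater than 0" style of limit, while counters such as UnderReplicatedBlocks may fluctuate during normal operation and usually need a higher, cluster-specific limit.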