Monitoring

EMC XtremIO

How to configure and group devices in your LogicMonitor portal to monitor XtremIO:

Management Server:

  1. Create a user for LogicMonitor on the XMS and save its credentials.

LogicMonitor:

  1. Create a top-level group named XtremIO.
    1. Assign it the system category 'xtremio'.
  2. Create a nested group for each management group:
    1. Set the group properties 'xio.user', 'xio.pass', and 'xtremio.xms' (the hostname of the managing XMS).
  3. Add the XMS and each storage controller in the cluster to the associated group created in step 2.
    1. For the management servers, add an additional system category 'xms'.
    2. Every storage controller inside the cluster will have its own IP address.
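
The layout produced by the steps above can be pictured with a small Python sketch. This is only an in-memory model for illustration, not LogicMonitor's API: the subgroup name, hostnames, IP addresses, and the helper effective_settings are hypothetical, and the sketch simply assumes that devices pick up the properties set on their group.

```python
# Hypothetical model of the group layout; names and addresses are placeholders.
TOP_GROUP = {
    "name": "XtremIO",
    "categories": ["xtremio"],          # system category on the top-level group
    "subgroups": [
        {
            "name": "xms01-cluster",    # one nested group per management group
            "properties": {
                "xio.user": "logicmonitor",
                "xio.pass": "********",
                "xtremio.xms": "xms01.example.com",
            },
            "devices": [
                {"host": "xms01.example.com", "categories": ["xms"]},  # the XMS
                {"host": "10.0.0.11", "categories": []},  # storage controller
                {"host": "10.0.0.12", "categories": []},  # storage controller
            ],
        }
    ],
}


def effective_settings(top_group):
    """Yield (host, categories, properties) for every device in the tree."""
    for sub in top_group["subgroups"]:
        for dev in sub["devices"]:
            categories = top_group["categories"] + dev["categories"]
            yield dev["host"], categories, dict(sub["properties"])


for host, categories, props in effective_settings(TOP_GROUP):
    print(host, categories, props["xtremio.xms"])
```

In this model every device in the nested group ends up with the same 'xtremio.xms' value, and only the management server carries the extra 'xms' category.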

Commonly Asked Questions:

1. Question: Why did LogicMonitor build the datasources this way? Why is everything organized in groups?

  • Answer: With EMC XtremIO, all communication with the cluster goes through the XMS. Every API request for metrics about the cluster or its storage controllers must be sent to the XMS, so we group all related storage controllers together and set the XMS hostname once, at the group level.
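
As a rough illustration of why the XMS hostname lives on the group, the sketch below builds a request for a storage controller's metrics. The request is addressed to the host in 'xtremio.xms', not to the controller itself. The helper build_xms_request, the URL path, and the use of HTTP basic authentication are assumptions made for this example; check them against the REST API guide for your XMS version. The request is only constructed, never sent.

```python
import base64
from urllib.request import Request


def build_xms_request(device_props, object_type="storage-controllers"):
    """Build (but do not send) a request addressed to the managing XMS."""
    xms = device_props["xtremio.xms"]
    # Assumed endpoint layout; verify against your XMS REST API documentation.
    url = f"https://{xms}/api/json/v2/types/{object_type}"
    credentials = f"{device_props['xio.user']}:{device_props['xio.pass']}"
    token = base64.b64encode(credentials.encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})


# Properties a storage controller would pick up from its group (placeholders).
props = {
    "xio.user": "logicmonitor",
    "xio.pass": "secret",
    "xtremio.xms": "xms01.example.com",
}
request = build_xms_request(props)
print(request.full_url)
```

Whichever device the datasource is applied to, the target host comes from the group property, which is why setting it once per group is enough.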

2. Question: Why do I need to add every single storage controller as a device? Why don’t I just rely on the data the XMS returns?

  • Answer: The most important thing to remember is that the XMS holds all the metrics. If the XMS goes down, we no longer have a way to know what’s going on with the cluster, but an XMS outage does not mean the cluster itself is down. With every storage controller added as a device, we can run basic ping and host status checks to tell whether a storage controller actually went down or the problem is only with the XMS.
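
The reasoning in this answer amounts to a small decision table. The sketch below is one way to write it down, assuming two boolean inputs that would come from the XMS checks and from the ping/host status check on the controller; the function name diagnose and the returned strings are hypothetical.

```python
def diagnose(xms_reachable, controller_reachable):
    """Interpret reachability of the XMS and of one storage controller."""
    if xms_reachable and controller_reachable:
        return "healthy"
    if controller_reachable:
        # Metrics are missing, but the controller still answers ping.
        return "XMS problem; storage controller is still up"
    if xms_reachable:
        return "storage controller is down"
    return "XMS and storage controller are both unreachable"


print(diagnose(xms_reachable=False, controller_reachable=True))
```

Without the controllers added as devices, the second and fourth cases could not be told apart.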

3. Question: Why are there so many more datasources applied to storage controller 1 (X1-SC1) than to the other controllers?

  • Answer: We apply all cluster-level metrics to the first storage controller of every cluster so that alerts are not duplicated across multiple devices in the same cluster. Storage-controller-level metrics are applied to every storage controller, because each of those alerts concerns a single device. Cluster-wide alerts, in contrast, originate from a single source.
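
The effect of this rule can be sketched as follows. This is an illustration only: the function devices_for_datasource is hypothetical, X1-SC2 is a made-up second controller name, and picking the "first" controller by sorted name is an assumption, not necessarily how the datasources select it.

```python
def devices_for_datasource(cluster_devices, level):
    """Return the devices a datasource of the given level would apply to."""
    controllers = sorted(cluster_devices)
    if level == "cluster":
        # One source per cluster, so a cluster-wide alert fires only once.
        return controllers[:1]
    if level == "storage-controller":
        return controllers
    raise ValueError(f"unknown level: {level}")


cluster = ["X1-SC2", "X1-SC1"]
print(devices_for_datasource(cluster, "cluster"))
print(devices_for_datasource(cluster, "storage-controller"))
```

If cluster-level datasources were applied to every controller instead, each cluster-wide condition would raise one alert per controller.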