The purpose of insights is to intelligently group together alerts that share one or more areas of commonality. Dexda reduces alert noise by de-duplicating events, and by correlating, grouping and prioritizing alerts into actionable, quality insights. As a result, IT professionals have fewer issues to manage and can identify and act on problems more quickly.
The following describes the concept of insights. For information on how to work with insights, see Exploring Data.
What are Insights?
Insights are based on alerts, which in turn are based on events from monitored sources, automatically grouped together by machine learning applications. An insight is a record that a collection of alerts has been grouped. Insights are displayed in dashboards, and can be further investigated through inspection views. Insights can be acted upon, either automatically, or manually.
Using a set of specialized algorithms, Dexda identifies hidden patterns within the text features of alert data. Dexda analyses both feature and temporal aspects of alerts to dynamically manage their clustering. IT professionals can apply their business knowledge to the correlation application by targeting Dexda’s pattern discovery at specific alert data that holds business value, for example an enrichment that adds application or business service context to an alert.
Collecting the Data
Events collected from monitored sources are observed changes to the normal behaviour of a system, environment, process, or workflow. Examples: router ACLs were updated, a firewall policy was pushed, or a threshold breach occurred. Events are typically identified or reported by a monitoring tool. Events can also be created directly by an application or a service, or by a Configuration Item (CI) such as an IoT device or its management platform.
Events from different source formats are normalised and restructured into a homogeneous form – Dexda Common Event Format (CEF). This enables Dexda to analyse and process events in the same way, regardless of their origin. Incoming events are analysed, monitored and de-duplicated, and repeated series of a single event instance are stored into a single alert record.
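As a rough illustration of normalisation and de-duplication, the following Python sketch maps two differently shaped source events onto one common shape and folds repeated occurrences into a single alert record. The field names (`source`, `event_key`, `timestamp`), the source payload shapes and the key format are assumptions made for illustration only; they are not the actual Dexda CEF schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alert:
    event_key: str
    event_count: int
    first_timestamp: int
    last_timestamp: int


def normalise(raw: dict) -> dict:
    """Map a source-specific payload onto one common shape (hypothetical fields)."""
    if raw.get("tool") == "LogicMonitor":
        return {
            "source": "LogicMonitor",
            "event_key": f"{raw['host']}:{raw['datapoint']}",
            "timestamp": raw["ts"],
        }
    # Fallback for a second, made-up source format.
    return {
        "source": raw["origin"],
        "event_key": f"{raw['ci']}:{raw['check']}",
        "timestamp": raw["time"],
    }


def deduplicate(events: list[dict]) -> dict[str, Alert]:
    """Store repeated occurrences of the same event key in one alert record."""
    alerts: dict[str, Alert] = {}
    for event in events:
        key = event["event_key"]
        if key in alerts:
            alerts[key].event_count += 1
            alerts[key].last_timestamp = event["timestamp"]
        else:
            alerts[key] = Alert(key, 1, event["timestamp"], event["timestamp"])
    return alerts


raw_events = [
    {"tool": "LogicMonitor", "host": "router1", "datapoint": "cpu", "ts": 100},
    {"origin": "syslog", "ci": "fw1", "check": "policy", "time": 110},
    {"tool": "LogicMonitor", "host": "router1", "datapoint": "cpu", "ts": 160},
]
alerts = deduplicate([normalise(e) for e in raw_events])
```

Because every event is reduced to the same shape before de-duplication, the de-duplication step does not need to know which monitoring tool produced the event.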
When using the LogicMonitor integration, be aware that events and alerts are different concepts in Dexda. Events in Dexda are incoming alerts from LogicMonitor. New alerts and their subsequent updates are the output of LogicMonitor’s Alert Evaluation processing phase. Each alert transaction (creation, up/downgrade, closure) resulting from that phase is processed as a separate event in Dexda.
Dexda alerts have their own lifecycle represented in a series of states from new through to closed. Whilst the Dexda alert is in an open state, any reoccurrence of a LogicMonitor alert instance will de-duplicate under the open Dexda alert. This ensures that alert state is accurately reflected in Dexda, and provides a point of control for correlation and/or escalation.
Events to Alerts Flow
The following describes the default flow for the LogicMonitor integration. However, the flow is similar for other types of monitoring platforms.
Every single incoming event will trigger the “events to alerts” flow. The processing of events into de-duplicated alerts is automated through these artefacts:
- A Rule “LogicMonitor event processing” containing:
  - A filter to conditionally match LogicMonitor events where “Source” equals “LogicMonitor”.
  - An Action Group to trigger when an event is matched.
- An Action Group “Alert Processing” defining a set of actions executed in sequence. This implements a process flow and state model described in the following.
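The relationship between a rule, its filter and its Action Group can be pictured with a small Python sketch. This is only a conceptual model: the dictionary layout, the `apply_rule` helper and the two placeholder actions are hypothetical and do not reflect how Dexda stores or executes rules.

```python
from typing import Callable

Event = dict
Action = Callable[[Event, list], None]


def log_received(event: Event, trail: list) -> None:
    trail.append(f"received {event['id']}")


def log_processed(event: Event, trail: list) -> None:
    trail.append(f"processed {event['id']}")


# An action group is an ordered list of actions (placeholder actions here).
alert_processing: list[Action] = [log_received, log_processed]

# A rule pairs a filter with the action group to trigger on a match.
rule = {
    "name": "LogicMonitor event processing",
    "filter": lambda event: event.get("Source") == "LogicMonitor",
    "action_group": alert_processing,
}


def apply_rule(rule: dict, event: Event, trail: list) -> bool:
    """Run the rule's action group in sequence if the filter matches."""
    if not rule["filter"](event):
        return False
    for action in rule["action_group"]:
        action(event, trail)
    return True


trail: list = []
matched = apply_rule(rule, {"id": "e1", "Source": "LogicMonitor"}, trail)
skipped = apply_rule(rule, {"id": "e2", "Source": "Other"}, trail)
```

Only the first event passes the filter, so only it runs through the actions, in the order they are listed.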
The process steps:
- For an incoming event, Dexda first checks whether it can be de-duplicated into an existing alert for the same event key.
- If the event can be de-duplicated into an existing open alert, then the alert is updated to reflect the addition of the new event.
- The #event counter is incremented and the LastTimestamp set to the new event timestamp.
- If a ServiceNow incident exists for the alert then it will be updated, else one will be created.
- Waits for an alert timeout. This is 24 hours by default (can be customized).
- After the timeout expires the alert is set to “closed”.
- If the event cannot be de-duplicated to an existing open alert, a new alert is created.
- Waits for a correlation timeout of 15 minutes (can be customised). This is the maximum time Dexda will wait for an alert to cluster before a ServiceNow ticket will be created for the alert.
- If not correlated after 15 minutes a new ServiceNow incident is created and the state changes to “incident”.
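The process steps above can be sketched in Python as follows. The two timeout defaults come from the text; everything else is an assumption for illustration: the `Alert` fields, the `process_event` and `advance` helpers, measuring the alert timeout from the last event timestamp, and replacing the ServiceNow calls with a plain state change.

```python
from dataclasses import dataclass

CORRELATION_TIMEOUT = 15 * 60   # seconds, default from the text
ALERT_TIMEOUT = 24 * 60 * 60    # seconds, default from the text


@dataclass
class Alert:
    event_key: str
    created: int
    last_timestamp: int
    event_count: int = 1
    escalation: str = "new"


def process_event(alerts_by_key: dict, event_key: str, timestamp: int) -> Alert:
    """De-duplicate into an open alert for the same key, or create a new alert."""
    alert = alerts_by_key.get(event_key)
    if alert is not None and alert.escalation != "closed":
        alert.event_count += 1
        alert.last_timestamp = timestamp
        # Updating or creating the ServiceNow incident is omitted in this sketch.
        return alert
    alert = Alert(event_key, created=timestamp, last_timestamp=timestamp)
    alerts_by_key[event_key] = alert
    return alert


def advance(alert: Alert, now: int) -> None:
    """Apply the two timeouts to one alert at time `now` (seconds)."""
    if alert.escalation == "closed":
        return
    # Assumption: the alert timeout is counted from the last event timestamp.
    if now - alert.last_timestamp >= ALERT_TIMEOUT:
        alert.escalation = "closed"
    elif alert.escalation == "new" and now - alert.created >= CORRELATION_TIMEOUT:
        # Stands in for creating a ServiceNow incident for an uncorrelated alert.
        alert.escalation = "incident"


alerts_by_key: dict = {}
first = process_event(alerts_by_key, "router1:cpu", 0)
second = process_event(alerts_by_key, "router1:cpu", 60)   # de-duplicated

advance(first, 16 * 60)
state_after_correlation_timeout = first.escalation

advance(first, 60 + ALERT_TIMEOUT)
state_after_alert_timeout = first.escalation

third = process_event(alerts_by_key, "router1:cpu", 200_000)  # new alert
```

Once the alert is closed, a later event with the same key no longer de-duplicates into it and a new alert is created instead.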
You can see the different steps of the process flow when looking at the Sequence for the Alert Processing Action Group.
During the alert timeout waiting period, alerts that cannot be correlated will appear in the Uncorrelated Alerts panel on the dashboard. See Using Dashboards.
Alert State Model
The alert record contains these state fields:
- Alert State: The state of the underlying monitoring that created the alert. The alert state can transition between “active” and “cleared” many times during the lifecycle while an alert is in a non-closed state.
- Escalation: The state of the alert’s workflow during its lifecycle, from “new” through to “closed”. Once the alert is closed, no further updates to it are permitted.
The alert state process contains these flows:
- Flow 1: An alert comes in as “new”, and is correlated into an insight within 15 minutes.
- Flow 2: An incoming “new” alert cannot be correlated, and an incident is automatically created for it.
- Flow 3: An incoming “new” alert is manually assigned in Dexda, overriding the automatic incident creation. The alert stays in “new” state until it is assigned and an incident has been created.
The Alert State values are:
- active: A problem state, for example, an alert level is above a defined threshold.
- cleared: A resolution state, for example, an alert level is below a defined monitoring threshold. Note that this is not the same as “closed”.
The Escalation values are:
- new: Waiting to be correlated, or assigned if using a manual workflow.
- assigned: The alert (uncorrelated) has been assigned in a manual workflow.
- correlated: The alert has been correlated with an existing alert grouping.
- incident: An incident has been created for an uncorrelated alert.
- closed: The alert record itself is closed. Alert closure can be triggered through Dexda automations, for example when a related ServiceNow incident is marked as Resolved, or when the maximum timeout (24 hours) is reached.
You can change the process flow by adding further states and creating a manual process. However, the state of an alert must always start with “new” and end with “closed”, and a closed alert cannot be reopened.
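The constraints on the escalation state can be expressed as a minimal Python sketch. The class and method names are hypothetical; the sketch enforces only the two rules stated above (every alert starts as “new”, and “closed” is final) and deliberately leaves all other transitions open, since those depend on the configured process flow.

```python
from enum import Enum


class Escalation(Enum):
    NEW = "new"
    ASSIGNED = "assigned"
    CORRELATED = "correlated"
    INCIDENT = "incident"
    CLOSED = "closed"


class AlertEscalation:
    """Tracks escalation state; every alert starts as NEW and CLOSED is final."""

    def __init__(self) -> None:
        self.state = Escalation.NEW
        self.history = [Escalation.NEW]

    def move_to(self, target: Escalation) -> None:
        if self.state is Escalation.CLOSED:
            raise ValueError("a closed alert cannot be reopened or updated")
        self.state = target
        self.history.append(target)


alert = AlertEscalation()
alert.move_to(Escalation.CORRELATED)
alert.move_to(Escalation.CLOSED)

try:
    alert.move_to(Escalation.INCIDENT)
    reopen_rejected = False
except ValueError:
    reopen_rejected = True
```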
Alerts to Insights Flow
Alerts are grouped into insights using Machine Learning (ML) processes. The output of the ML processing is a set of ML observations, which are a special type of event where the ML source is “correlation”. Using correlation processing rules and an Alert Correlation Action Group, Dexda creates an insight record to track the status of the correlation, and the workflow for the investigation and resolution.
When the minimum number of alerts needed to form a cluster is reached (default is 2), Dexda creates the first insight. The default time window for incoming alerts after the first insight has been created is 15 minutes. Any incoming alert within that time frame that matches the correlation rules is added to the same alert cluster. The correlation model follows this prioritization hierarchy:
- Correlation by CI (Configuration Item)
- Correlation by Description
The predefined correlation models and rules can be cloned and customized.
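To make the defaults concrete, the following Python sketch groups alerts using a minimum cluster size of 2, a 15-minute window, and CI matching tried before description matching. It is a simplified stand-in and not the Dexda ML model: descriptions are compared by exact equality, the window is measured from the first alert in a cluster, and the alert fields (`id`, `ci`, `description`, `timestamp`) are hypothetical.

```python
from __future__ import annotations

WINDOW_SECONDS = 15 * 60   # default time window from the text
MIN_ALERTS = 2             # default minimum cluster size from the text


def correlate(alerts: list[dict]) -> tuple[list[list[dict]], list[dict]]:
    """Group alerts by CI first, then by description, within the time window."""
    clusters: list[list[dict]] = []
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        target = None
        for field in ("ci", "description"):   # CI is tried before description
            for cluster in clusters:
                # Assumption: the window starts at the cluster's first alert.
                in_window = (
                    alert["timestamp"] - cluster[0]["timestamp"] <= WINDOW_SECONDS
                )
                if in_window and any(m[field] == alert[field] for m in cluster):
                    target = cluster
                    break
            if target is not None:
                break
        if target is None:
            clusters.append([alert])
        else:
            target.append(alert)
    insights = [c for c in clusters if len(c) >= MIN_ALERTS]
    uncorrelated = [a for c in clusters if len(c) < MIN_ALERTS for a in c]
    return insights, uncorrelated


sample = [
    {"id": "a1", "ci": "db01", "description": "disk full", "timestamp": 0},
    {"id": "a2", "ci": "db01", "description": "high latency", "timestamp": 300},
    {"id": "a3", "ci": "web01", "description": "high latency", "timestamp": 600},
    {"id": "a4", "ci": "db01", "description": "disk full", "timestamp": 3600},
]
insights, uncorrelated = correlate(sample)
```

In this sample, `a2` joins `a1` through the shared CI, `a3` joins through the shared description, and `a4` arrives outside the window and stays uncorrelated.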
Tags are keywords created through processing of descriptions associated with events and alerts. The description information is summarized and tags are added from a specific dictionary. Based on the correlation model, the description has to be similar (80%) to pick a matching keyword. Tags are propagated from events through alerts, to insights. You can define your own custom tags and add these to the dictionary.
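The idea of a similarity threshold for tag matching can be sketched in Python. The text does not state which similarity measure Dexda uses, so the ratio from the standard library's `difflib.SequenceMatcher` is used here purely as a stand-in; the dictionary contents and the `pick_tags` helper are likewise hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8   # the 80% figure from the text

# Hypothetical dictionary: reference phrase -> tag.
TAG_DICTIONARY = {
    "disk space low": "storage",
    "cpu usage high": "compute",
}


def pick_tags(description: str) -> list[str]:
    """Return the tags whose reference phrase is similar enough to the description."""
    tags = []
    for phrase, tag in TAG_DICTIONARY.items():
        score = SequenceMatcher(None, phrase, description.lower()).ratio()
        if score >= SIMILARITY_THRESHOLD:
            tags.append(tag)
    return tags


tags = pick_tags("Disk space is low")
no_tags = pick_tags("Link flapping on port 3")
```

A custom tag would correspond to one more entry in the dictionary.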
Severities assigned to an insight are based on the maximum severity of associated alerts in the alert cluster for the insight.
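As a small illustration of the severity rule, the Python sketch below picks the highest severity among the alerts in a cluster. The severity names and their ordering are assumptions chosen for the example, not the Dexda severity scale.

```python
# Hypothetical severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def insight_severity(alert_severities: list) -> str:
    """Return the maximum severity among the alerts in the cluster."""
    return max(alert_severities, key=SEVERITY_RANK.__getitem__)


severity = insight_severity(["warning", "critical", "error"])
```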
The machine learning output observations (ML records) from the processing are automatically processed into insights through these artefacts:
- A Rule “Correlation processing” containing:
  - A filter to conditionally match ML records where “ML Source” equals “correlation”.
  - An Action Group to trigger when an alert is matched.
- An Action Group “Alert Correlation” defining a set of actions executed in sequence. This implements a process flow and state model described in the following.
For an incoming ML event with the source set to “correlation”, the Action Group “Alert Correlation” is run. This waits up to 15 minutes for the alert to correlate. While waiting, the alert remains in state “new” and appears in the list of Uncorrelated Alerts on the dashboard. When successfully correlated, the alert moves to state “correlated”, is added to an insight in the list of Open Insights, and disappears from the list of Uncorrelated Alerts.
Uncorrelated alerts are alerts that did not fit the correlation model. To reduce the number of uncorrelated alerts, you can refine and fine-tune the model, for example by increasing the time window to adjust the threshold.
Insight State Model
The insight record contains these state fields:
- Insight State: The state of the underlying machine learning that created the insight (for example a correlation). This state field can transition between “active” and “cleared” many times during the lifecycle of the insight.
- Escalation: The state of the insight’s workflow during its lifecycle, from “new” through to “closed”. Once the insight is closed, no further updates to it are permitted.
The state values are:
- active: Insight level is above a defined threshold.
- cleared: Insight level is below a defined monitoring threshold. Note that this is not the same as “closed”.
- timeout: Waiting for the insight to be cleared, or timed out.
- new: Waiting to be correlated, or assigned if using a manual workflow.
- assigned: The insight has been assigned in a manual workflow.
- incident: An incident has been created for the insight.
- closed: When the insight record itself is closed either in ServiceNow or in Dexda.