Monitoring Azure Backup and Replication Jobs

We all know that systems fail. We plan for this with failover partners and system backups. But can you really trust your backups? Backup and site recovery in Azure can be complicated to monitor, and LogicMonitor provides clarity. The Azure Backup service provides simple, secure, and cost-effective solutions for backing up and recovering your data using the Azure cloud. It supports recovery services for on-premises machines, Azure VMs, Azure File Shares, and SQL Server and SAP HANA databases in Azure VMs. Keep reading for a breakdown of which Azure backup and replication jobs you should keep tabs on.

LogicMonitor dashboard showing Azure Backup Status and time since last backup.

Monitoring Backup Jobs

Monitoring your backups should be as easy as looking at a dashboard. LM Cloud will detect your recovery services vaults and show you their metadata so that you can monitor your backup and replication jobs. LogicMonitor now shows the status of the latest backup and replication jobs, and the time since the last successful job, for your Azure resources. This gives you a quick read on the health of your backup services.

As a best practice, we recommend keeping tabs on the following for backup jobs:

  1. Status: This shows the status of your last backup attempt. It can be shown as a single big number or as a graph to track multiple resources. We recommend using a warning at 4 and an error at 5.
  2. Duration: This metric is reported in milliseconds. It can be shown as a single large number or as a graph with one value for each of your resources. We recommend determining a threshold based on how frequently you have the backups set to run.
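
To make these two recommendations concrete, here is a minimal Python sketch of how the checks could be expressed. It is an illustration, not LogicMonitor configuration: the function names, the idea of treating the thresholds as "at or above," and the rule of flagging a backup that runs longer than half the interval between backups are all assumptions made for the example.

```python
from datetime import timedelta

# Thresholds taken from the recommendation above.
WARN_STATUS = 4
ERROR_STATUS = 5


def backup_status_severity(status: int) -> str:
    """Map a numeric backup status to a severity.

    Assumption: a status at or above a threshold triggers that severity.
    """
    if status >= ERROR_STATUS:
        return "error"
    if status >= WARN_STATUS:
        return "warning"
    return "ok"


def duration_exceeds(duration_ms: float,
                     backup_interval: timedelta,
                     fraction: float = 0.5) -> bool:
    """Return True when a backup ran longer than `fraction` of the interval.

    `fraction` is a hypothetical tuning knob; pick a value that fits how
    often your backups are scheduled to run.
    """
    limit_ms = backup_interval.total_seconds() * 1000 * fraction
    return duration_ms > limit_ms
```

With a daily backup schedule, for example, `duration_exceeds(duration_ms, timedelta(hours=24))` would flag any backup that took more than 12 hours.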

You can similarly see your replication jobs on a dashboard.

LogicMonitor dashboard showing Azure job status, time since backup and Error Count.

Monitoring Replication Jobs

As a best practice, we recommend keeping tabs on the following for replication jobs:

  1. State: This gives a numeric value that maps to the result of the last replication attempt. It can be shown as a single big number or as a graph to track multiple resources. We recommend alerting if the state is 7 – Failed.
  2. Time Since Start: This shows the time elapsed since the last job was started. We recommend determining a threshold based on how frequently you have the replication jobs set to run.
  3. Error Count: This metric lets you know there was an error in the last attempted replication job. We recommend alerting if the value is greater than 0.
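
The three replication checks above can be sketched the same way. Again, this is only an illustration under stated assumptions: the `ReplicationJob` class and its field names are hypothetical, and the unit for Time Since Start is assumed to be seconds, which the metric description does not confirm.

```python
from dataclasses import dataclass
from typing import List

FAILED_STATE = 7  # "7 – Failed", per the recommendation above


@dataclass
class ReplicationJob:
    """Hypothetical container for the three metrics discussed above."""
    state: int
    seconds_since_start: float  # unit is an assumption
    error_count: int


def replication_alerts(job: ReplicationJob,
                       max_seconds_since_start: float) -> List[str]:
    """Return the list of alert reasons raised by one replication job."""
    alerts = []
    if job.state == FAILED_STATE:
        alerts.append("state is Failed")
    if job.seconds_since_start > max_seconds_since_start:
        alerts.append("no job started within the expected window")
    if job.error_count > 0:
        alerts.append("last job reported errors")
    return alerts
```

An empty list means the job passed all three checks. The `max_seconds_since_start` value should come from your own schedule, for instance a little more than the interval between runs.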

Following these guidelines, you can move forward with confidence and know that your backup and recovery in Azure are working. The time you previously spent checking on them can go to other important tasks. Be secure in your cloud and on-premises monitoring with LogicMonitor’s insight into the performance of your environment. You can try it free, or book a free demo today.