One of the difficulties in IT environments is that redundancy can sometimes make outages worse. The problem is that redundancy gives people (mostly justified) confidence in the availability of their systems, so they design architectures on the assumption that their core switch (or database, or load balancing cluster, or what have you) will not go down.
And they even have monitoring.
But they don’t monitor the state of the redundant server or component. So the redundant server or component fails, or is unplugged, or stops synchronizing, and stays that way for weeks with no one noticing. Then the active server or component fails while the other one is already out of commission, and boom: Bad Things happen.
So if you run redundant supervisor modules in your core switches to get high availability, make sure your Cisco switch monitoring is capable of monitoring them. Same for redundant power supplies.
Same for active-standby NetScalers, or F5 BIG-IPs, or NetApp clusters, or anything else that you want to make sure works when needed.
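To make that concrete, here is a minimal Python sketch of the kind of check a monitoring system could run against any redundant pair. The names here (Member, Severity, evaluate_pair) and the role, healthy and in_sync fields are made up for illustration; they are not any vendor's API, and how you actually collect that state (SNMP, an API, a CLI scrape) depends on the device. The point is only that a dead or out-of-sync standby raises an alert on its own, even while the active member is perfectly happy.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Member:
    """One half of a redundant pair (hypothetical model, not a vendor API)."""
    name: str
    role: str             # "active" or "standby"
    healthy: bool         # is the unit up and responding?
    in_sync: bool = True  # is its config/state synchronized with its peer?


def evaluate_pair(members: list[Member]) -> tuple[Severity, str]:
    """Alert on loss of redundancy, not only on loss of service."""
    active_ok = [m for m in members if m.role == "active" and m.healthy]
    standby_ready = [
        m for m in members
        if m.role == "standby" and m.healthy and m.in_sync
    ]

    if not active_ok:
        return Severity.CRITICAL, "no healthy active member"
    if not standby_ready:
        # Service is still up, but the next failure will be an outage.
        return Severity.WARNING, "redundancy lost: no standby ready to take over"
    return Severity.OK, "active and standby both healthy"


if __name__ == "__main__":
    # Example: the active supervisor is fine, the standby has quietly died.
    pair = [
        Member("sup-1", "active", healthy=True),
        Member("sup-2", "standby", healthy=False),
    ]
    severity, message = evaluate_pair(pair)
    print(severity.name, message)
```

Whether a lost standby should be a warning or something louder is a judgment call for your environment; what matters is that someone hears about it before the active unit fails too.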
If it’s not monitored, chances are it won’t be there when you need it.