Defining Recovery Alerts

The features described on this page are available through the Hyperic HQ Enterprise Subscription


Defining Recovery Alerts

Recovery Alerts are one of the most powerful and useful features of HQ. You normally set up alert definitions to notify you in case of problems, however, as the conditions persists, the alerting engine will continue to alert you to the same problems. In order to avoid this redundancy, an alert definition can be configured to disable itself after an alert. A recovery alert would be the opposite condition that you define for when the problem has been corrected or systems are behaving normal again. It would also re-enable the alert definition for the original problem that had been disabled after alerting, and optionally notify you that recovery has taken place. A properly configured alert/recovery alert combination keeps you notified of problems without deluging you with alert notifications.

An example of a good use for an alert/recovery alert combination would be an administrator who wants to know when a particular Linux platform is under a heavy CPU load. The initial alert to indicate a problem could be 1 Minute Load Avg > 2.0. The recovery alert could be 1 Minute Load Avg < 110% of Baseline.

In the event that the Linux 1 minute system load average goes over 2.0, the administrator would get an email alert notification about this event. That alert would automatically become disabled in HQ so no more alert notifications will be sent about that alert, no matter how long the system load average remains over 2.0. However, when the system load average drops to < 110% of the baseline for the platform, two things will happen. First, the administrator will get another alert notification that says the system load average is back to normal. Second, the first alert will be automatically re-enabled so if the system load average goes back above 2.0 the administrator will get an alert notification.

To configure this type of alert combination, simply define the initial alert (1 Minute Load Avg > 2.0) normally, and at the bottom of the New Alert Definition page in the Enable Action Filters section, check the checkbox for the Disable alert until re-enabled manually or by recovery alert. Click OK to create the alert, then set alert notification as appropriate.

Next, create the recovery alert by defining an alert with your recovery criteria (1 Minute Load Avg < 110% of Baseline) normally, and in the Recovery Alert drop-down box, select the previous alert that you have just created. Click OK to create the alert and set alert notification as appropriate. When the condition of this second alert is fulfilled, the previous alert will be re-enabled.

Labels

 
(None)
System Monitoring Software
SourceForge.net Logo