Azure Incident Management with Escalation Policy

Turbo360

Azure Incident Management

These days, businesses rely heavily on cloud services like Microsoft Azure to power their operations. While Azure provides robust infrastructure and services, issues and incidents can still occur occasionally. Turbo360 provides enhanced capabilities to monitor Azure services and manage the incidents that arise in them. However, to ensure seamless operations and timely resolution of problems, it is crucial to have a well-defined escalation policy in place for Azure Incident Management.

In this blog, let us explore a sample scenario in Azure and see how escalation policies in Turbo360 can help resolve issues in real time.

Payment processing scenario

Consider a scenario in which a payment request is sent to Azure API Management, which validates the request and securely passes the payment details to an Azure Function. The Azure Function processes the payment, stores the transaction data in Azure SQL Database, and publishes the payment status to Azure Event Grid. A Logic App subscribes to the Event Grid topic, listens for these events, and sends email notifications to users based on the payment status. Sensitive information such as credit card numbers and payment gateway credentials is stored in Azure Key Vault.
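To make the data flow concrete, here is a minimal sketch of the event the Azure Function might publish and of how the Logic App could branch on its status. The envelope fields (`id`, `eventType`, `subject`, `eventTime`, `data`, `dataVersion`) follow the Event Grid event schema; the event type `Payments.PaymentStatusChanged`, the subject format, the status values, and the email texts are hypothetical, and the Logic App's branching is only imitated in plain Python.

```python
import json
import uuid
from datetime import datetime, timezone


def build_payment_event(payment_id: str, status: str, amount: float) -> dict:
    """Build a payment-status event in the Event Grid event schema."""
    return {
        "id": str(uuid.uuid4()),
        "eventType": "Payments.PaymentStatusChanged",  # hypothetical event type
        "subject": f"payments/{payment_id}",           # hypothetical subject format
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Card numbers and gateway credentials stay in Key Vault and are
        # deliberately left out of the event payload.
        "data": {"paymentId": payment_id, "status": status, "amount": amount},
        "dataVersion": "1.0",
    }


# Stand-in for the Logic App's condition on the status field (hypothetical texts).
EMAIL_TEMPLATES = {
    "Succeeded": "Your payment {paymentId} was successful.",
    "Failed": "Your payment {paymentId} could not be processed.",
}


def notification_for(event: dict) -> str:
    """Pick the email text the subscriber would send for this event."""
    data = event["data"]
    template = EMAIL_TEMPLATES.get(data["status"], "Payment {paymentId} status: {status}")
    return template.format(**data)


event = build_payment_event("P-1001", "Failed", 49.99)
print(json.dumps(event, indent=2))
print(notification_for(event))
```

Keeping only non-sensitive fields in the event means subscribers such as the Logic App never need access to Key Vault secrets.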

Azure Logic App Overview

Failures in payment processing

Now suppose that, due to a degradation of the Event Grid service, the events published to Event Grid go undelivered and end up dead-lettered. This could significantly impact business operations and the customer experience. Let us see how we can configure monitoring for this scenario using Turbo360.
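The following is a simplified model, not the service's implementation, of when an undelivered event ends up dead-lettered. It assumes Event Grid's default retry policy of up to 30 delivery attempts and an event time-to-live of 1440 minutes (24 hours); the `DeliveryState` type and `outcome` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Defaults of the Event Grid retry policy: at most 30 delivery attempts
# and an event time-to-live of 1440 minutes (24 hours).
MAX_DELIVERY_ATTEMPTS = 30
EVENT_TTL_MINUTES = 1440


@dataclass
class DeliveryState:
    attempts: int           # delivery attempts made so far
    age_minutes: float      # minutes since the event was published
    delivered: bool = False


def outcome(state: DeliveryState,
            max_attempts: int = MAX_DELIVERY_ATTEMPTS,
            ttl_minutes: int = EVENT_TTL_MINUTES,
            dead_letter_configured: bool = True) -> str:
    """Simplified view of what happens to an event after a delivery attempt."""
    if state.delivered:
        return "delivered"
    if state.attempts >= max_attempts or state.age_minutes >= ttl_minutes:
        # Retries are exhausted: the event goes to the dead-letter
        # destination if one is configured, otherwise it is dropped.
        return "dead-lettered" if dead_letter_configured else "dropped"
    return "retry"


print(outcome(DeliveryState(attempts=30, age_minutes=90)))
```

During a prolonged service degradation, events keep failing until one of the two limits is reached, which is why the dead-lettered count grows.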

Configuring monitoring using Turbo360

In this scenario, the ‘Dead Lettered Events’ metric can be used to monitor Event Grid: whenever the number of dead-lettered events exceeds the configured threshold, alerts are sent to the configured notification channels. Likewise, Turbo360 provides similar monitoring for other Azure services.
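The threshold check described above can be sketched as follows. This is a hypothetical illustration of the rule "alert every configured channel when the metric exceeds the threshold"; it does not reflect Turbo360's actual configuration model or API, and the channels are plain callables standing in for Teams, email, and so on.

```python
from typing import Callable, Dict, List


def evaluate_metric_alert(metric_name: str,
                          value: float,
                          threshold: float,
                          channels: Dict[str, Callable[[str], None]]) -> List[str]:
    """Notify every channel when the metric value exceeds the threshold.

    Returns the names of the channels that were notified.
    """
    if value <= threshold:
        return []  # within the threshold: no alert
    message = f"{metric_name} is {value}, above the threshold of {threshold}"
    notified = []
    for name, send in channels.items():
        send(message)
        notified.append(name)
    return notified


# Hypothetical channels that just print the alert text.
channels = {"teams": print, "email": print}
evaluate_metric_alert("Dead Lettered Events", 12, 10, channels)
```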

Event Grid Metrics and Properties

Reprocessing the failed events using Turbo360

Dead-lettered events in Event Grid subscriptions can be resubmitted either to the Event Grid Topic or directly to the Event Grid Subscription. Resubmitting to the subscription avoids duplicate processing, because an event re-published to the topic is delivered again to every matching subscription of that topic. Turbo360 allows resubmitting both individual dead-lettered events and events in bulk. Once an alert notification is received from Turbo360, the appropriate team or individual can resolve the issue by resubmitting the messages from Turbo360.
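Why the resubmission target matters can be shown with a toy model in which a topic fans each event out to all of its subscriptions. The topic name and the second subscriber `audit-log` are hypothetical; only the fan-out behaviour is the point of the sketch.

```python
from collections import Counter
from typing import Dict, List

# Toy model: a topic delivers every event to all of its subscriptions.
# Topic and subscription names are hypothetical.
SUBSCRIPTIONS: Dict[str, List[str]] = {
    "payments-topic": ["email-notifier", "audit-log"],
}


def resubmit(event_id: str, target: str, processed: Counter,
             topic: str = "payments-topic") -> None:
    """Resubmit a dead-lettered event to the topic or to one subscription."""
    if target == topic:
        # Re-publishing to the topic reaches every subscription again.
        receivers = SUBSCRIPTIONS[topic]
    elif target in SUBSCRIPTIONS[topic]:
        # Resubmitting to a single subscription reaches only that subscriber.
        receivers = [target]
    else:
        raise ValueError(f"unknown target: {target}")
    for receiver in receivers:
        processed[(receiver, event_id)] += 1


# "audit-log" already handled evt-1; only "email-notifier" dead-lettered it.
processed = Counter({("audit-log", "evt-1"): 1})
resubmit("evt-1", "email-notifier", processed)
print(processed)
```

Resubmitting to `payments-topic` instead would make `audit-log` process `evt-1` a second time.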

Unresolved or unnoticed alert notifications

Turbo360 provides end-to-end monitoring and resolution for the above scenario. However, an alert notification may go unnoticed, or may remain unresolved for some time because L1 support lacks the expertise to handle it. In that case, it can take customers significant time and effort to reach the support team and get the issue resolved. The situation gets even worse if customers do not notice the payment failure immediately but only after a couple of days; by then, it is even harder for the support team to backtrack and resolve the issue.

In most cases, a single level of alerting is not an ideal way to handle incidents, and escalating through multiple levels is more appropriate. Let us see how an incident can be escalated to different levels using Turbo360.

Resolving the problem using escalation policy in Turbo360

Turbo360 alerting can be configured in the form of escalation policies. An escalation policy in Turbo360 is a set of rules defined to escalate a Turbo360 alert to the configured notification channels after a predetermined amount of time.

An escalation policy is created by giving it a name and a set of rules. Each rule represents the notification channels to be notified at one level of escalation. A maximum of 60 minutes is allowed for each rule to be executed, and an escalation policy can contain at most 5 rules.
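The limits above can be expressed as a small data model with validation. This is a hypothetical sketch, not Turbo360's schema: the field `after_minutes` assumes the 60-minute limit applies to the wait before each rule runs, and requiring at least one rule is an added assumption.

```python
from dataclasses import dataclass, field
from typing import List

MAX_RULES = 5          # at most 5 rules per escalation policy
MAX_RULE_MINUTES = 60  # at most 60 minutes per rule


@dataclass
class EscalationRule:
    channel: str        # notification channel alerted at this level
    after_minutes: int  # hypothetical field: wait before this rule runs


@dataclass
class EscalationPolicy:
    name: str
    rules: List[EscalationRule] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the policy breaks the documented limits."""
        if not self.name:
            raise ValueError("an escalation policy needs a name")
        if not 1 <= len(self.rules) <= MAX_RULES:
            raise ValueError(f"a policy must have 1 to {MAX_RULES} rules")
        for rule in self.rules:
            if not 0 <= rule.after_minutes <= MAX_RULE_MINUTES:
                raise ValueError(
                    f"rule for {rule.channel!r} must run within 0 to "
                    f"{MAX_RULE_MINUTES} minutes")


policy = EscalationPolicy("payments", [EscalationRule("Teams", 0),
                                       EscalationRule("Email", 10)])
policy.validate()  # passes: 2 rules, both within 60 minutes
```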

An incident can be escalated up to 5 times, once per rule, until it is acknowledged. When an incident is created, it is in the Open state by default, and it can later be acknowledged or closed as required. Escalation stops as soon as the incident is acknowledged in Turbo360.
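The incident lifecycle described above behaves like a small state machine. In the sketch below, the class and method names are hypothetical, and treating a closed incident as no longer escalating is an assumption; the source only states that acknowledging stops escalation.

```python
from enum import Enum
from typing import List, Optional


class IncidentState(Enum):
    OPEN = "Open"
    ACKNOWLEDGED = "Acknowledged"
    CLOSED = "Closed"


class Incident:
    """Toy incident that escalates one level at a time while it is open."""

    def __init__(self, levels: List[str]) -> None:
        self.levels = levels[:5]          # at most 5 escalation levels
        self.state = IncidentState.OPEN   # new incidents start open
        self.notified: List[str] = []

    def escalate(self) -> Optional[str]:
        """Notify the next level, or return None if escalation has stopped."""
        if self.state is not IncidentState.OPEN:
            return None  # acknowledged (or, by assumption, closed)
        if len(self.notified) >= len(self.levels):
            return None  # every level has already been notified
        channel = self.levels[len(self.notified)]
        self.notified.append(channel)
        return channel

    def acknowledge(self) -> None:
        if self.state is IncidentState.OPEN:
            self.state = IncidentState.ACKNOWLEDGED

    def close(self) -> None:
        self.state = IncidentState.CLOSED


incident = Incident(["Teams", "Email", "Mobile"])
incident.escalate()     # notifies Teams
incident.acknowledge()  # escalation stops here
print(incident.escalate(), incident.notified)
```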

Now suppose an escalation policy is configured in Turbo360 with the first level of escalation going to a Teams notification channel, the second level to email after an interval of 10 minutes, and the final level to a personal mobile number after an interval of 15 minutes. If the alert sent to the Teams notification channel goes unnoticed, the incident escalates to the next level. The appropriate team can then resubmit the failed dead-lettered events to Event Grid and resolve the issue.
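The timeline of this example can be worked out with a short function. Two assumptions are made here and are not stated in the source: the Teams alert fires at minute 0, and each interval is counted from the previous level, so the mobile notification would go out at minute 25. The function name `notifications` is hypothetical.

```python
from typing import List, Optional, Tuple

# The example policy: Teams first, email 10 minutes later, then the
# mobile number 15 minutes after that (intervals assumed relative).
LEVELS: List[Tuple[str, int]] = [("Teams", 0), ("Email", 10), ("Mobile", 15)]


def notifications(levels: List[Tuple[str, int]],
                  acknowledged_at: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return the (minute, channel) notifications sent before acknowledgement."""
    sent = []
    minute = 0
    for channel, interval in levels:
        minute += interval
        if acknowledged_at is not None and acknowledged_at <= minute:
            break  # acknowledged before (or when) this level was due: stop
        sent.append((minute, channel))
    return sent


print(notifications(LEVELS))                      # nobody acknowledges
print(notifications(LEVELS, acknowledged_at=12))  # acknowledged after the email
```

If the email recipient acknowledges at minute 12, the mobile number is never contacted; if nobody responds, all three levels are notified by minute 25.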

Conclusion

Having a well-defined escalation policy is crucial for effective incident management. It ensures that issues are promptly addressed, minimizing the impact on operations and customer satisfaction. The sample scenario discussed above highlights the importance of having an efficient escalation policy.

Because Turbo360 combines enhanced monitoring capabilities with the ability to configure an efficient escalation policy out of the box, it has an edge over using separate tools to monitor services and manage incidents.

Why not give Turbo360 a free try?

This article was published on Jun 27, 2023.
