Maintenance Mode allows you to temporarily mute alerts in order to complete server maintenance, or other work, without disrupting members of your team with paging. We’ve also seen Maintenance Mode used to help keep things quiet during alert storms.
When you start a Maintenance Mode, you have the option to mute ALL Routing Keys (this will mute all incidents globally) or mute select Routing Key(s). If you mute only select Routing Key(s), alerting from other Routing Keys will not be interrupted so that you do not miss critical incidents.
Only one global Maintenance Mode can be active at once, but multiple Maintenance Modes scoped to Routing Key(s) can be active at the same time. Routing Key(s) can overlap between Maintenance Modes.
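The scoping rules above can be illustrated with a small local model. This is only a sketch for reasoning about the rules, not a VictorOps API client: the names MaintenanceMode, MaintenanceRegistry, start, and is_muted are hypothetical, and a routing_keys value of None is an assumed way to represent Mute ALL Routing Keys.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class MaintenanceMode:
    """Hypothetical local model of one Maintenance Mode (not an API object)."""
    purpose: str
    # Assumption: None stands for "Mute ALL Routing Keys" (a global mode).
    routing_keys: Optional[FrozenSet[str]]

    @property
    def is_global(self) -> bool:
        return self.routing_keys is None


class MaintenanceRegistry:
    """Tracks active modes and enforces the one-global-mode rule."""

    def __init__(self) -> None:
        self.active: List[MaintenanceMode] = []

    def start(self, mode: MaintenanceMode) -> None:
        # Only one global Maintenance Mode may be active at once.
        if mode.is_global and any(m.is_global for m in self.active):
            raise ValueError("a global Maintenance Mode is already active")
        # Scoped modes may coexist, and their Routing Keys may overlap.
        self.active.append(mode)

    def is_muted(self, routing_key: str) -> bool:
        # A key is muted if any active mode is global or lists that key.
        return any(
            m.is_global or routing_key in m.routing_keys for m in self.active
        )
```

For example, starting one mode scoped to "web" and "db" and a second scoped to "web" is allowed, while starting a second global mode raises an error in this sketch.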
Global Admins and Alert Admins will be able to start, end, and manage Maintenance Modes.
Start Maintenance Mode
To start a new Maintenance Mode, select the wrench icon within the incidents pane on the web UI.
You will see the following prompt pop up over the portal:
First, enter why you are starting Maintenance Mode into the Purpose field. For instance, you may want to type “Weekly Maintenance for Web platforms” or “Hot Fix for Tuesday’s Deploy.” The Purpose field has a default value of “Maintenance Mode” if you do not wish to customize this; however, the Purpose field is valuable if you or your team is managing multiple active Maintenance Modes.
Then, select which incidents you want to mute paging for.
Select Mute ALL Routing Keys if you want paging to be muted globally.
Select Mute select Routing Key(s) if you want to select one or more Routing Keys for which paging will be muted (without disrupting paging for other Routing Keys).
If you select Mute select Routing Key(s), you will see a Routing Key picker. In the picker, Routing Keys are listed to the left, and the associated Teams and Escalation Policies for each Routing Key are listed to the right in gray. Type to search or scroll through the list to select one or more Routing Keys for which to mute incidents.
Once you have made your selections, click Start Maintenance Mode.
Once you’ve started Maintenance Mode, the modal will close and a banner will display across the top of your screen signaling to you that the feature is active. If another Maintenance Mode is already active, this banner will already appear on the timeline. When you start a new Maintenance Mode, you will also see a success toast in the upper right corner of the screen and a notification in the timeline that a new Maintenance Mode was started. Members of your team who are on call will receive an email notification that a new Maintenance Mode was started.
During Maintenance Mode
Paging will be muted for any new triggered incidents for the Routing Key(s) you have selected. Paging will continue for incidents that are already in progress. If there is an incident that meets the criteria of the Maintenance Mode you just started, but was triggered before you started Maintenance Mode, it will continue to page.
Alerts and incidents created during Maintenance Mode for the Routing Key(s) you have selected will continue to populate in the timeline and incident pane, but they will not page members of your team. You will notice that alerts and incidents in Maintenance Mode will look slightly different, so you can differentiate them from alerts and incidents that may be critical.
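The paging behavior described above can be summarized as a single decision. The following is a minimal sketch under stated assumptions: should_page and its parameters are hypothetical names, muted_keys set to None is an assumed stand-in for Mute ALL Routing Keys, and treating an incident triggered at the exact start time as muted is an assumption the article does not address.

```python
from datetime import datetime
from typing import Optional, Set


def should_page(
    triggered_at: datetime,
    routing_key: str,
    mode_started_at: datetime,
    muted_keys: Optional[Set[str]],
) -> bool:
    """Return True if an incident would still page under one Maintenance Mode."""
    # Assumption: muted_keys of None means every Routing Key is muted.
    in_scope = muted_keys is None or routing_key in muted_keys
    # Incidents already in progress before the mode started keep paging.
    # Assumption: an incident triggered exactly at the start time is muted.
    triggered_during = triggered_at >= mode_started_at
    return not (in_scope and triggered_during)
```

In this model the incident is never dropped: a False result only means that nobody is paged, while the incident would still appear in the timeline and incident pane.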
To view all active Maintenance Modes, navigate to the Manage Maintenance Mode page either through Settings under the main navigation or by clicking on the button from the Maintenance Mode banner.
On the Manage Maintenance Mode page, you will see a table of all active Maintenance Modes with some helpful information, such as the Purpose of the Maintenance Mode, who started it, when it was started, and for which Routing Keys. You can also end a Maintenance Mode from this table.
End Maintenance Mode
To end a Maintenance Mode, navigate to the Manage Maintenance Mode page either through Settings under the main navigation or by clicking on the button from the Maintenance Mode banner.
From the Manage Maintenance Mode table, you can search for the Maintenance Mode you want to end, and then click on the X icon from the row.
Upon clicking the icon, you will see the following prompt:
Click End Maintenance Mode.
When you end a Maintenance Mode, VictorOps will start paging from the beginning of the escalation policy for any triggered incidents. You may want to ack and/or resolve all incidents triggered during a Maintenance Mode before you end it, so that you do not page the team members for whom you wanted to mute paging in the first place.
Triggered incidents with Routing Keys that are a part of other active Maintenance Modes will NOT begin to page, as those incidents are still muted until that Maintenance Mode is ended.
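To reason about which incidents begin paging when one of several overlapping Maintenance Modes ends, the rules above can be sketched as follows. The function and type names are hypothetical, the state strings are illustrative, and the incident list is assumed to contain only incidents created while the ended Maintenance Mode was active.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumption: None stands for "Mute ALL Routing Keys".
Scope = Optional[FrozenSet[str]]


def covers(scope: Scope, routing_key: str) -> bool:
    """True if a Maintenance Mode with this scope mutes the Routing Key."""
    return scope is None or routing_key in scope


def resume_paging_on_end(
    ended: str,
    active: Dict[str, Scope],
    incidents: List[Tuple[str, str, str]],
) -> List[str]:
    """Return IDs of incidents that would start paging when `ended` ends.

    incidents holds (incident_id, routing_key, state) tuples.
    """
    ended_scope = active[ended]
    remaining = [scope for name, scope in active.items() if name != ended]
    resumed = []
    for incident_id, routing_key, state in incidents:
        if state != "triggered":
            continue  # acked or resolved incidents do not page
        if not covers(ended_scope, routing_key):
            continue  # was never muted by the mode being ended
        if any(covers(scope, routing_key) for scope in remaining):
            continue  # still muted by another active Maintenance Mode
        resumed.append(incident_id)
    return resumed
```

With a "weekly" mode covering "web" and "db" and a "hotfix" mode covering "web", ending "weekly" in this sketch resumes paging only for triggered "db" incidents, because "web" is still muted by "hotfix".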
All Maintenance Modes that are not ended manually will stay active forever, so remember to end Maintenance Mode once it has served its purpose so that you do not accidentally miss critical incidents.
When you end a Maintenance Mode, members of your team who are on call will receive an email notification that the Maintenance Mode was terminated.
Rules Engine Alternative
If you require more granularity than muting paging by Routing Key, you may create a matching condition in our Rules Engine to mute paging for other metadata. For more information regarding this alternative, please see the linked Knowledge Base article Rules Engine Transformations.
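As a rough illustration of what matching on other metadata means, the sketch below tests one alert field against a wildcard pattern. It does not use Rules Engine syntax: the function name, the host_name field, and the staging-* pattern are hypothetical, and the actual rule format is covered in the linked article.

```python
from fnmatch import fnmatchcase
from typing import Mapping


def matches_condition(alert: Mapping[str, str], field: str, pattern: str) -> bool:
    """Return True if the alert's field matches a shell-style wildcard pattern."""
    value = alert.get(field)
    # An alert that lacks the field cannot match the condition.
    return value is not None and fnmatchcase(value, pattern)


# Hypothetical alert payload and field name, for illustration only.
alert = {"host_name": "staging-db-01", "routing_key": "database"}
mute_paging = matches_condition(alert, "host_name", "staging-*")
```

Here the condition selects alerts by host name rather than by Routing Key, which is the kind of finer granularity the paragraph above refers to.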