Lifecycle Phases
Lifecycle phases are groupings of milestones. They provide a structure for tracking an incident from start to finish, while allowing flexibility in how you define each milestone within a phase.
Milestones describe the current status of the incident and communicate to stakeholders the team's progress in resolving the issue. As responders work through incidents on FireHydrant, they will typically transition the Milestone, and FireHydrant automatically logs the timestamps of these changes.
This allows FireHydrant to collect data for holistic incident metrics out-of-the-box like MTT*, Impacted Infrastructure, Responder Impact, and so on.
Milestone timestamps can be adjusted during the incident in Slack (using /fh update) or within the FireHydrant UI. They can also be changed post-incident during the Retrospective phase.
Note:
Each milestone's timestamp must be equal to or later than that of the previous milestone. For example, the Acknowledged milestone cannot be earlier than the Started milestone.
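The ordering rule in the note can be sketched as a simple check. This is a minimal illustration, not FireHydrant's implementation: the `MILESTONE_ORDER` list and the `find_order_violations` helper are hypothetical names, and the milestone slugs are assumed from the default phases described on this page.

```python
from datetime import datetime

# Assumed order of the default milestones described on this page (hypothetical slugs).
MILESTONE_ORDER = [
    "started", "detected", "acknowledged", "investigating",
    "identified", "mitigated", "resolved",
]

def find_order_violations(timestamps):
    """Return (previous, current) milestone pairs where current is earlier than previous."""
    violations = []
    previous = None
    for name in MILESTONE_ORDER:
        ts = timestamps.get(name)
        if ts is None:
            continue  # milestones that are not set are skipped
        if previous is not None and ts < previous[1]:
            violations.append((previous[0], name))
        previous = (name, ts)
    return violations

# Acknowledged is five minutes earlier than Started, which breaks the rule.
timestamps = {
    "started": datetime(2024, 1, 1, 12, 0),
    "acknowledged": datetime(2024, 1, 1, 11, 55),
}
violations = find_order_violations(timestamps)
```

Equal timestamps pass the check, since a milestone may share a timestamp with the one before it.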
Lifecycle Phases
We've defined four primary phases to cover the entire incident lifecycle:
Started
Indicates the beginning of an incident. By default it will be populated with the following milestones:
- Started - When the affected system began having problems.
- By default, this is set when an incident is opened in FireHydrant. We use this timestamp to calculate the time difference to the rest of the milestones.
- You can modify this timestamp if the incident started before opening the FireHydrant incident.
- Detected - When a monitoring system (or human) noticed that the system was having problems.
- If you open a FireHydrant incident directly from an inbound alert, this milestone will be set to the timestamp of the alert. When this happens, the Started milestone will also be set to the same timestamp.
- If you open an incident without an attached alert, this milestone will remain unpopulated and must be manually set.
All milestones in the Started lifecycle will be populated with the open time.
Active
Indicates that the incident is active. By default it will be populated with the following milestones:
- Acknowledged - When someone responding to the incident acknowledged the situation.
- Investigating - When the first concrete step toward triaging and identifying the problem was taken.
- You must transition to this milestone manually.
- Identified - When the problem was identified and corrective actions began.
- You must transition to this milestone manually.
- Mitigated - When the system is no longer exhibiting problems to users, but the team is still monitoring the situation.
- For example, the team may be waiting to see if signals or SLIs normalize after the corrective action, or maybe the team took temporary corrective measures to stop customer impact, but a more permanent fix is needed before the incident is considered resolved.
If you manually initiate a FireHydrant incident, the incident now begins with the first milestone in the Active Lifecycle phase, and the time for the Started milestone will match it. If the FireHydrant incident was initiated automatically (e.g., via API or Alert Routing), you will need to manually transition to an Active Lifecycle Phase.
Post Incident
Indicates that mitigation is complete and the incident itself is no longer active. By default it will be populated with the following milestones:
- Resolved - When the system is confirmed to be working again with no relapse.
- This is also the time when temporary fixes to mitigate the issue are removed, and the system is now behaving as normal.
- You must transition to this milestone manually.
- Retrospective Started - The incident will transition to this milestone when you click "Start Retrospective" in the Command Center or if you run /fh start retro in Slack.
- This milestone is tracked and only shown/modifiable in the interface after the incident has been Resolved.
- Retrospective Completed - When the team has finished reviewing the incident, clarified learnings and follow-ups, and published findings.
- The incident will transition to this milestone when you click "Publish Retrospective" on the Retrospective page.
- This milestone is tracked and only shown/modifiable in the interface after the incident has been Resolved.
Closed
Indicates that the incident’s process has been fully completed and no further work will commence. By default it will be populated with the following milestone:
- Closed - Indicates all tasks mid- and post-incident are completed.
- This is not yet factored into any analytics and only shows up when editing Milestone timestamps during the Retrospective phases.
Updating Milestone Times
When you transition a Milestone, the timestamp at which you performed the action is filled in. However, you can change these values at any point.
You can change the values by clicking the Milestone dropdown in an incident's Command Center.
Alternatively, you can go to any event in the timeline, click the ellipsis, and then use that particular event's timestamp as the value for a chosen milestone.
Incident Metrics
Incident metrics are crucial for helping you understand the health and effectiveness of your services, environments, functionalities, and incident response teams. They can help determine how quickly your organization is responding to incidents, and in turn, how much trust you are building with users.
Luckily, FireHydrant can provide you with the information you need to make informed business decisions when it comes to reliability.
The following metrics are built from the Milestone timestamps:
- MTTD: Mean Time to Detection
time of detection - time of incident start
- MTTA: Mean Time to Acknowledged
time of acknowledgment - time of incident start
- MTTM: Mean Time to Mitigation
time of mitigation - time of incident start
- MTTR: Mean Time to Resolution
time of resolution - time of incident start
- Healthiness:
(MTTM * incidents) / time window
- As an example, if you have an incident for a given service that was started at noon, mitigated at 1 PM, and then resolved at 2 PM, healthiness for that infrastructure would be 50% for the window of noon to 2 PM.
- Impact: Within a given date range, multiple incidents are added up to calculate the time a service, functionality, or environment was degraded.
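The formulas above can be worked through in a few lines of Python. This is a rough sketch under stated assumptions, not FireHydrant's analytics code: incidents are modeled as plain dictionaries of milestone timestamps, and the `mean_time_to` and `healthiness` helpers are hypothetical names. The first incident reproduces the noon to 2 PM example from the list.

```python
from datetime import datetime, timedelta

def mean_time_to(incidents, milestone):
    """Mean of (milestone time - start time) over incidents that reached the milestone."""
    durations = [i[milestone] - i["started"] for i in incidents if milestone in i]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

def healthiness(incidents, window_start, window_end, milestone="mitigated"):
    """(MTTM * incidents) / time window, i.e. total time to mitigation over the window."""
    total = sum(
        (i[milestone] - i["started"] for i in incidents if milestone in i),
        timedelta(),
    )
    return total / (window_end - window_start)

noon = datetime(2024, 1, 1, 12, 0)
incidents = [
    {   # started at noon, mitigated at 1 PM, resolved at 2 PM
        "started": noon,
        "detected": noon + timedelta(minutes=5),
        "acknowledged": noon + timedelta(minutes=10),
        "mitigated": noon + timedelta(hours=1),
        "resolved": noon + timedelta(hours=2),
    },
    {   # a second incident that has only been acknowledged so far
        "started": noon + timedelta(hours=3),
        "acknowledged": noon + timedelta(hours=3, minutes=20),
    },
]

mtta = mean_time_to(incidents, "acknowledged")  # mean of 10 and 20 minutes
mttm = mean_time_to(incidents, "mitigated")     # only the first incident counts
ratio = healthiness(incidents[:1], noon, noon + timedelta(hours=2))
```

Here `ratio` is 0.5, matching the 50% figure in the example: one hour between Started and Mitigated over a two-hour window.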
Requiring Fields
Organizations can enforce mandatory data entry at specific milestones throughout the incident lifecycle. This ensures that critical information is captured consistently, facilitating more effective incident analysis, reporting, and regulatory compliance.
To modify these settings, go to Settings > Incident Settings and click the Pencil icon next to the field you want to edit. Check or uncheck "Required at and after milestone," then select the milestone where the field should be required. The field will be required at the chosen milestone and any subsequent milestones.
Attempting to transition a milestone with any required fields empty will result in an error via all methods: web UI, Slack, MS Teams, and API.
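The "required at and after milestone" behavior can be sketched as follows. This only illustrates the rule described above and makes several assumptions: the milestone order, the field names, and the `missing_required_fields` helper are all hypothetical and are not part of FireHydrant's API.

```python
# Assumed order of the default milestones described on this page (hypothetical slugs).
MILESTONE_ORDER = [
    "started", "detected", "acknowledged", "investigating",
    "identified", "mitigated", "resolved",
]

def missing_required_fields(target_milestone, required_at, field_values):
    """List fields that are required at or before the target milestone but are still empty."""
    target_index = MILESTONE_ORDER.index(target_milestone)
    missing = []
    for field, milestone in required_at.items():
        is_required = MILESTONE_ORDER.index(milestone) <= target_index
        if is_required and not field_values.get(field):
            missing.append(field)
    return sorted(missing)

# Hypothetical settings: "summary" is required from Acknowledged onward,
# "root_cause" from Resolved onward.
required_at = {"summary": "acknowledged", "root_cause": "resolved"}
field_values = {"summary": ""}

blocked_at_investigating = missing_required_fields("investigating", required_at, field_values)
blocked_at_resolved = missing_required_fields("resolved", required_at, field_values)
```

A non-empty result corresponds to the error a responder would see when attempting the transition.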
Custom Measurements
Our analytics API and UI support custom milestones. Users can access and export custom MTTX metrics for their personalized milestones over any configured milestone set.
Creating custom MTTX metrics
In the FireHydrant UI navigation, select Settings or ⚙️ and then Incident milestones. Scroll down to Measurement Definitions and click "Add measurement".
In the Create measurement modal, you’ll have the following fields:
- Name (required): The name as it will appear on forms and in the UI. The measurement name must be unique.
- Slug (required): The slug for the measurement. This will be automatically generated from the name if left blank.
- Description: A brief explanation of the measurement.
- Starting Milestone (required): The milestone where the measurement should start.
- Ending Milestone (required): The milestone where the measurement should end.
Note:
Accounts are currently limited to ten different measurements.
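To make the fields above concrete, here is a small model of a measurement definition. It is a sketch under assumptions, not FireHydrant's data model: the `Measurement` class, the `slugify` rule, and the `add_measurement` helper are hypothetical, and the real slug generation may differ.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_MEASUREMENTS = 10  # account limit stated in the note above

def slugify(name):
    # Assumed slug rule for illustration; the real rule may differ.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

@dataclass
class Measurement:
    name: str
    starting_milestone: str
    ending_milestone: str
    description: str = ""
    slug: str = ""

    def __post_init__(self):
        if not self.slug:  # generate the slug from the name when left blank
            self.slug = slugify(self.name)

    def mean_duration(self, incidents):
        """Mean time between the starting and ending milestones (a custom MTTX)."""
        durations = [
            i[self.ending_milestone] - i[self.starting_milestone]
            for i in incidents
            if self.starting_milestone in i and self.ending_milestone in i
        ]
        return sum(durations, timedelta()) / len(durations) if durations else None

def add_measurement(existing, new):
    """Return a new list with the measurement added, enforcing uniqueness and the limit."""
    if len(existing) >= MAX_MEASUREMENTS:
        raise ValueError("measurement limit reached")
    if any(m.name == new.name for m in existing):
        raise ValueError("measurement name must be unique")
    return existing + [new]

measurement = Measurement("Identified to Mitigated", "identified", "mitigated")
incident = {
    "identified": datetime(2024, 1, 1, 12, 30),
    "mitigated": datetime(2024, 1, 1, 13, 0),
}
custom_mttx = measurement.mean_duration([incident])
```

In this example `custom_mttx` is 30 minutes, the gap between the Identified and Mitigated milestones.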
Healthiness Measurement
You can choose which milestones your Healthiness measurement is based on. By default, it is calculated based on the Started and Mitigated milestones. Healthiness is calculated as the sum of the measurement duration for all selected incidents divided by the total time window, approximating an uptime calculation.
In the FireHydrant UI navigation, select Settings or ⚙️ and then Incident milestones. Scroll down to Measurement Definitions, then:
- Select the measurement you want to use for your Healthiness measurement
- Click the kebab menu for that measurement
- In the modal select the “Use measurement for healthiness” option
Next Steps
With a basic understanding of FireHydrant's incidents, dive into the details of conducting one by visiting the following pages: