Table of Contents

1 - Applications / APM

Important KPIs to follow on an APM:

Transactions

Errors

APDEX score - how to configure it
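As a rough illustration of what the Apdex score measures, the sketch below applies the standard Apdex formula: requests answered within the threshold T count as satisfied, requests answered within 4T count as tolerating (weighted at one half), and anything slower counts as frustrated. The function name `apdex` and the sample values are assumptions made for this example, not New Relic code; in New Relic the threshold is the "Apdex T" value in the application settings.

```python
def apdex(response_times_s, threshold_s):
    """Return the Apdex score for a list of response times, in seconds.

    threshold_s is the Apdex T value (hypothetical here; set it to match
    the response time your users consider satisfactory).
    """
    if not response_times_s:
        raise ValueError("at least one sample is required")
    # Satisfied: answered within T.
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    # Tolerating: slower than T but within 4T; counts for half.
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    # Everything slower than 4T is frustrated and contributes nothing.
    return (satisfied + tolerating / 2) / len(response_times_s)


# Illustrative sample: 2 satisfied, 2 tolerating, 1 frustrated with T = 0.5 s.
samples = [0.2, 0.4, 0.9, 1.5, 3.0]
score = apdex(samples, threshold_s=0.5)
print(round(score, 2))  # 0.6
```

A lower T makes the score stricter, so T is best chosen per application from the response time users actually expect.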

Availability

2 - Servers

CPU usage

Memory

Load

3 - Synthetics

Ping monitoring

4 - Alerting

KPIs

Ping

Channels: Slack & emails

5 - Process

Once alerts are configured correctly, follow these steps whenever an alert is received:

  1. Raise an "Incident" ticket in JIRA describing: "Incident description and consequences", "Actions taken", "Root Cause", "Recommendation".
  2. If the incident is closed, close the JIRA ticket; otherwise, change it to the correct status.
  3. If the incident is closed but the root cause has not been identified or corrected yet, raise a "Problem" ticket in JIRA, linked to the incident ("caused by").
  4. Before leaving an investigation, make sure that someone from Operations Management and/or an investigator is across the incident and ready to investigate.
  5. Assign the JIRA tickets appropriately and publish the JIRA ticket(s) on Slack in the #incident channel.
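
The branching in steps 2 and 3 can be sketched as a small decision function. This is only an illustration of the logic under stated assumptions: the function name `jira_actions` and its three flags are hypothetical, it does not call JIRA or Slack, and the handover in step 4 is left out because it is a people step rather than a ticket action.

```python
def jira_actions(incident_closed, root_cause_identified, root_cause_fixed):
    """Return the ticket actions to take for an incident, in order."""
    # Step 1: an "Incident" ticket is always raised.
    actions = ['Raise "Incident" ticket']
    if incident_closed:
        # Step 2: a closed incident closes its ticket.
        actions.append('Close "Incident" ticket')
        # Step 3: an unknown or uncorrected root cause needs a "Problem" ticket.
        if not (root_cause_identified and root_cause_fixed):
            actions.append('Raise "Problem" ticket linked to the incident ("caused by")')
    else:
        # Step 2: an open incident keeps its ticket, with the correct status.
        actions.append('Update "Incident" ticket status')
    # Step 5: tickets are assigned and shared on Slack.
    actions.append("Assign tickets and publish them in #incident")
    return actions


# Incident closed, root cause known but not yet corrected.
for action in jira_actions(True, True, False):
    print(action)
```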

New Relic provides deep performance analytics for every part of your platform, both applications and servers. You can view and analyse large amounts of data and gain actionable insights in real time, which makes it a key tool during an incident.

Want to set up New Relic to support your platform? See: Setup New Relic for Monitoring

