This article describes how to monitor and troubleshoot Maestrano's infrastructure with New Relic.







Overview

Server monitoring collects metrics about the servers running on the Maestrano environment.

Drilling down into a specific server shows details such as CPU and memory usage, disk and network I/O, and load average.

The New Relic server agent is installed by Nex! on all the racks.

Server alert policies

New Relic provides a set of default server alerts based on CPU and memory usage. Depending on the applications running on the servers, you may want to tune these alerts.

Go to New Relic Dashboard > Servers > Alerts > Server Policies.

Select the policy group you want to edit, or create a new one. These are the recommended settings:

  • CPU: Send alert after 5 minutes > 95 %
  • Disk I/O: Send alert after 10 minutes > 75 %
  • Memory: Send alert after 10 minutes > 100 % (this includes swap)
  • Fullest disk: Send alert after 30 minutes > 90 %
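
To make the semantics of these settings concrete, the sketch below encodes the four recommended thresholds and checks a series of samples against one of them. It is only an illustration and not how New Relic evaluates policies internally: the `AlertRule` class, the metric names and the `should_alert` helper are hypothetical, and it assumes one sample per minute and that "after N minutes" means the value stayed above the threshold for N consecutive minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical representation of one server alert setting."""
    metric: str
    threshold_pct: float  # alert when usage stays above this percentage
    duration_min: int     # for at least this many consecutive minutes


# Recommended settings from the list above (metric names are illustrative).
RECOMMENDED_RULES = [
    AlertRule("cpu", 95, 5),
    AlertRule("disk_io", 75, 10),
    AlertRule("memory", 100, 10),       # includes swap
    AlertRule("fullest_disk", 90, 30),
]


def should_alert(rule, samples_pct):
    """Return True if the latest `duration_min` samples all exceed the threshold.

    Assumes `samples_pct` holds one reading per minute, oldest first.
    """
    if len(samples_pct) < rule.duration_min:
        return False  # not enough history to cover the whole window
    window = samples_pct[-rule.duration_min:]
    return all(value > rule.threshold_pct for value in window)


cpu_rule = RECOMMENDED_RULES[0]
print(should_alert(cpu_rule, [96, 97, 99, 98, 97]))  # True: 5 minutes above 95 %
print(should_alert(cpu_rule, [96, 97, 80, 98, 97]))  # False: the dip to 80 % breaks the run
```

A short dip below the threshold resets the condition, which is why longer durations are suggested for metrics such as the fullest disk that change slowly.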

It is highly recommended to send alerts to the Slack channel #alerts.
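
For custom scripts that need to post to the same channel, a message can be sent through a Slack incoming webhook. The sketch below is an assumption and not part of the New Relic setup described here: the function names, the message format and the webhook URL are hypothetical, and the webhook is assumed to have been created in Slack and bound to #alerts.

```python
import json
import urllib.request


def build_slack_payload(server, metric, value_pct, threshold_pct):
    """Build the JSON body for a Slack incoming webhook message."""
    text = f"[{server}] {metric} at {value_pct:.0f} % (threshold {threshold_pct:.0f} %)"
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical server name; only the payload is built here, nothing is sent.
payload = build_slack_payload("app-server-1", "memory", 101.2, 100)
print(payload["text"])
# To send it, call post_to_slack() with the webhook URL created for #alerts.
```

Keeping payload construction separate from the HTTP call makes the message format easy to check without contacting Slack.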

Troubleshooting an alert

Server alerts are related to high CPU usage, memory usage, I/O or load.

Looking at the server by itself is usually not enough to understand the root cause of the issue, so it needs to be correlated to other events on the application side.

Non-critical example: Memory alert


Here, we have an alert on memory usage on one of our servers. This is a non-critical alert, but we will make sure no action is needed.

From the Apps section, we can see that the application running on this server is MnoHub.

Drilling down to the processes helps us see what is going on.

Everything looks fine, but to ensure business services are not impacted, we go back to APM.

All good.