Troubleshooting a NewRelic Alert

This page lists the common alerts triggered by NewRelic and how to resolve them. Server metrics alone are usually not enough to understand the root cause of an issue, so they need to be correlated with other events on the application side.




1 - Memory Alert


Here, we have a Memory usage alert on one of our servers. This is a non-critical alert, but we will check that no action is needed.

From the Apps section, we can see that this is coming from MnoHub.

Drilling down to processes helps us see what is going on.

Everything looks fine, but to make sure business services are not impacted, we go back to APM.

The web transaction time is stable over time and the APDEX is acceptable. The application behaves correctly and no action is required.

2 - CPU Alerts

Under heavy application load, CPU usage can raise alerts, which should be looked into closely. CPU usage reaching 100% makes applications slow and eventually unresponsive.

These alerts require immediate action:

  1. Check the transaction response times and the APDEX
  2. If the application becomes slow or unresponsive, restart it
  3. Investigate the root cause; this will require profiling the application
  4. Check whether there was an unusual application load over this period of time
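
The steps above can be sketched as a small decision helper. This is only an illustration: the function name, the `apdex_floor` value of 0.85 and the wording of the actions are assumptions, not NewRelic settings or part of any NewRelic API.

```python
def cpu_alert_actions(apdex_score, responsive, apdex_floor=0.85):
    """Return the ordered follow-up actions for a CPU alert.

    apdex_score: APDEX read from APM for the alert window (0.0 to 1.0).
    responsive:  False if the application no longer answers requests.
    apdex_floor: assumed cut-off below which the application counts as slow.
    """
    actions = []
    # Step 2: restart only when users are actually impacted.
    if not responsive or apdex_score < apdex_floor:
        actions.append("restart the application")
    # Steps 3 and 4 apply to every CPU alert.
    actions.append("profile the application to find the root cause")
    actions.append("compare the application load with the usual load for this period")
    return actions


if __name__ == "__main__":
    for action in cpu_alert_actions(apdex_score=0.62, responsive=True):
        print("-", action)
```

With a healthy APDEX and a responsive application, the restart step is skipped and only the investigation steps remain.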

3 - APDEX Alert

The APDEX score is based on the application response time. When the application performs slowly, an alert is raised.
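
To make the link between response times and the score concrete, here is a minimal sketch of the standard Apdex formula: requests answered within a threshold T count as satisfied, those answered within 4T count as tolerating (half weight), and the rest count as frustrated. The threshold and the sample response times below are made-up values; the threshold configured for each application in NewRelic is not stated here.

```python
def apdex(response_times, threshold):
    """Compute an Apdex score from response times (seconds) and threshold T.

    satisfied:  t <= T         (counts fully)
    tolerating: T < t <= 4T    (counts half)
    frustrated: t > 4T         (counts as zero)
    """
    if not response_times:
        raise ValueError("at least one response time is required")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    samples = [0.1, 0.3, 0.6, 1.5, 3.0]  # hypothetical response times in seconds
    print(round(apdex(samples, threshold=0.5), 2))
```

A score close to 1.0 means nearly all requests were served within the threshold; the score drops as more requests fall into the tolerating and frustrated ranges.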

Clicking through the Application link (Application: Connec API) displays an overview of the application for the time window in which the alert occurred. The Transactions section lists the slowest transactions.

From there, an investigation will have to be led to find the root cause of the performance issue and address the underlying problem.
See Application Performance Monitoring (APM).

4 - Error Rate

When an application throws too many errors, an alert is raised. By default, NewRelic sends an alert when the error rate is above 5%.
Following the alert link gives information about the error rate over the time period.
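
As a rough illustration of the 5% rule, the sketch below treats the error rate as the number of failed requests divided by the total number of requests in a time window. This is a simplification: how NewRelic aggregates the rate over the window is not described here, and the function names are hypothetical.

```python
ERROR_RATE_THRESHOLD = 0.05  # the 5% default mentioned above


def error_rate(error_count, request_count):
    """Fraction of requests that failed in the window (0.0 when idle)."""
    if request_count == 0:
        return 0.0
    return error_count / request_count


def should_alert(error_count, request_count, threshold=ERROR_RATE_THRESHOLD):
    """True when the error rate is strictly above the threshold."""
    return error_rate(error_count, request_count) > threshold


if __name__ == "__main__":
    print(should_alert(error_count=12, request_count=150))  # 8% of requests failed
```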

4.1 Connec Jobs

Background jobs can raise exceptions, and when an exception is not caught, the job is re-enqueued for future processing. This may lead to a large number of failing jobs being re-processed over many days.
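
One common way to keep failing jobs from being re-processed indefinitely is to catch exceptions inside the job runner and cap the number of attempts. The sketch below shows the pattern in Python for illustration only: the actual Connec jobs are Ruby, and `PermanentJobError`, `run_job` and the limit of 5 attempts are hypothetical names and values, not part of Connec or of any job library.

```python
import logging

logger = logging.getLogger("jobs")


class PermanentJobError(Exception):
    """Hypothetical marker for failures that a retry cannot fix (e.g. bad input)."""


def run_job(job, attempt, max_attempts=5):
    """Run a job callable and return 'done', 'retry' or 'dropped'."""
    try:
        job()
        return "done"
    except PermanentJobError:
        # Retrying would fail the same way: log once and stop.
        logger.exception("job failed permanently, not retrying")
        return "dropped"
    except Exception:
        if attempt >= max_attempts:
            logger.exception("job failed after %d attempts, giving up", attempt)
            return "dropped"
        logger.warning("job failed on attempt %d, will retry", attempt, exc_info=True)
        return "retry"
```

The caller re-enqueues the job only when the result is `'retry'`, so a job that keeps failing leaves the queue after a bounded number of attempts instead of inflating the error rate for days.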

Displaying the Events > Errors section gives a breakdown of the errors grouped by error message. Clicking on a message displays the Ruby trace, which helps you understand where the error comes from and may allow you to solve it directly on your end. Otherwise, contact the Maestrano support team and attach a screenshot of the NewRelic error page, which will facilitate the resolution of the issue.

Expanding the application backtrace will tell you which part of the code fails and needs to be fixed.

4.2 Connec API

Public endpoints will always be probed by crawlers trying to find vulnerabilities and executable scripts. These types of requests should be blocked at the web server (nginx) level so that NewRelic does not report the resulting 404 errors.

Sometimes public APIs are scanned for common scripts and return 404 errors, triggering alerts. These can be ignored.
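
When reviewing a list of 404 errors, it can help to separate scanner probes from genuine missing resources. The sketch below does this with a regular expression. The patterns (`.php`, `/wp-admin`, `/.env` and so on) are assumptions about what scanners typically request, and the sample paths are invented; adjust them to what actually shows up in your error list.

```python
import re

# Assumed signatures of scanner probes; none of these are real API routes here.
SCANNER_PATTERN = re.compile(
    r"\.(php|asp|aspx|cgi)(\?|$)|/wp-(admin|login)|/\.env(\?|$)|/\.git/",
    re.IGNORECASE,
)


def is_scanner_probe(path):
    """True when the request path looks like a vulnerability scan."""
    return bool(SCANNER_PATTERN.search(path))


def split_404s(paths):
    """Split 404 request paths into (scanner probes, genuine misses)."""
    probes = [p for p in paths if is_scanner_probe(p)]
    genuine = [p for p in paths if not is_scanner_probe(p)]
    return probes, genuine


if __name__ == "__main__":
    sample = ["/wp-login.php", "/.env", "/api/v2/invoices/123"]  # hypothetical paths
    probes, genuine = split_404s(sample)
    print("ignore:", probes)
    print("investigate:", genuine)
```

Paths in the first list can be ignored or blocked at the web server; paths in the second list may point to a real client or routing problem.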