Cloud APM


Cloud application performance management (cloud APM) is the process of monitoring resources that support software application performance in public cloud, private cloud and hybrid cloud environments, and ultimately taking actions to resolve issues and maintain optimal performance.

The major goals of cloud APM are the same as those of traditional APM: to help administrators quickly identify and resolve any issues with a cloud-based application that affect either the user experience (UX) or back-end functions such as security and costs.

Cloud application performance monitoring

The term APM is often used synonymously with a subcategory of management: application performance monitoring, which generally focuses on monitoring the metrics that underpin application performance and usability. Note that APM tools have begun to evolve beyond basic monitoring and toward remediation, but true app management functionality is still mostly nascent, given the rapid growth in the number of applications, their complexity, and the number of teams and technologies involved in developing and maintaining them.

In the context of cloud APM, issues are typically not remediated through the APM tool itself. The resolution process could involve on-premises adjustments for private cloud workloads, as well as tweaking the cloud services and functions upon which the application depends. It may also include turning off a cloud service until the issue has been resolved.

By either interpretation of APM, the first step in identifying and fixing application performance problems is to know what's happening. Software agents placed on the application server monitor application, service and database response times. Administrators can use cloud APM tools to combine data from disparate monitoring silos into a correlation engine and dashboard, which makes audit logs easier to read and saves IT staff from memory-dependent and error-prone manual correlation and analysis. APM tools can also display a graphical representation of how an application behaves on end-user devices -- including an index-based graph to measure end-user satisfaction or "happiness" -- and gauge how service-based events affect these ratings.
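As an illustration of an index-based satisfaction measure, the sketch below computes an Apdex-style score. The text above does not name a specific index; Apdex is one widely used example, and the 0.5-second threshold and the sample values here are assumptions chosen for illustration only.

```python
def apdex(response_times_s, threshold_s=0.5):
    """Return an Apdex-style score in [0, 1] for response times in seconds.

    Samples at or below the threshold count as "satisfied", samples up to
    four times the threshold count as "tolerating" (half weight), and the
    rest count as "frustrated" (zero weight).
    """
    if not response_times_s:
        raise ValueError("need at least one response-time sample")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


# Hypothetical sample: two fast requests, two tolerable ones, one slow one.
samples = [0.2, 0.4, 0.9, 1.5, 3.0]
score = apdex(samples, threshold_s=0.5)
print(f"Apdex-style score: {score:.2f}")
```

A score near 1 suggests most users see acceptable response times; plotting the score over time alongside deployment or incident events is one way to see how service-based events affect it.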

Examples of key application metrics to monitor include:

  • Resource availability. Is the instance still running, or are database requests hanging?
  • Response time. Are slow response times due to network bandwidth or underlying resource issues?
  • Application errors. What's their frequency and source?
  • Traffic levels. How many users typically access the cloud application, and does it have sufficient scalability to handle a sudden spike in activity?
  • End user satisfaction. What is the success rate of a given task, and how long does it take?
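Several of the metrics above can be derived from raw request records. The following is a minimal sketch, assuming a hypothetical `RequestRecord` with a duration and an HTTP status code; real APM agents collect far richer data, and treating any 5xx status as an application error is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code returned


def summarize(records):
    """Summarize traffic level, error rate and p95 response time."""
    total = len(records)
    if total == 0:
        raise ValueError("no requests recorded in this interval")
    errors = sum(1 for r in records if r.status >= 500)
    durations = sorted(r.duration_ms for r in records)
    # Nearest-rank 95th percentile, using integer ceiling division.
    rank = (95 * total + 99) // 100
    return {
        "requests": total,
        "error_rate": errors / total,
        "p95_ms": durations[rank - 1],
    }


# Hypothetical interval: 20 requests, two of which failed with a 500.
records = [
    RequestRecord(duration_ms=10.0 * (i + 1), status=500 if i < 2 else 200)
    for i in range(20)
]
print(summarize(records))
```

Summaries like this are usually computed per time window, so that a change in any one value can be compared against the others before deciding where to investigate.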

The monitoring priority can shift depending on the current workload and business needs. Also, different aspects of cloud APM may overlap, such as response to denial-of-service attacks that impact both performance and security.

Monitoring isn't just to identify problems -- it's also useful to know what's operating well, so you don't devote time and effort where it's not needed.

Benefits of cloud APM

Cloud APM benefits are mostly the same as with traditional APM:

  • Monitor an application's performance and availability;
  • Quickly diagnose and troubleshoot performance issues; and
  • Improve application responsiveness and uptime.

Another important goal of cloud APM is to help administrators identify a poor UX quickly. 

Traditional APM vs. cloud APM

Cloud APM must account for more dependencies in application performance than traditional APM -- for example, monitoring network communications to detect problems between the application and any cloud services it requires to run. Many cloud APM tools monitor both latency and the number of incoming and outgoing requests an application makes.
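Tracking latency and request counts per dependency can be sketched with a small wrapper around each outgoing call. This is an illustrative, standard-library-only sketch, not how any particular APM tool is implemented; the `DependencyStats` class and the `"object-storage"` name are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class DependencyStats:
    """Count calls to each dependency and accumulate their latency."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)

    @contextmanager
    def track(self, dependency):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the call even if it raised, so failures are counted too.
            self.calls[dependency] += 1
            self.total_seconds[dependency] += time.perf_counter() - start

    def mean_latency(self, dependency):
        """Return mean latency in seconds, or None if no calls were seen."""
        count = self.calls.get(dependency, 0)
        if count == 0:
            return None
        return self.total_seconds[dependency] / count


stats = DependencyStats()
with stats.track("object-storage"):
    time.sleep(0.01)  # stand-in for a real call to a cloud service

print(stats.calls["object-storage"], stats.mean_latency("object-storage"))
```

Recording the measurement in a `finally` block means a call that times out or fails still shows up in the counts, which is often exactly the case an administrator wants to see.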

Different types of cloud services must be monitored in different ways. An app running in a virtualized instance typically produces far more log data than a serverless function.

Another major distinction between traditional APM and cloud APM is visibility into the underlying infrastructure for operational metrics. An enterprise hosting its application on premises or in a private cloud can see and control its physical IT infrastructure to help fix performance issues. By contrast, the abstraction layer in public cloud architecture prevents deep visibility into the underlying IT assets, which limits the metrics and criteria that can be reported. This makes root-cause analysis and troubleshooting of performance problems more challenging, and cloud APM more critical. Cloud providers have recently made strides to expand visibility into their infrastructure, both for their native service offerings and for third-party tools.

Cloud application performance management tools

As more enterprises move applications to the cloud, they increasingly require tools to monitor and manage application performance and availability across a distributed computing environment. Some tools include predictive capabilities to alert administrators to potential problems, and also automate the process of resolving them.
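A very simple form of alerting on potential problems compares each new measurement with a trailing baseline. The sketch below is an assumption-laden illustration, not the method used by any specific tool: the window size, the multiplier `k` and the `min_sigma` floor are hypothetical parameters, and commercial tools generally use more sophisticated models.

```python
from statistics import mean, pstdev


def find_anomalies(values, window=5, k=3.0, min_sigma=1.0):
    """Return indexes of values far above the trailing-window baseline.

    A value is flagged when it exceeds the mean of the previous `window`
    values by more than `k` standard deviations. `min_sigma` keeps a
    perfectly flat baseline from flagging trivial changes.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        sigma = max(pstdev(baseline), min_sigma)
        if values[i] > mean(baseline) + k * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101]
print(find_anomalies(latencies_ms))
```

An alert raised this way could notify an administrator or trigger an automated action such as scaling out, depending on how much trust is placed in the signal.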

By their nature, APM tools from the major public cloud providers perform "cloud APM" to monitor resource usage, manage costs and observe network performance. Native cloud APM capabilities may offer advantages such as compatibility with, and deeper traceability for, services in that provider's cloud ecosystem. However, visibility into some core metrics may not be available, and these native tools typically do not integrate with other cloud platforms. The main APM tools for the major public cloud platforms are Amazon CloudWatch, Azure Application Insights and Google Cloud's operations suite (formerly Stackdriver).

Third-party APM vendors have historically had advantages in the depth of their reporting and visualization, and in their ability to tie into various platforms. Increasingly, their APM tools integrate with cloud apps as well. Most, if not all, standalone APM vendors deliver their tools in a SaaS model; some offer them as a managed service, or enable clients to run them in their own environment. Among these vendors are:

  • Broadcom (CA) DX
  • Cisco (AppDynamics)
  • Datadog
  • Dynatrace
  • ManageEngine
  • Micro Focus
  • New Relic
  • SolarWinds
  • Splunk (SignalFx)
  • Tingyun

Open source APM tools

Given the integration complexity of native and vendor-specific tools across the spectrum of cloud computing, open-source instrumentation has become increasingly popular, including for cloud APM. Gartner predicts that by 2025, 50% of new cloud-native application monitoring will use open-source instrumentation instead of vendor-specific agents, a tenfold increase from 2019.

Open-source tools that can provide cloud APM include Nagios, Prometheus and Zabbix. Efforts to support open-source monitoring include the OpenTelemetry project, which is still in beta as of August 2020.
