Artificial intelligence for IT operations (AIOps)


Artificial intelligence for IT operations (AIOps) is the use of machine learning and big data analytics to automate routine administrative tasks, including deployment, root cause analysis and problem resolution, for an information technology (IT) system.

Ideally, an AIOps platform brings three important capabilities to the enterprise:


  • The ability to recognize abnormal system behavior faster and with greater accuracy than humanly possible.
  • The ability to use if-this-then-that (IFTTT) business rules to automate routine tasks.
  • The ability to streamline communication among stakeholders.
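
As a rough illustration of the first capability, the sketch below flags metric samples that deviate sharply from their recent history using a rolling z-score. This is a deliberately simple stand-in for the machine learning a real AIOps platform would apply; the function name, window size, threshold and sample latencies are all assumptions made for the example.

```python
from statistics import mean, stdev

def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from the preceding window.

    Hypothetical helper: a rolling z-score, not any particular product's method.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Made-up response times in milliseconds with one obvious spike.
latencies_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
print(find_anomalies(latencies_ms))  # [7]
```

A fixed threshold like this is easy to reason about but generates false alarms on seasonal or bursty workloads, which is part of why AIOps tools lean on learned baselines instead.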


How AIOps works

AIOps tools gather information from the IT tools and devices already in place and apply detailed analytics and machine learning to that information in order to identify potential issues and correct them. Typically, AIOps data comes from network log files, cloud monitoring tools and helpdesk ticketing systems.

Big data technologies aggregate and organize all of the systems' output into a form that an AIOps platform can use. The platform uses correlation engines and business rules to monitor throughput and either do nothing, take action autonomously or alert a human administrator when required.
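
The three possible outcomes described above (do nothing, act autonomously, or alert an administrator) can be sketched as a small rule table. Everything here is hypothetical: the metric names, thresholds and the `Event` and `Action` types are invented for illustration and do not come from any specific AIOps product.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    IGNORE = "ignore"
    AUTO_REMEDIATE = "auto_remediate"
    ALERT_HUMAN = "alert_human"

@dataclass
class Event:
    source: str   # e.g. a log file, monitoring tool or ticketing system
    metric: str
    value: float

# Hypothetical business rules as (metric, threshold, action).
# Rules are checked in order and the first match wins, so the
# stricter threshold for a metric must come first.
RULES = [
    ("disk_used_pct", 95.0, Action.ALERT_HUMAN),
    ("disk_used_pct", 80.0, Action.AUTO_REMEDIATE),
    ("error_rate_pct", 5.0, Action.ALERT_HUMAN),
]

def decide(event: Event) -> Action:
    """Map an incoming event to one of the three outcomes."""
    for metric, threshold, action in RULES:
        if event.metric == metric and event.value >= threshold:
            return action
    return Action.IGNORE

print(decide(Event("cloud-monitor", "disk_used_pct", 85.0)))  # Action.AUTO_REMEDIATE
```

Keeping the rules as data rather than code makes it easier for operators to review and adjust what the platform is allowed to do on its own.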

AIOps platforms are designed to illustrate dependencies and the role each dependency plays in both normal and abnormal system behavior. To be effective, AIOps tools must adapt to machine-learning-specific workflows and handle the repeated retraining cycles required to support continuous machine learning (ML) model training.
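
To make the idea of dependency mapping concrete, here is a minimal sketch that narrows a set of unhealthy services down to likely root causes: a service is treated as a candidate only if none of its direct dependencies is also unhealthy. The service names and the `DEPENDS_ON` map are made up, and the heuristic assumes the dependency map is complete and accurate, which is rarely true in practice.

```python
# Hypothetical dependency map: service -> services it directly depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": [],
    "cache": [],
}

def probable_root_causes(depends_on, unhealthy):
    """Return unhealthy services whose direct dependencies are all healthy."""
    roots = set()
    for service in unhealthy:
        deps = depends_on.get(service, [])
        if not any(dep in unhealthy for dep in deps):
            roots.add(service)
    return roots

# The web tier and API are failing, but only because the database is.
print(probable_root_causes(DEPENDS_ON, {"web", "api", "db"}))  # {'db'}
```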

Use cases for AIOps

Although the underlying technologies for AIOps are relatively mature, it is still an early field in terms of combining the technologies for practical use. Organizations that want to streamline data-intensive, manual and repetitive tasks, such as ticketing, are good candidates for an AIOps platform proof-of-concept project.

Challenges of AIOps

AIOps is only as good as the data it receives and the algorithms it is trained with. The time and effort needed to implement, maintain and manage an AIOps platform can be substantial. The diversity of available data sources, along with proper data storage, protection and retention, are all important factors in AIOps results.

AIOps demands trust in tooling, which can be a gating factor for some businesses. For an AIOps tool to act autonomously, it must follow changes within its target environment accurately, gather and secure data, form correct conclusions based on the available algorithms and machine learning, prioritize actions properly and take the appropriate automated actions to match business priorities and objectives.

