08/13/2020

    Use AI to eliminate manual operation and work faster than humans, even on complex tasks!
    The future of network operations empowered by technology from 3 research labs

    *The names of the laboratories mentioned in the article may have changed since the time of writing/interview.

    The ultimate goal of network operations is "Zero-Touch Operation": fully automated operation that requires no human intervention. NTT Laboratories have been approaching this goal from two directions in assurance work: automating network monitoring and alarm management, and predicting and detecting failures (outages, congestion, etc.) early. At the 2019 R&D Forum, NTT Access Network Service Systems Laboratories (AS Labs), NTT Network Technology Laboratories (NT Labs), and NTT Network Service Systems Laboratories (NS Labs) introduced a concept that takes these approaches to the next level. This article presents the overall shape of this concept and the technologies behind it through interviews with researchers from each of the labs.

    Interviewees

    Haruhisa Nozue
    Senior Research Engineer
    Operation System SE Group
    Access Operation Project
    NTT Access Network Service Systems Laboratories (AS Labs)
    Masahiro Kobayashi
    Research Engineer
    Traffic Engineering Group
    Communication Traffic Quality Project
    NTT Network Technology Laboratories (NT Labs)
    Naoyuki Tanji
    Researcher
    Network Operations Development project
    Operation Base Project
    NTT Network Service Systems Laboratories (NS Labs)

    Toward Intelligent Zero-Touch Operation

    "We're aiming to achieve the ultimate zero-touch operation, in which AI analyzes various information and makes decisions to handle problems autonomously"

    Currently, NTT networks are operated by hand with some help from operation support systems. When an alarm is detected, the network operation team analyzes the root cause and determines the failure points and recovery procedures, around the clock. This method relies heavily on human judgment, which takes time and prolongs network outages. Intelligent zero-touch operation will fully automate the network monitoring and maintenance work that has conventionally depended on human intervention, and will enable proactive maintenance through the prediction or early detection of failures, all autonomously controlled by AI. By replacing humans with AI, we intend to achieve even higher precision; this is what we call "intelligent zero-touch operation." This concept was introduced at the 2019 R&D Forum. In the operation shown in Fig. 1, the control loop is run autonomously by AI that connects three technologies: traffic classification and prediction technology developed by NT Labs, failure point estimation technology developed by AS Labs, and SLA (Service Level Agreement) based decision-making technology developed by NS Labs. The key technologies in this concept are introduced below.

    Figure 1

    Traffic classification and prediction technology that enables a proactive response (NT Labs)

    "This technology accurately predicts complex traffic fluctuations, and responds proactively"

    Traffic classification and prediction technology aims to enable a proactive response: predicting traffic congestion and allocating network resources before it occurs.

    This technology predicts rapid increases in network traffic so that operators can respond proactively and prevent degradation of service quality, such as reduced bandwidth and increased latency caused by congestion. Network traffic prediction is becoming harder as network services and user devices grow in number and diversity, causing traffic fluctuations that are unrelated to end users' lifestyles and daily routines. Predicting from total traffic volume alone is difficult because traffic may change suddenly for a specific service or end user. Predicting traffic separately for each service and user is not accurate either, because the traffic volume per service or user is small, and it also requires a long computation time because the number of services and users is enormous.

    To solve this problem, NT Labs is researching a highly accurate prediction method. It groups the traffic of services and users with similar characteristics, predicts the traffic volume of each group, and then adds the group predictions together to predict the overall traffic volume and its characteristics. This reduces computation time compared with making predictions for each user or service, and improves prediction accuracy by tracking the characteristics of each group. It even makes it possible to predict complex traffic with high precision. The key to this method is how the groups are formed, and nonnegative tensor factorization (NTF) is the answer. NTF is often used in purchase-history analysis to find out what consumers buy and where they buy it. Applying NTF to traffic data makes it possible to group services and users with similar access patterns. In addition, we are working on computing the traffic grouping in a short time.
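
    The group-then-predict idea can be sketched in a few lines. This is only an illustration under stated assumptions, not NT Labs' implementation: the tensor layout (users x services x time slots), the multiplicative-update NTF, the function names, the synthetic data, and the naive "repeat the last value" forecaster are all hypothetical stand-ins.

```python
import random

def cp_value(A, B, C, i, j, k):
    """Entry (i, j, k) of the tensor rebuilt from factor matrices A, B, C."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))

def squared_error(X, A, B, C):
    return sum((X[i][j][k] - cp_value(A, B, C, i, j, k)) ** 2
               for i in range(len(A)) for j in range(len(B)) for k in range(len(C)))

def update_factor(x_at, F, G, H, eps=1e-12):
    """Multiplicative update of factor F with G and H held fixed.

    x_at(f, g, h) reads the tensor entry for row f of F, row g of G, row h of H.
    Multiplying by a nonnegative ratio keeps every entry nonnegative.
    """
    R = len(F[0])
    gram = [[sum(g[r] * g[s] for g in G) * sum(h[r] * h[s] for h in H)
             for s in range(R)] for r in range(R)]
    for f, row in enumerate(F):
        num = [sum(x_at(f, gi, hi) * G[gi][r] * H[hi][r]
                   for gi in range(len(G)) for hi in range(len(H))) for r in range(R)]
        den = [sum(row[s] * gram[s][r] for s in range(R)) for r in range(R)]
        F[f] = [row[r] * num[r] / (den[r] + eps) for r in range(R)]

def ntf(X, rank, iters=200, seed=0):
    """Factorize a users x services x time-slots tensor into `rank` nonnegative components."""
    rng = random.Random(seed)
    sizes = (len(X), len(X[0]), len(X[0][0]))
    A, B, C = ([[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)] for n in sizes)
    errors = [squared_error(X, A, B, C)]
    for _ in range(iters):
        update_factor(lambda i, j, k: X[i][j][k], A, B, C)
        update_factor(lambda j, i, k: X[i][j][k], B, A, C)
        update_factor(lambda k, i, j: X[i][j][k], C, A, B)
        errors.append(squared_error(X, A, B, C))
    return A, B, C, errors

def group_users(A):
    """Assign each user to the component where its scale-normalized loading is largest."""
    R = len(A[0])
    scale = [max(row[r] for row in A) or 1.0 for r in range(R)]
    return [max(range(R), key=lambda r: row[r] / scale[r]) for row in A]

def predict_total(history, labels, forecaster):
    """Forecast each group's aggregate series, then add the group forecasts together."""
    total = 0.0
    for group in sorted(set(labels)):
        series = [sum(history[u][t] for u, lab in enumerate(labels) if lab == group)
                  for t in range(len(history[0]))]
        total += forecaster(series)
    return total

# Synthetic data: users 0-2 stream video in the evening, users 3-5 hold web conferences by day.
evening, daytime, idle = [0.0, 1.0, 4.0, 6.0], [5.0, 4.0, 1.0, 0.0], [0.0] * 4
X = [[[(u % 3 + 1) * v for v in evening], idle] if u < 3
     else [idle, [(u % 3 + 1) * v for v in daytime]] for u in range(6)]
A, B, C, errors = ntf(X, rank=2)
labels = group_users(A)
history = [[sum(X[u][s][t] for s in range(2)) for t in range(4)] for u in range(6)]
forecast = predict_total(history, labels, forecaster=lambda series: series[-1])
```

    In practice the per-group forecaster would be a proper time-series model; the point of the sketch is only that forecasting runs once per group rather than once per user or service, and that the group forecasts are summed to obtain the overall prediction.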

    Figure 2

    Automatic failure point estimation technology that can handle complex failures (AS Labs)

    "Through repetitive rule learning, we can improve the accuracy of failure point estimation"

    Whereas NT Labs' technology targets proactive response, AS Labs has proposed a technology that contributes to automation and greater autonomy when a failure actually occurs. Conventionally, when a monitoring device detects a failure it issues an alarm, and an operator who sees the alarm makes a judgment to isolate the failure point. This requires considerable skill and knowledge, and is time-consuming.

    Automatic failure point estimation technology automates this work.

    This technology uses a generic rule engine to estimate the point and cause of a failure based on rules that have been generated in advance. Its primary feature is that it automates rule-making. At first, the system has to learn from operators how to respond to failures, but after that the AI updates the rules automatically, improving the precision of its estimates. Continuing this process lets the system capture the characteristics of past failures, even complex failures that trigger multiple alarms.
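
    As a rough illustration of rule learning from operator responses, the sketch below stores each confirmed incident as a rule (alarm pattern to failure point) and matches new alarm sets against the stored patterns. The class name `RuleBase`, the alarm strings, and the Jaccard similarity used for matching are assumptions made for this example; the article does not describe how AS Labs' rule engine represents or scores rules.

```python
from collections import Counter, defaultdict

class RuleBase:
    """Hypothetical rule store: alarm pattern -> votes for each failure point."""

    def __init__(self):
        self._votes = defaultdict(Counter)

    def learn(self, alarms, failure_point):
        """Record one confirmed incident (initially labeled by an operator)."""
        self._votes[frozenset(alarms)][failure_point] += 1

    def estimate(self, alarms):
        """Return (failure_point, score) for the best-matching learned pattern."""
        observed = frozenset(alarms)
        best, best_score = None, 0.0
        for pattern, votes in self._votes.items():
            union = pattern | observed
            if not union:
                continue
            score = len(pattern & observed) / len(union)  # Jaccard similarity
            if score > best_score:
                best, best_score = votes.most_common(1)[0][0], score
        return best, best_score

rules = RuleBase()
rules.learn({"LOS olt-a/port1", "LINK_DOWN sw-1"}, "fiber between olt-a and sw-1")
rules.learn({"LOS olt-a/port1", "LINK_DOWN sw-1"}, "fiber between olt-a and sw-1")
rules.learn({"FAN_FAIL sw-2", "TEMP_HIGH sw-2"}, "sw-2 cooling unit")

# A complex failure that raises several alarms at once, one of them unrelated.
point, score = rules.estimate({"LOS olt-a/port1", "LINK_DOWN sw-1", "TEMP_HIGH sw-2"})
```

    Each confirmed estimate can be fed back through `learn`, which is how the rule base in this sketch keeps improving without further manual rule writing.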

    Figure 3

    SLA-based decision making technology that determines the priority and suitability of responses (NS Labs)

    "This AI evaluates how end users feel about the quality of the network service based on indicators"

    Once traffic has been classified and predicted and the failure point has been estimated, it is necessary to determine what kind of maintenance should be done. For example, even when there is an obvious failure, the priority of the response is lowered if no user is affected, such as when the failed device does not accommodate any users. As another example, when network latency is expected to increase, the necessity and priority of the response depend on its effect on the service provided. In the future, a wider variety of communication services is expected, offering different network qualities depending on the applications and situation of each user. The AI determines the necessity and priority of countermeasures and automatically selects the optimal response by considering the quality level that each service must satisfy, such as real-time performance, high capacity, or reliability. This is SLA Driven Operation.
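
    The necessity and priority judgment described above might look like the following sketch. Everything here is a hypothetical simplification: the `SlaPolicy` and `Impact` types, the latency-only SLA, the per-service `weight`, and the scoring formula are assumptions for illustration and are not taken from NS Labs' technology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlaPolicy:
    service: str
    max_latency_ms: float  # quality level the service must satisfy
    weight: float          # hypothetical importance of the service

@dataclass(frozen=True)
class Impact:
    service: str
    affected_users: int
    predicted_latency_ms: float

def prioritize(impacts, policies):
    """Return (service, score) pairs that need a response, highest priority first."""
    by_service = {p.service: p for p in policies}
    actions = []
    for imp in impacts:
        policy = by_service[imp.service]
        violation = imp.predicted_latency_ms / policy.max_latency_ms - 1.0
        # No response is needed if nobody is affected or the SLA is still met.
        if imp.affected_users == 0 or violation <= 0:
            continue
        actions.append((imp.service, violation * imp.affected_users * policy.weight))
    return sorted(actions, key=lambda a: a[1], reverse=True)

policies = [
    SlaPolicy("web_conference", max_latency_ms=100.0, weight=5.0),
    SlaPolicy("video", max_latency_ms=400.0, weight=1.0),
    SlaPolicy("email", max_latency_ms=1000.0, weight=1.0),
]
impacts = [
    Impact("video", affected_users=300, predicted_latency_ms=600.0),
    Impact("web_conference", affected_users=200, predicted_latency_ms=150.0),
    Impact("email", affected_users=500, predicted_latency_ms=50.0),  # within its SLA
]
ranking = prioritize(impacts, policies)
```

    With these made-up numbers the web conference is ranked ahead of video even though fewer users are affected, because its SLA is stricter and its weight is higher, while email triggers no response at all.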

    Figure 4

    An example of intelligent zero-touch operation utilizing the three technologies

    "Use case of proactive measures to avoid problems"

    So far, we have explained each technology that constitutes intelligent zero-touch operation. Below is an example use case in which these technologies are used to avoid problems.

    Proactive countermeasures are operations that deal with problems before they occur. Here we explain them with a use case in which multiple services are provided on a single network. For example, say there is an area where network congestion occurs every time there is a sports match. If several companies in the same area hold important business web conferences on the same network at the same time, the conferences may be disturbed and suffer quality degradation due to the congestion. With intelligent zero-touch operation in place, the network behaves as follows. First, using traffic classification and prediction technology, the AI classifies and predicts traffic based on classification rules that it learned in advance. In this case, a rapid increase in network traffic is predicted during the sports match. Second, SLA decision-making technology predicts each user's communication quality and compares it against the established SLA policy to determine the impact of the congestion. The AI decides that a proactive response is required because the web conferences are expected to lag as traffic increases during the match. Next, failure point estimation technology estimates the failure point based on rules that the AI has learned. This time, the AI judges that the event is not a failure but congestion, and estimates the congestion point. Finally, optimal resource allocation technology calculates a detour that moves the sports match traffic to another network route. As a result, traffic to and from the venue increases as expected during the sports match, but the proactive measures avoid congestion and quality degradation for the web conferences.

    As shown in the use case above, this technology is distinctive in that it detects an oncoming failure before it occurs and solves the problem without human intervention.
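
    The final rerouting step of the use case can be sketched as follows. This is a greedy stand-in, not the optimal resource allocation technology mentioned in the article; the function `plan_reroute`, the link and flow names, and all capacities and demands are hypothetical.

```python
def plan_reroute(capacity, routes, alt_routes, demand, protected):
    """Greedy sketch: move unprotected flows off links whose predicted load exceeds capacity.

    capacity: {link: capacity}; routes and alt_routes: {flow: [links]};
    demand: {flow: predicted traffic}; protected: flows that must keep their route.
    Returns {flow: new_route} for the flows that were moved.
    """
    current = dict(routes)
    load = {link: 0.0 for link in capacity}
    for flow, route in current.items():
        for hop in route:
            load[hop] += demand[flow]

    plan = {}
    for link in capacity:
        # Candidates on this link, largest predicted demand first.
        movable = sorted(
            (f for f, r in current.items()
             if link in r and f not in protected and f not in plan),
            key=lambda f: demand[f], reverse=True)
        for flow in movable:
            if load[link] <= capacity[link]:
                break  # predicted load now fits; nothing more to move
            alt = alt_routes.get(flow)
            if not alt:
                continue
            # Only accept a detour that does not overload any link it newly uses.
            new_hops = [hop for hop in alt if hop not in current[flow]]
            if any(load[hop] + demand[flow] > capacity[hop] for hop in new_hops):
                continue
            for hop in current[flow]:
                load[hop] -= demand[flow]
            for hop in alt:
                load[hop] += demand[flow]
            current[flow] = plan[flow] = alt
    return plan

capacity = {"venue-core": 10.0, "venue-edge2": 10.0, "edge2-core": 10.0}
routes = {"sports": ["venue-core"], "webconf": ["venue-core"]}
alt_routes = {"sports": ["venue-edge2", "edge2-core"]}
predicted_demand = {"sports": 8.0, "webconf": 4.0}  # predicted for the match period

plan = plan_reroute(capacity, routes, alt_routes, predicted_demand, protected={"webconf"})
```

    Here the predicted load on `venue-core` (12) exceeds its capacity (10), so the sports traffic is detoured in advance and the web conference keeps its original route.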

    Figure 5

    Future Development

    "We aim for the evolution and establishment of each technology"

    There have been various studies on automating the control of network maintenance, such as information supplementation, prediction, judgment, and failure detection using AI trained on accumulated past events. What distinguishes intelligent zero-touch operation is that it adds new technologies spanning analysis, response judgment, and reaction, and incorporates each of them into the loop of network operation work, moving research a step further.

    At present, we plan to deepen our research on the individual technologies and provide each of them to NTT operating companies. Traffic classification and prediction technology is on its way to being established. Failure point estimation technology underwent its first commercial trial in spring 2019 and has since improved its estimation accuracy. For SLA decision-making technology, the function that judges whether a response is necessary is under development. Further enhancement of each technology will lead us to intelligent zero-touch operation. In the future, beyond zero-touch day-to-day operations, we will also expand the scope of zero-touch as research progresses, for example by bringing it to the follow-up operations that become necessary when the specifications of devices and services change.

    The 3 interviewees

    Editor's Note

    It seems that zero-touch operation as a concept is being worked on not only by NTT but by various other carriers as well. Among these companies, I feel that NTT's intelligent zero-touch operation has the advantage of combining component technologies, with distinctive parts created for each process. The goal is to achieve the ultimate in zero-touch operation that requires no human intervention, but that goal is still far ahead, because each technology is still being established through commercial trials and functional improvements. If something can be built that reaches the next level, it may be possible to create zero-touch operation mechanisms that have evolved beyond the current concept.

    Interview by Kayoko Kaisho
    on March 5, 2020

    Reference

    • Yuka Komai, Tatsuaki Kimura, Masahiro Kobayashi, Shigeaki Harada, "Traffic Prediction Method Based on Access Patterns", IEICE (The Institute of Electronics, Information and Communication Engineers) Technical Report, vol. 119, no. 158, IN2019-22, pp. 43-46, August 2019 (references for traffic classification prediction)
