We aim to make AI that actually "thinks" like a human.
Here we introduce the AI research and development currently underway at NTT.
Research into artificial intelligence (AI) is accelerating around the world with the development of big data analysis, machine learning, and deep learning. Commercialization is already underway in many areas, and in certain functions such as media processing AI has achieved results that actually surpass human abilities. However, if the goal is AI as intelligent as humans, then that goal has not yet been reached. NTT believes that the next generation of AI technology has to understand human values; in other words, it needs the ability to understand and make judgments about the background to individuals' thoughts and situations.
To realize these ambitions, NTT believes it is necessary to progress beyond AI that simply “sees,” “listens,” and “speaks,” and to combine these abilities into AI that “thinks,” achieving higher-order rational and analytical thinking.
For AI to “think” by following its own values, which are rooted in individual values, it must first be equipped with the basic human abilities to “see,” “listen,” and “speak.” Taking the ability to “see” as an example, NTT has been working on “angle-free object search technology,” which enables highly precise identification of shape-changing objects even with a small number of reference images. Regarding the abilities to “listen” and “speak,” NTT is conducting case study research on “speech recognition and spoken dialogue technology” while engaged in discussions with Professor Noriko Arai of the National Institute of Informatics, as well as Professor Hiroshi Ishiguro of Osaka University, a leading android researcher.
NTT is working to further expand the range of AI applications by developing AI that understands diverse human values and takes those values into consideration itself. Even with current technologies it is possible to translate simple sentences and handle chat in customer support, but these technologies only output answers optimized for the information they are given. If, however, AI can put forward new proposals and raise new problems based on its own experience, it will be capable of communication that deepens human thought and expands the choice of actions.
Furthermore, if an AI can infer the values of the person it is communicating with and respond in a way that also reflects its own values, it will be capable of richer and more creative conversations. This ability is becoming important for introducing AI into various fields, such as counseling and facilities for the elderly.
Developing AI based on such diverse values makes it possible to realize both the “tolerance” to accept a variety of ideas flexibly and the “sincerity” to earn strong trust from humans by eliminating inconsistencies and failures. So that AI can “think” on a deeper level and help people think, combining the “tolerance” to recognize diversity of thought with the “sincerity” to behave both flexibly and consistently, NTT will continue to pursue AI research with a focus on incorporating these two qualities.
Here we introduce specific examples of AI-related research actually underway at NTT: “speech recognition and spoken dialogue technology,” which supports the abilities to listen and speak; “angle-free object search technology,” which supports the ability to see; and “spatio-temporal multidimensional collective data analysis technology,” which supports temporal and spatial predictions.
At present, attention is focused on dialogue processing technology for enabling natural conversations, with the spread of AI-enabled speakers and smart assistants. NTT is researching and developing various technologies related to communication, including speech recognition systems that can accurately understand the intent of speakers and automatic customer support technology using chatbots.
As part of this research, we have developed an android named “Totto” that is designed to imitate the renowned actress and TV personality Tetsuko Kuroyanagi. By combining dialogue processing technology that realizes natural conversation with a human personality, high-accuracy speech recognition technology, and highly realistic speech synthesis technology, Totto lets users enjoy communication as if they were talking with Ms. Kuroyanagi herself.
NTT has been researching speech recognition for more than half a century. In the beginning, this technology could only recognize extremely stilted speech with a limited vocabulary. However, NTT was the first in Japan to adopt a technology known as the weighted finite-state transducer (WFST), making it possible to recognize the closest word from a vocabulary of 10 million words, 100 times as many as before. By utilizing deep learning technology, which has been a hot topic in recent years, NTT also won first place in an international competition on the accuracy of speech recognition using mobile terminals in noisy public areas (CHiME-3, 2015).
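As a rough illustration of the WFST idea mentioned above, the following minimal Python sketch builds a toy transducer that maps phoneme symbols to words and finds the lowest-cost path for an input sequence. The phoneme symbols, words, weights, and the `decode` function are all hypothetical and chosen only for illustration; this is not NTT's recognizer, which works at a vastly larger scale. Weights are treated as costs that add along a path, with the cheapest path winning.

```python
# Toy weighted finite-state transducer (WFST). All arcs and weights are hypothetical.
# Each arc is (source_state, input_symbol, output_word, cost, destination_state).
# An empty output_word means the arc emits nothing.
ARCS = [
    (0, "s", "see", 0.5, 1), (1, "iy", "", 0.0, 0),
    (0, "s", "sea", 1.2, 2), (2, "iy", "", 0.0, 0),
    (0, "s", "speak", 0.8, 3), (3, "p", "", 0.0, 4),
    (4, "iy", "", 0.0, 5), (5, "k", "", 0.0, 0),
]
START, FINALS = 0, {0}


def decode(arcs, start, finals, symbols):
    """Return (cost, words) for the cheapest path that consumes all symbols."""
    # frontier maps each reachable state to its best (cost, words) so far.
    frontier = {start: (0.0, [])}
    for symbol in symbols:
        next_frontier = {}
        for state, (cost, words) in frontier.items():
            for src, inp, out, weight, dst in arcs:
                if src != state or inp != symbol:
                    continue
                candidate = (cost + weight, words + ([out] if out else []))
                # Keep only the cheapest way of reaching each state.
                if dst not in next_frontier or candidate[0] < next_frontier[dst][0]:
                    next_frontier[dst] = candidate
        frontier = next_frontier
    accepted = [frontier[s] for s in finals if s in frontier]
    return min(accepted, key=lambda item: item[0]) if accepted else None


best_cost, best_words = decode(ARCS, START, FINALS, ["s", "iy", "s", "p", "iy", "k"])
print(best_words)  # ['see', 'speak']
```

Here “see” and “sea” share the same phonemes, so the weights decide between them. In a real system the weights would come from trained acoustic and language models rather than being written by hand.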
NTT has been working on “angle-free object search technology” since 2015, in order to improve the ability of AI to “see”. This technology is capable of recognizing three-dimensional objects with high precision, then presenting information related to the objects, regardless of the angle from which the object is imaged.
Conventional technology required a large amount of image data taken at various angles in order to increase recognition precision. In contrast, “angle-free object search technology” simulates changes in the three-dimensional appearance of objects and precisely identifies the correlation between input images and reference images, making it possible to identify objects with only one-tenth as many images as conventional technology requires.
This technology is used in services for foreign visitors to Japan. One example is “Kazashite Annai,” a service that displays directions and detailed tourism information in the user's native language when the user simply points a smartphone camera at signboards, buildings, products, and the like. Demonstration tests have confirmed commercially viable performance.
In the future, if the range of application can be expanded to recognize various products, this technology will be used for labor-saving and automation of cash register work, as well as efficient sorting and inventory management.
NTT has developed “spatio-temporal multidimensional collective data analysis technology” that predicts the movement of people and objects from a variety of data using AI. Currently, in collaboration with NTT DOCOMO, NTT is studying the commercialization of “near-future population prediction”, which predicts the number of people in certain areas of Japan at present and several hours in the future using mobile spatial statistics data.
This technology models latent structures in the time-series population data for each cell of a grid, then learns the fluctuation patterns of that latent structural model in order to predict the current population and the population several hours in the future for grid cells measuring 250 m to 500 m. It is being considered for introduction to “on-demand buses” and car-sharing, which operate by predicting traffic demand. Attempts are also being made to develop this technology to relieve congestion and traffic jams in urban areas. Specifically, NTT is building a simulation environment that uses information on the flow of people and vehicles, searching for guidance measures that avoid traffic problems, and presenting the results so that groups of people can be guided.
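The idea of learning a latent structure from per-cell population time series and extending it into the near future can be sketched in a few lines of Python. The sketch below is a deliberately simplified assumption, not NTT's actual model: it approximates the population of each grid cell as a per-cell scale multiplied by one shared time pattern (a rank-1 factorization fitted by alternating least squares), and forecasts by repeating the last full cycle of that pattern. The function names, the toy data, and the assumption of a fixed repeating period are all hypothetical.

```python
# Hypothetical sketch: counts[g][t] is approximated by scale[g] * pattern[t].

def fit_rank1(counts, iterations=20):
    """Fit per-cell scales and one shared time pattern by alternating least squares."""
    n_cells, n_steps = len(counts), len(counts[0])
    # Start from the average population over all cells at each time step.
    pattern = [sum(counts[g][t] for g in range(n_cells)) / n_cells
               for t in range(n_steps)]
    scale = [1.0] * n_cells
    for _ in range(iterations):
        denom = sum(p * p for p in pattern)
        scale = [sum(counts[g][t] * pattern[t] for t in range(n_steps)) / denom
                 for g in range(n_cells)]
        denom = sum(s * s for s in scale)
        pattern = [sum(counts[g][t] * scale[g] for g in range(n_cells)) / denom
                   for t in range(n_steps)]
    return scale, pattern


def forecast(scale, pattern, period, steps):
    """Repeat the last full cycle of the shared pattern, then rescale for each cell."""
    last_cycle = pattern[-period:]
    future = [last_cycle[h % period] for h in range(steps)]
    return [[s * f for f in future] for s in scale]


# Toy data: three grid cells, twelve time steps, a pattern repeating every four steps.
cell_scale = [100.0, 250.0, 40.0]
cycle = [1.0, 2.0, 3.0, 2.0]
counts = [[s * cycle[t % 4] for t in range(12)] for s in cell_scale]

scale, pattern = fit_rank1(counts)
predicted = forecast(scale, pattern, period=4, steps=2)
print([round(value) for value in predicted[0]])  # [100, 200]
```

A single shared pattern is the simplest possible latent structure. A more realistic model would use several latent patterns and would account for noise, weekdays, and events.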
In the following section we introduce three issues in AI development and efforts being made to solve those issues.
The first issue is training data. Developing AI generally requires preparing a large amount of training data. As the fields of application increase and subdivide, data must be collected, aggregated, and analyzed for each field, so costs increase dramatically.
Transfer learning is gaining attention as an approach to reducing the amount of training data required. This is a method of adapting a model trained for one target to another target. Using this method can improve accuracy even if the volume of sample data is limited.
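The following minimal Python sketch illustrates the principle with a toy example. A straight-line model is fitted on plentiful source data, its slope is then kept fixed, and only the offset is re-estimated from a single labeled target sample. The data, the function names, and the assumption that the two tasks share the same slope are hypothetical; real transfer learning typically reuses the layers of a neural network rather than one coefficient.

```python
# Illustrative sketch only; function names and data are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b (closed form, one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def adapt_offset(w, xs, ys):
    """Keep the transferred slope w fixed and re-fit only the offset on target data."""
    return sum(y - w * x for x, y in zip(xs, ys)) / len(xs)


# Source task: plenty of data generated from y = 3x + 1.
source_x = [0.0, 1.0, 2.0, 3.0, 4.0]
source_y = [3.0 * x + 1.0 for x in source_x]
w_source, b_source = fit_line(source_x, source_y)

# Target task: y = 3x + 4, but only one labeled sample is available.
target_x, target_y = [2.0], [10.0]
b_target = adapt_offset(w_source, target_x, target_y)


def predict_transferred(x):
    """Prediction that reuses the slope learned on the source task."""
    return w_source * x + b_target


def predict_without_transfer(x):
    """With a single sample and no prior model, only a constant can be fitted."""
    return target_y[0]


print(predict_transferred(5.0))       # 19.0, which matches 3 * 5 + 4
print(predict_without_transfer(5.0))  # 10.0
```

One target sample cannot determine a slope on its own, so the model without transfer falls back to a constant. Reusing the slope from the source task is what allows the single sample to be enough.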
The next issue for AI is “blackboxing.” Generally, AI derives its results by learning mechanically from a huge volume of data, but it is difficult for humans to understand the learning process. If the processing inside the AI is “blackboxed,” it is difficult for users to trust the results.
“AI whiteboxing,” which shows the full process leading to output results, is necessary to solve this problem. In order to achieve this, in 2016 the Defense Advanced Research Projects Agency (DARPA) of the United States Department of Defense launched XAI, a research and development investment project to realize “explainable AI.”
Another requirement for realizing advanced AI is high-performance hardware. Because a large number of calculations must be executed in complex combinations and in parallel, the general-purpose CPUs and GPUs used so far will either be unable to keep up or will require huge computing resources. For this reason, we expect that in the future it will be important to develop frameworks and architectures specialized for AI computation.