We aim to make AI that actually "thinks" like a human.
Here we introduce the AI research and development currently underway at NTT.
NTT is conducting research aimed at creating AI that comes closer to humans, delves deeper into them, and understands them better. We pursue this goal through two approaches spanning different fields of research: a technological approach that aims to approach and surpass human capabilities, which includes media processing, data analysis, and machine learning; and a scientific approach that aims to better understand humans, which includes human science and brain science focused on human diversity.
With the technological approach, we aim to approach and surpass specific human abilities through research in fields such as deep learning, machine learning, cross-modal media processing, language processing, and discrete optimization. Although AI can approach and even surpass human performance in specific areas, it is still far from reaching the so-called "singularity," in which AI exceeds the full, complex range of human capabilities. On the other hand, humans are sometimes swayed by cognitive biases or fooled by illusions, which can lead to completely unexpected mistakes. With the scientific approach, which seeks a deep understanding of people, we are working to elucidate the implicit brain functions that give rise to such cognitive biases and illusions, studying not only ordinary people but also top athletes from the perspective of sports brain science.
Misuse of advanced AI technology could lead AI to exploit these human imperfections. Through the two approaches described above, we aim to establish heart-touching AI that is reliable, supportive, and intimately familiar with humans, for example through natural conversation and empathetic communication that bridge the gap between humans and AI and help them share values. As we work to design a more fulfilling and spiritually enriching society in anticipation of the new lifestyles brought about by the novel coronavirus, it has become increasingly important to establish heart-touching AI technology that can coexist with humans and stay close to the human mind. Based on this understanding, we are conducting research with the ultimate goal of achieving communication that reaches the heart.
The basis of communication is recognizing and understanding spoken language. Humans can focus on and hear the voice of the person they want to listen to, even when several people are talking at once, an ability known as selective listening. We have developed a technology called SpeakerBeam that enables computers to perform selective listening. Recently, we have begun using lip movements as features in addition to vocal characteristics, so that speakers can be distinguished even when their voices are similar. We are also developing voice conversion technology that can freely change voice features such as voice quality and intonation while preserving the linguistic content of the speech. Further development of these technologies is expected to enable natural communication that overcomes disabilities or age-related decline in speaking or hearing, and to support conversations in unfamiliar foreign languages.
If they know a song, humans can identify it and its title even from a short fragment heard unexpectedly on the street. We are researching and developing "Robust Media Search" technology, which rapidly searches a massive database of songs and videos and retrieves similar items using fragments of audio and video signals as clues. This technology has been commercialized through NTT DATA and is now used by many broadcasting stations as a service that automatically detects the songs used in broadcast programs and generates a list of them for music rights management. We are also working on searching for objects in real space. For example, adaptive spotting is a technology for rapidly finding objects of a desired shape in three-dimensional point cloud data captured from real space. Much as humans do, this technology can learn an efficient search method on its own.
For several years, as part of the "Todai Robot Project - Can a robot get into the University of Tokyo?" led by the National Institute of Informatics, we have conducted research to determine to what extent artificial intelligence can solve the same problems as humans. Our AI achieved an extremely high score of 185 out of 200 points (a T-score of 64.1) on the 2019 English written exam administered by the National Center for University Entrance Examinations. Among the many types of tests, English exams contain problems that require integrating natural language processing with knowledge processing. We are applying the knowledge gained in tackling these problems to our conversational AI research in order to achieve more natural and mutually understandable conversation between AI and humans.
AI development is also making a deep understanding of people more important. For example, when we search the Internet, advertisements matching our search terms suddenly appear. Users sometimes click on such links and make impulse purchases. Many of them would probably insist that they bought the product entirely of their own free will, greatly downplaying any third-party manipulation. As AI technology spreads, the risk of such manipulation, which resembles an AI version of subliminal effects, increases.
To prevent such risks, it is important to gain a deep understanding of the preconceptions people hold and of what kind of behavior those preconceptions might cause, and when. At NTT, we are conducting research to elucidate the information processing mechanisms of the human brain and body involved in the five senses, perception, and movement, and to provide novel perceptual experiences. We are also carrying out research to understand the relationships among mind, skill, and body in humans by elucidating the diversity of brain functions that support the cognitive skills of top athletes. In baseball, for example, we are exploring the differences between great and mediocre hitters by investigating how well good hitters actually see the ball and whether a fastball really travels in a straight line. We plan to use the knowledge gained in this research to provide feedback to athletes in the form of training techniques that can sharpen brain functions.
We are also investigating the language acquisition process in young children, who acquire language primarily by talking with their parents. Language, and the oral communication based on it, is a basic human function that has evolved over some 150,000 years. Written language, by contrast, emerged relatively recently, only about 5,400 years ago, so the ability to read is not an inherent function of the human brain, which did not evolve for reading. Instead, reading is realized by flexibly combining basic brain functions such as vision, hearing, language, and cognition.
To understand the mechanism of language acquisition, we constructed a child vocabulary development database by conducting a large-scale survey of which words children can understand and say at each stage of development, and by modeling the results. This database is also used in a personalized educational picture book project that encourages children to read from an early age.
Today, various forms of social media have taken over the central role once played by the telephone, making it possible to learn things about people you are not even acquainted with, such as where they are and what they are doing at the moment. A smartphone collects a great deal of information about its user throughout the day, and it may know more about its user than the user knows about themselves. The traditional black telephone, used in the era when NTT was still the "Nippon Telegraph and Telephone Public Corporation," on the other hand, may have carried a magical sense of presence, solemnity, or warmth, perhaps reflecting its users' personal experiences.
In such a world, how will communication change as technology continues to develop? What kind of communication reaches the heart? To answer these questions, NTT is conducting research to get closer to human capabilities and to obtain a deep understanding of people.
Going forward, we are committed to creating heart-touching AI technologies that contribute to human happiness, or in more modern terms, to people's well-being, and to linking these technologies to the creation of a fulfilling and spiritually enriching society through collaboration with our business partners.
*This paper was excerpted and edited from "I want to know more about you – Getting closer to humans with AI and brain science", a speech given by Takeshi Yamada, Vice President of NTT and Head of NTT Communication Science Laboratories, at NTT Communication Science Laboratories Open House 2020.
https://www.rd.ntt/cs/event/openhouse/2020/talk_en.html