2024 REPORT

Next-Generation AI

Presenters:

Senior Researcher
NTT Computer and Data Science Laboratories

Susumu Takeuchi

Co-Founder and COO
Sakana AI

Ren Ito

Project Professor
Institute of Industrial Science (IIS)
The University of Tokyo

Youichiro Miyake

Moderator:

Head of Educational Content
WIRED JAPAN

Michiaki Matsushima

Not one, but many futures are possible

Michiaki Matsushima
Head of Educational Content, WIRED Japan

This session was initiated with the participation of WIRED in the “AI Constellation Round Table” meeting hosted by NTT, with experts and NTT researchers exploring the potential for AI constellations in the future.
Based on the concept of “Realizing Futures,” WIRED is using the term Futures (plural) rather than Future (singular), to emphasize that not one, but many futures are possible.
There is speculation that, through repeated self-improvement, AI will reach a “singularity” at which it exceeds human capabilities, but this cannot happen with current generative AI. Participants discussed “Next-Generation AI,” focusing on major issues such as how AI can accommodate culture, regional differences, and other complexities of human society, and how pluralistic AI can be created.

Introduction to Research Content

Initiatives for next-generation AI - AI constellations -

Susumu Takeuchi
Senior Researcher
NTT Computer and Data Science Laboratories

I work as a group leader for R&D on AI and algorithms. Today, I will describe an “AI constellation” concept that we have been working on since last year. Currently, Large Language Models (LLMs) have gained prominence in AI, so there can be no discussion without including them. Since the appearance of ChatGPT, both AI research and business have changed greatly, and there are now initiatives for both AIs that collect open, general-purpose knowledge, and others that utilize closed-domain, organization-internal data. I am sure everyone has a sense of how difficult it is to utilize such closed knowledge.
On the other hand, as the scale of LLMs increases, power consumption and computational costs increase, which is considered a problem. Also, although LLMs gain generality as they get larger, there is concern that they lose individuality and can lose their ability to differentiate. For these reasons, there is a trend away from huge LLMs that “know everything” to “reasonable LLMs” that have specialist knowledge, and there are already initiatives to develop original LLMs for fields such as medicine, law, manufacturing, and railways. We think there will be a trend in the future to use multiple such LLMs, created by various companies, in combination.
Last year, we also began working on a concept to “solve problems using combinations of low-cost LLMs that have specialization or individuality.” It is an advanced, large-scale AI-linking technology that solves problems from multiple viewpoints by having AIs discuss with and correct each other, while also respecting minority views. These AIs are linked to each other like stars in a constellation, so we refer to them as “AI Constellations” (Fig. 1).
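The discuss-and-correct loop described above can be sketched in miniature. This is a toy sketch only: the three specialist “models” are stubbed as plain functions rather than real LLMs, and the specialties and two-round protocol are illustrative assumptions, not NTT’s actual design.

```python
# Toy AI-constellation sketch: specialist "models" (stub functions, not
# real LLMs) each propose an answer, then revise after seeing every
# other proposal, including minority views. All names are hypothetical.

def medical_model(question, others):
    view = f"[medical] {question}: consider residents' health"
    return view + f" (revised, saw {len(others)} views)" if others else view

def legal_model(question, others):
    view = f"[legal] {question}: check local regulations"
    return view + f" (revised, saw {len(others)} views)" if others else view

def civic_model(question, others):
    view = f"[civic] {question}: ask the neighborhood first"
    return view + f" (revised, saw {len(others)} views)" if others else view

CONSTELLATION = [medical_model, legal_model, civic_model]

def discuss(question, rounds=2):
    """Every model answers; in later rounds each sees all prior proposals."""
    proposals = []
    for _ in range(rounds):
        proposals = [model(question, proposals) for model in CONSTELLATION]
    return proposals

answers = discuss("How should the old station building be reused?")
```

In a real system, each stub would be a call to a distinct specialized LLM, and the revision step would pass the other models’ answers into the prompt.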

Fig. 1 AI constellation overview

When considering the capabilities that AI constellations need, such as human creativity and individuality, we first look at regular tasks. Adding creativity results in continuous innovation, while adding individuality results in disruptive innovation. Current LLMs can be applied to such regular tasks, and replacing human work with AI is much anticipated for expanding this application domain. AI constellations, on the other hand, will gain individuality by incorporating a diversity of AIs and creativity through discussion among those AIs, so they can assist humans rather than replace them (Fig. 2).

Fig. 2 Capabilities of AI constellations

We envision two use cases, in which user requirements or objectives are explicitly defined. One is to expand creativity and individuality. When planning or deciding something, we imagine a future state and work backwards from it. Providing information from various viewpoints, as an AI constellation does, could expand the user’s perspective. The other is to raise the level of community discussion. For example, it can be very difficult to expand or deepen discussion within a meeting, but adding a diversity of views can deepen both the knowledge and the discussion.
At this R&D Forum, we have an exhibit featuring AI constellations, which demonstrates discussion among multiple LLMs and introduces a “meeting singularity” held in Omuta, Fukuoka, to raise the level of community discussion (Fig. 3). In this initiative, AIs were introduced into the discussion of a real local issue, with discussion first among the AIs, and then among residents of the city. There were several effects, such as that discussion started smoothly with an idea from the AIs, and people were made aware of viewpoints beyond just their own.

Fig. 3 Meeting singularity held in Omuta, Fukuoka

Requirements for implementing AI constellations include a method to link AIs, improvements to training and operation, and cost reductions. Although current LLMs can understand within the scope of natural language, they are not yet able to understand all the information in the world around them, so advances in handling non-language media are also needed.
We are providing a “service environment for human-AI collaboration” using IOWN network and computing infrastructure, which we hope can contribute to society (Fig. 4).

Fig. 4 Contribution to society by AI constellations

Next-Generation AI - Evolutionary model merge for unifying models -

Ren Ito
Co-Founder and COO
Sakana AI

In March this year, we announced “Evolutionary model merge,” a method for building models that embodies the AI constellation concept. It connects multiple smaller models to solve problems with performance comparable to larger models, and it can perform accurate calibration by having the AIs communicate with each other. This represents a next-generation AI. Today I will discuss what sort of AI must be built on this AI constellation concept, how this represents the next generation of AI, and give some practical examples.
There are companies that can build a model from “zero” 20 to 30% more efficiently than OpenAI. However, we are aiming for 99.999% efficiency, so rather than starting from zero, we have attempted to increase efficiency by connecting existing models to each other. Using the analogy of creating a person, we have created a “Frankenstein merge” method for building models: rather than carefully gathering only the best parts, it takes the eyes from one person, the ears from another, and so on, without concern for whether there are four eyes, whether they are on the bottom of the feet, or whether there are four ears. We create 10,000 merged models in this way, keep the best ten, and discard the rest. These ten models are taken as the second generation and used to create 1,000 more, again keeping only the top ten. This process was repeated for 999 generations and achieved performance similar to GPT-3.5 in 24 hours, at a cost of only 24 dollars. This was a very interesting and significant result for us.
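The generational merge-and-select loop described above can be sketched in miniature. In this toy sketch a “model” is reduced to a flat parameter vector, the fitness function is a stand-in, and the population sizes are scaled down; none of this is Sakana AI’s actual implementation.

```python
import random

# Toy sketch of evolutionary model merge: a "model" is a parameter
# vector, merging recombines two parents coordinate by coordinate (the
# Frankenstein merge), and each generation keeps only the top performers.
# The fitness function and all sizes are illustrative assumptions.

random.seed(42)                 # deterministic toy run

DIM = 8
TARGET = [0.5] * DIM            # stand-in for "good" parameters

def fitness(model):
    # higher is better: negative squared error against the target
    return -sum((w - t) ** 2 for w, t in zip(model, TARGET))

def merge(parent_a, parent_b):
    # take each parameter from either parent at random, with no concern
    # for whether the pieces "belong" together
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def evolve(generations=50, pop_size=100, keep=10):
    population = [[random.random() for _ in range(DIM)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # keep only the best models, discard the rest
        survivors = sorted(population, key=fitness, reverse=True)[:keep]
        # refill the population with merges of surviving parents
        population = survivors + [
            merge(random.choice(survivors), random.choice(survivors))
            for _ in range(pop_size - keep)
        ]
    return max(population, key=fitness)

best = evolve()
```

Because survivors are carried over unchanged, the best fitness never decreases between generations; real model merging applies the same selection pressure to merged network weights rather than toy vectors.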
There are also limitations to model building methods that simply inject data. While this can improve performance, it is not cost effective. For this reason, there is a trend toward sustainable model building using a technology called “reasoning,” which enables models to converse with each other. The current ChatGPT cannot accurately solve every problem right away, but it can perform some translation and summarization, so it can help improve call centers. However, our vision of “AIs needed to bring about an innovative future” will also come eventually. One such type of technology will perform workflow automation, dividing a task into multiple steps and automating them all at once.
We have attempted such automation with the example of “writing an academic paper.” Normally this involves steps such as a senior professor suggesting that a younger researcher write a paper on a particular topic. The researcher then thinks of 100 or more ideas that seem interesting and begins investigating them at the library. Of these, about 95 have already been investigated, so the researcher continues verifying the remaining five, creating charts and writing papers.
We have demonstrated performance of all of these steps using AI in a paper titled “AI scientist” (Fig. 5). This is the first paper by an AI accepted by the journal Nature. This was accomplished by submitting queries for 100 ideas to 100 different base models and using the results for calibration. We are using our constellation concept in this way to build interesting models and methods for using them.

Fig. 5 “AI scientist” paper

Next-generation AI and digital game AI – Implementing smart city with three types of game AI -

Youichiro Miyake
Project Professor
The University of Tokyo

I will be discussing the field of games and digital game AIs. This industry is still quite new. It started to gain prominence in 2000, and I entered the industry in around 2004. There are three main types of game AI, referred to as meta AI, character AI and spatial AI, each having their particular roles (Fig. 6).

Fig. 6 Three types of game AI

Further, meta AIs can be combined with generative AIs, character AIs with language AIs, and spatial AIs with spatial computing. At the University of Tokyo we are building a smart city (a city that utilizes advanced digital technology and information to improve efficiency and optimize city functions) system to apply these concepts to a real space. It consists of three AIs: a meta AI, which provides overall control of Omuta City; character AIs, which are active in the city; and a spatial AI, which understands spatial circumstances in the city. Today I will be describing the spatial AI and meta AI, which will be the keywords in this discussion.
The spatial AI has the role of acquiring spatial information at particular locations in the real world and associating it spatially when building a digital-twin metaverse (a virtual space associating the digital and real worlds) (Fig. 7). There are also techniques for embedding AIs in the environment as information. In fact, objects such as doors in a game can themselves be AIs that support character motion, and we are layering in such entities to build the smart city.

The meta AI is an “AI that attempts to understand humans.” Various devices can be attached to users to gather biological information and understand psychological state, and this can be applied not just in a game, but also in real space.
The meta AI itself can also create the game, such as a 3D dungeon. Until now, 100% of game content has been created by humans, but the meta AI can use the power of generative AI to create content or games with 20% more variety. We hope to use these types of technologies to create various types of communication.
To change a game space or real space with these three types of AI requires simulation in virtual space and then returning results back to real space. In the future, the role of meta AI will be to take the real and virtual spaces as a set and use the metaverse as an AI. Other agents (with the role of integrating data) that connect systems with humans will also be needed, and we envision a future in which we can converse with AIs, in the same direction as the AI constellation concept.
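The interaction among the three AI types can be sketched roughly as follows. All class names, the toy crowding data, and the reroute policy are illustrative assumptions, not the actual University of Tokyo system.

```python
# Toy sketch of the three-AI smart-city loop: a spatial AI reports the
# state of a location, character AIs act on that observation, and a meta
# AI runs the step in virtual space so results can be returned to real
# space. Every name and value here is hypothetical.

class SpatialAI:
    def __init__(self):
        # toy spatial knowledge about one location
        self.state = {"cafe_street": {"crowding": 0.8}}

    def sense(self, location):
        # return the spatial information known about a location
        return self.state[location]

class CharacterAI:
    def act(self, observation):
        # toy policy: characters avoid crowded places
        return "reroute" if observation["crowding"] > 0.5 else "proceed"

class MetaAI:
    def __init__(self, spatial, characters):
        self.spatial = spatial
        self.characters = characters

    def simulate(self, location):
        # run one step in virtual space and collect the results,
        # which would then be fed back to the real space
        obs = self.spatial.sense(location)
        return [c.act(obs) for c in self.characters]

meta = MetaAI(SpatialAI(), [CharacterAI(), CharacterAI()])
decisions = meta.simulate("cafe_street")
```

The key design point the sketch illustrates is the direction of the loop: sense real space, simulate in the virtual twin, then return decisions to the real world.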

Fig. 7 Metaverse synchronized with the physical world

Discussion

The potential of combining the metaverse, efforts to return game AI to the real world, and the AI constellation concept

Senior Researcher Susumu Takeuchi
Initially, we were thinking of AI constellations as focused on LLMs, with the general idea of understanding natural language. However, when attempting to actually apply it as in Omuta City, there is so much information to collect, even if the AI was to suggest creating a café, for example, it would not be able to do so without spatial information. Whether a virtual space or an LLM, both share the issue that if they are not grounded in correct knowledge, it will be difficult to have deeper discussion.
Head of Educational Content Michiaki Matsushima
When AIs interact with both physical space and humans, it poses a challenge for AI research, but how do you think things in a game space should be fed back to the real world?
COO Ren Ito
I think digital twins (collecting data in real space and reproducing it digitally, like a twin), which Professor Miyake was discussing, are a technique with a future, rather than going directly into physical space. AI does not have physical elements, and responses that reside completely in the computer are easier to implement; for example, automating the processes for home loans in a financial institution can be solved through computation alone. On the other hand, creating an AI that can fly an airplane involves physical processes, which cannot be handled by current AIs.
For an AI to output solutions reflecting physics in the real world, intermediate steps are required. A digital twin is essential for these intermediate steps, and the only way to implement robotics or automation is to collect information regarding physical defects and feed them back to the AI. The running loop that returns results back to the real world is also very important.
The returns on the volume of data input to an LLM are also diminishing, so time-sequence data other than language, and models for comprehending signals, are also important. Various non-language models are being developed steadily, and I expect that applying these models will generate major results.
Professor Youichiro Miyake
The term “AI constellation” is not an exaggeration; it has immense potential. I participate in a variety of business-related meetings, and participation requires energy. Depending on who participates, the conclusions depend on how the meeting happened to progress at the time. This has always been an issue. If the members change or members are excluded, the meeting will be different, so company management should be interested in solutions to this. One solution could be to have an AI hold 1,000 meetings with a diversity of ideas, discarding 999 to leave the single best meeting with the necessary result. This could have the potential to fulfill requirements that we may not even be aware of.

The expanding potential of meetings together with multimodal AI

Senior Researcher Susumu Takeuchi
There are still more things that can be done with LLMs, such as using multiple models to discuss and produce solutions from multiple viewpoints in a meeting, or creating large numbers of branches (options), which humans would not be able to do. There are also other reasons why a meeting might not go well, such as insufficient time or data, or being unable to gather the required stakeholders. For social issues, input from future stakeholders is also needed, and LLMs could be used to reproduce them to a certain extent. However, when discussing future ocean resources, for example, the perspective of the creatures that live there should also be considered, and this could not be reproduced with current LLMs. In addition, some results cannot be understood without gathering ideas logically, with both sequential and spatial analysis, so various other media and information beyond an LLM will also be needed.
Head of Educational Content Michiaki Matsushima
AI Scientist will also need to contribute new knowledge when writing papers. To what extent can this actually be controlled?
COO Ren Ito
The meeting singularity in Omuta City is a very good way to use such techniques, since even if 1000 stakeholders cannot be gathered, 1000 ideas and simulations can be done from various perspectives with the AI, which is very interesting. This is explained in the AI Scientist context using a bell curve (Fig. 8).
LLMs are very good at creating responses similar to a particular type. This comes from expressing 1,000 responses (for example) on a bell curve and then “shooting” for the middle. Both the expected and the unexpected responses from ChatGPT come from shooting for the middle. The Omuta City example shot for the middle of 1,000 LLM responses to produce a harmonized result.
This is also effective against hallucinations (when an AI generates information that differs from fact): rather than developing an AI that produces few hallucinations, we create a model with many parameters and then shoot for the center of the resulting wide bell curve. Shots are judged by an agent, and shooting for responses away from the center can produce some interesting responses. Thus, the solution changes depending on what is aimed for, and this is the real value of calibrating an AI.
Senior Researcher Susumu Takeuchi
We have been influenced by the idea proposed by Audrey Tang, that AI should be “Assistive Intelligence” rather than “Artificial Intelligence,” and a future in which it supports the human strengths of individuality and creativity. So, from the perspective of an AI scientist, we ask, “How will AI provide assistance to humans?”
COO Ren Ito
AI exists to help with simple human tasks, and AI development will proceed in a more realistic and grounded way, rather than like a dream in a science-fiction movie. ChatGPT is spreading around the world and being used in advanced, practical ways beyond copying and pasting, with the main focus being to produce a response. As such, for now we will continue to focus our work on regular tasks.
Head of Educational Content Michiaki Matsushima
From gaming comes the idea of having the three types of AI work together to present one major challenge or climax every day to keep players interested. Could this also be used in real meetings?
Professor Youichiro Miyake
The meta AI turns the game itself into an AI, so the game changes as it understands the human players. If we compare this to a meeting, the meta AI could be used in a mentor role to help the meeting progress, presenting measures to move forward or restating topics if the discussion stagnates.
Fig. 8 Bell curve image

Relationships between AI and humans in the “Future where AI is familiar”

Senior Researcher Susumu Takeuchi
AIs are very diverse, and each has its own “spirit,” like the Japanese belief that there is a god in everything. The idea of a system of systems (technology connecting multiple independent systems) has long been expressed in this industry, and current generative AI captures this meaning. Using generative AI technology, it will be possible to implement more realistic and advanced services all at once.
COO Ren Ito
A practical use case will definitely be for AI to make human life more comfortable. From this starting point, AIs have gained the ability to perform reasoning (interacting with each other and refining their own ideas). The next step is to be able to brainstorm with humans and give them inspiration. I think that could lead to a future in which everything can be automated by AIs, but beyond making the world more convenient, I think the next challenge for humans will be to find an interesting division of roles between humans and AI.
Professor Youichiro Miyake
Games reach a conclusion by following a flowchart. For example, if there is a branch point during a meeting, there is no return once a selection has been made. However, if it were possible to return to a branch point on the flowchart, it would be possible to find different conclusions. Thus, if a meta simulation could be done, with the AI performing the meeting 1,000 times and showing the results to someone before they attend, it could enable them to reach a better future. It could also make it possible to avoid futures arising from choices made under unexpected conditions at the time.