Senior Vice President
Head of Research and Development Planning
NTT Corporation
Shingo Kinoshita
This article introduces the initiatives, practical examples, and future outlook of NTT’s Large Language Model (LLM) “tsuzumi” and Innovative Optical & Wireless Network (IOWN). It is based on the KEYNOTE SPEECH given by Shingo Kinoshita, Head of Research and Development Planning, on November 26th, 2024.
In the title of my talk, “IOWN INTEGRAL,” INTEGRAL has two
meanings: “integration” and “indispensable.” “Integration”
refers to the application and integration of IOWN across
a wide range of areas, and “indispensable” means that IOWN
will become indispensable to the earth and mankind.
At this year’s Forum, the exhibition areas are divided
into RESEARCH, DEVELOPMENT, and BUSINESS.
(1) Personal Sound Zone, Noninvasive Glucose Sensor
Let me first introduce active noise cancelling (ANC)
technology for the Personal Sound Zone.
While open-ear headphones that leak no sound and do not
cover the ear are now available, the problem remains
that users can still hear peripheral noise in trains
and other noisy places. This exhibit introduced
“active noise cancelling technology” that eliminates
this noise so that users can hear the music they
want to hear.
It also presented technology that blocks peripheral
noise when entering a dome-like space.
We also introduced a “noninvasive glucose sensor.”
This is technology for measuring blood sugar level
without having to prick one’s body with a needle. It
uses a device that is even smaller than the prototype
presented last year, so we are getting even closer to
its commercialization (Fig. 1).
(2) Drug Ingredients Penetration Promotion Technology,
Visualization of Hand and Foot Dexterity
The first of these exhibits introduces drug ingredients
penetration promotion technology that facilitates the
penetration of drug ingredients in a beauty facial mask,
for example, through ionization using NTT battery
technology having low environmental impact. The other
exhibit introduces technology for visualizing hand and
foot dexterity using a smartphone.
(3) Quantum Computer
NTT has undertaken the development of quantum computers.
At present, mainstream quantum computers are achieved by
the superconducting method or neutral-atom method.
These methods, however, need to operate at extremely low
temperatures, which means that the equipment that must
be used to provide continuous cooling will invariably
be large. In contrast, the quantum computer targeted by
NTT uses optical pulses the same as in optical
communications to achieve quantum states called
“quantum bits” or “qubits” that serve as the basis of
computation. Our method enables large-scale computation
as long as there is equipment for generating optical
pulses. It also eliminates the need for cooling to
extremely low temperatures, as required by other methods,
and with it the need for large-scale equipment. With this method,
we would like to use the optical communications
technology nurtured by NTT to accelerate current trends
in quantum computers toward a large-scale quantum
computer capable of practical general-purpose computation.
(1) C89 Space Business Brand, Wireless Energy
Transmission Technology
The “NTT C89” brand in the space business field
was officially launched on June 13, 2024.
NTT C89 defines the businesses, services, and R&D
activities of NTT Group companies in the space field as
“stars.” It expresses the idea of “creating an 89th
constellation” by organically linking
these stars (see footnote 1).
By organically linking the businesses and services of
NTT Group companies in the space field and proposing
solutions that meet customer needs, we aim to strengthen
the business of NTT Group companies in the space sector
while creating synergetic effects and opening up new
markets in space.
Turning to wireless energy transmission technology, it
would be possible, for example, to run mini-cars (rovers)
over long distances on the surface of the moon using
built-in batteries charged by solar cells.
However, there is the problem that temperature
differences are intense in a lunar environment and
that solar cells cannot be used if the battery is not
operating well or if the vehicle enters a shaded area.
How to go about supplying energy in a lunar environment
is therefore an issue.
In response to this problem, we developed technology
for supplying power remotely by using
“electric field surface waves” and passing electromagnetic
waves on the sand covering the lunar surface. Furthermore,
by changing the communication format used from observation
satellites to ground stations from conventional
radio-frequency (RF) communication to optical
communication, we aim to create businesses on a scale
of tens of billions of yen in annual revenue.
1 At present, the International
Astronomical Union (IAU) has determined that 88
constellations exist.
To begin with, I would like to introduce the IOWN
roadmap, from IOWN 1.0 to IOWN 4.0.
IOWN 1.0 is networking, or in other words, technology
for establishing complete photonics between
data centers.
IOWN 2.0 is board-to-board photonics for devices
accommodated within data-center racks.
IOWN 3.0 is package-to-package photonics, and IOWN
4.0 is intra-chip, or die-to-die photonics.
In this way, IOWN is evolving.
In addition, the All-Photonics Network (APN) is
elemental technology for configuring each of the above
versions of IOWN. Within IOWN 1.0, it will feature
wider bandwidths and reduced power consumption.
In addition, photonics-electronics convergence (PEC)
technology will evolve along with IOWN generations
as PEC-2, PEC-3, and PEC-4. Likewise,
the next-generation computing platform known as
the Data Centric Infrastructure (DCI) will evolve
together with the evolution of PEC (Fig. 2).
In March 2023, NTT EAST and NTT WEST launched an APN
IOWN 1.0 service. There are also plans to provide a
series of new services from December 1, 2024 to
expand and enhance frequency bands, coverage areas,
and types of interfaces.
There are three key features of these new services.
In the past, we provided only the OTU4 optical interface, but on hearing from corporate customers that “Ethernet interfaces were easier to use,” we decided to provide Ethernet interfaces as well. Providing Ethernet interfaces makes terminating equipment at customer sites unnecessary, thereby saving space and reducing power consumption (a maximum reduction of 940 W at both sites).
An APN connection was established between Japan and Taiwan as a world first on August 29, 2024, and a variety of video-based demonstrations have been set up between the two. Although the distance between Japan and Taiwan is about 3000 km, a delay time of approximately 17 ms has been achieved. Since the propagation delay of optical fiber over this distance is said to be 15 ms, this means that we have achieved stable, low-latency communications close to the physical limit, with no jitter (Fig. 3). Several experiments are also being conducted using APN between Japan and Taiwan.
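The 15 ms figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a refractive index of about 1.5 for silica fiber (an assumption, not a figure from the talk) and treats the 3000 km as the fiber path length. The function name `fiber_delay_ms` is purely illustrative.

```python
# Back-of-the-envelope check of one-way propagation delay in optical fiber.
# Assumption (not from the talk): refractive index of about 1.5 for silica fiber.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum


def fiber_delay_ms(distance_km: float, refractive_index: float = 1.5) -> float:
    """Return the one-way propagation delay in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return distance_km / speed_in_fiber * 1000.0


propagation_ms = fiber_delay_ms(3000)  # roughly 15 ms
remaining_ms = 17.0 - propagation_ms   # what is left for equipment and routing
print(f"propagation: {propagation_ms:.1f} ms, remaining: {remaining_ms:.1f} ms")
```

Under these assumptions, light needs about 15 ms just to traverse the fiber, so a measured delay of 17 ms leaves only about 2 ms for everything else on the path.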
(1) Ultra-high-speed data backup by APN
In the event of a major disaster, for example,
plant data could be simultaneously backed up not only
in a data center in Japan but also in a data center
in Taiwan. However, a long transfer distance is enough
by itself to slow down transfer speed and significantly
increase backup time. With APN, on the other hand, data
transfer can be greatly sped up, thereby minimizing
system restoration time when a disaster occurs.
Effective transfer speeds will differ even at the same
transmission speed of 10 Gbit/s. For example, for a
similar dedicated line, I-WAN achieved a transfer
speed of only 2.81 Gbit/s while APN nearly doubled this
to almost 5 Gbit/s. As a result, backup time could also
be reduced from three minutes to one minute, enabling
highly efficient data backup.
(2) High-efficiency remote production by APN
At present, live broadcasting of soccer, baseball,
and other sports events requires a large broadcast
van and more than 50 people for each match to provide
on-site support over a long period of time. This
requires a huge amount of resources for a broadcast
station, so making program production more efficient
has become an urgent issue.
In response to this problem, we can connect a studio,
stadium, or other site to APN and send all data via
the cloud. We can also store software for editing on
the cloud to enable remote editing from a production
base. This scheme would enable the production of
high-quality programs with one-third the staff
required in the past.
(1) APN connections between data centers overseas
We are also undertaking APN connections between data
centers located in other countries. In India, for example,
we set up connections between three data centers in
Mumbai in September 2024, and last year, we conducted
verification experiments of APN connections between data
centers in the United States and in the United Kingdom.
In this way, we are working to establish use cases of
distributed data centers by APN even overseas to support
NTT’s global business.
(2) Watt/bit linking by APN
The use of APN for watt/bit linking is also expected
as part of a plan in which the government maintains
the power grid and communications infrastructure in an
integrated manner. Here, by connecting regionally
distributed data centers by DCX, computer processing
can be performed at data centers located in areas where
the supply and demand of green energy is greatest.
This promotes local production for local consumption
of green energy, and it improves the usage efficiency
of renewable energy by dynamically arranging workload
based on supply-and-demand conditions of renewable
energy.
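The idea of arranging workloads according to renewable supply and demand can be sketched as a simple placement rule. The example below is only an illustration under stated assumptions: the `DataCenter` class, the region names, and the megawatt figures are all hypothetical, and "send the job to the site with the largest renewable surplus" is one plausible reading of the scheme, not NTT's actual scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    name: str
    renewable_supply_mw: float  # renewable power currently available in the region
    local_demand_mw: float      # power already committed in the region

    @property
    def surplus_mw(self) -> float:
        return self.renewable_supply_mw - self.local_demand_mw


def place_workload(centers: List[DataCenter], required_mw: float) -> Optional[str]:
    """Send the job to the site with the largest renewable surplus that can cover it."""
    candidates = [c for c in centers if c.surplus_mw >= required_mw]
    if not candidates:
        return None  # no site can run the job on surplus renewable power right now
    best = max(candidates, key=lambda c: c.surplus_mw)
    best.local_demand_mw += required_mw  # the placed job consumes part of the surplus
    return best.name


# Hypothetical regions and figures, for illustration only.
centers = [
    DataCenter("region-a", renewable_supply_mw=40, local_demand_mw=35),
    DataCenter("region-b", renewable_supply_mw=80, local_demand_mw=50),
    DataCenter("region-c", renewable_supply_mw=60, local_demand_mw=20),
]
first = place_workload(centers, 25)   # region-c has the largest surplus (40 MW)
second = place_workload(centers, 25)  # region-b now leads (30 MW against 15 MW)
print(first, second)
```

Because each placement updates the remaining surplus, repeated calls spread jobs across regions as conditions change, which is the dynamic behavior described above.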
(3) Distributed GPU cloud by APN
Consideration is now being given as to whether APN
could also be used in AI machine learning, which has
recently become a hot topic. This is because rack space
at data centers concentrated in urban centers is becoming
scarce, making it difficult to extend graphics processing
unit (GPU) clusters. For cases in which GPU expansion
is desired but there is no space for it, we are
conducting experiments on using GPUs at different data
centers as a single GPU cloud.
For example, in an experiment measuring the drop
in performance when using distributed data centers
compared with a single data center, training took
29 times as long over the Internet but only
1.006 times as long over APN.
These results showed that distributed GPUs could be used
almost as if they were located in the same data center.
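To put these factors in concrete terms, the short calculation below applies them to a hypothetical training job. The 100-hour baseline is an assumption chosen only for illustration. The slowdown factors of 29 and 1.006 are the ones reported above.

```python
def distributed_hours(single_site_hours: float, slowdown: float) -> float:
    """Training time across distributed data centers, given a measured slowdown factor."""
    return single_site_hours * slowdown


baseline_hours = 100.0  # hypothetical single-data-center training time

internet_hours = distributed_hours(baseline_hours, 29.0)  # factor reported for the Internet
apn_hours = distributed_hours(baseline_hours, 1.006)      # factor reported for APN

print(f"Internet: {internet_hours:.1f} h, APN: {apn_hours:.1f} h")
print(f"APN overhead: {apn_hours - baseline_hours:.1f} h")
```

For such a job, the Internet case would stretch 100 hours to 2900 hours, while the APN case would add well under one hour.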
The question as to whether dark fiber is better has
also come up, so here we compare APN and dark fiber
and we explain why APN is superior. Since APN network
services are already being provided, only the access
portion needs to be configured, making the launch period
short, and in addition, connection points can be
changed on demand.
Management cost is also very low, and for long-distance
transmission, APN is superior since the service covers
the relay equipment that users would otherwise have to
prepare on their own.
Reliability and redundancy are also high with APN,
and since a single fiber can be shared, APN is also
more economical than dark
fiber (Fig. 4).
APN Step 3 increases transmission capacity by 125
times compared to that at the time of announcing
the IOWN concept, which is a dramatic increase
from Step 2. We aim to raise power efficiency even
more by promoting an all-optical network so that we
can economically expand APN areas.
In Step 3, we will also make more enhancements to
APN to further expand its use. One of these
enhancements is “on-demand connections.” This will
require that wavelength collisions and wavelength
paths be controlled, and the technologies for doing
so are “optical path design technology” and
“wavelength conversion and wavelength-band
conversion technology.”
In APN, a separate problem arises in that there
are constraints in achieving low-latency,
large-capacity on-demand services when connecting
two points by an optical path in an end-to-end
manner. For example, what wavelength band can pass
depends on the optical fiber, so a mechanism is
needed to flexibly control a large-capacity optical
path. A system for achieving such a mechanism is a
Photonic Exchange (Ph-EX) (Fig. 5).
A “wavelength-band conversion function” can use optical fiber already laid in the existing network by converting light to optimal wavelength bands to transmit signals along that optical fiber and achieve an end-to-end optical connection. NTT possesses technology for bundling wavelengths and converting them at a device, so wavelength-band conversion can be performed with good efficiency without delay. NTT also has a “wavelength conversion function” that can perform conversion in units of wavelengths without delay, which has the effect of reducing total delay time.
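Why wavelength conversion helps with wavelength collisions can be illustrated with a toy model. The sketch below uses first-fit wavelength assignment, a textbook heuristic that is swapped in here for illustration and is not NTT's optical path design technology. The link names, the channel count `NUM_WAVELENGTHS`, and both function names are hypothetical. Without conversion, a path must use the same wavelength on every link. With a converter at each node, each link can pick its own free wavelength.

```python
from typing import Dict, List, Optional, Set

NUM_WAVELENGTHS = 4  # hypothetical number of wavelength channels per fiber


def assign_without_conversion(path: List[str], occupied: Dict[str, Set[int]]) -> Optional[int]:
    """First-fit under the wavelength-continuity constraint.

    Returns one wavelength index that is free on every link, or None on collision.
    """
    for w in range(NUM_WAVELENGTHS):
        if all(w not in occupied[link] for link in path):
            for link in path:
                occupied[link].add(w)
            return w
    return None


def assign_with_conversion(path: List[str], occupied: Dict[str, Set[int]]) -> Optional[Dict[str, int]]:
    """First-fit per link, assuming a wavelength converter at every intermediate node."""
    choice = {}
    for link in path:
        free = [w for w in range(NUM_WAVELENGTHS) if w not in occupied[link]]
        if not free:
            return None  # this link is completely full
        choice[link] = free[0]
    for link, w in choice.items():
        occupied[link].add(w)
    return choice


# Each link has exactly one free wavelength, but not the same one.
occupied = {"A-B": {0, 1, 2}, "B-C": {1, 2, 3}}
path = ["A-B", "B-C"]

blocked = assign_without_conversion(path, occupied)  # None: no common free wavelength
converted = assign_with_conversion(path, occupied)   # wavelength 3 on A-B, 0 on B-C
print(blocked, converted)
```

In this toy case the end-to-end path is blocked when the same wavelength must be used throughout, but it can be set up once conversion at node B is allowed.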
I will now talk about the third-generation and
fourth-generation PEC devices. Our goal is to apply an
optical engine to board-to-board connections as PEC-2
from FY2025, apply photonics to package-to-package
connections as PEC-3 from 2028, and apply photonics
to intra-chip die-to-die connections as PEC-4 from 2032.
For PEC-3 and PEC-4 devices at NTT, we are driving
the evolution of silicon photonics and the evolution of
membranes (thin films). In IOWN PEC, we would like to
implement an ultra-small optical transceiver within a
package. To this end, we have so far fabricated a very
small 16-channel prototype transceiver at only
1.11 mm × 2.75 mm. In particular, to achieve a small
and high-speed direct modulation laser, how to make
the laser smaller and how to confine light and prevent
heat generation are key problems that must be solved.
However, in the conventional fabrication method,
the active layer is thick due to vertical stacking,
and heat is easily generated with increased height.
At NTT, to achieve a thin active layer, we radically
changed the structure of the existing optical device,
devised a horizontal-fabrication method, and applied
indium phosphide (InP) in the form of a membrane on
a silicon carbide (SiC) substrate. With this technology,
NTT Laboratories is on a world-class level.
Finally, for IOWN 2.0, I would like to introduce DCI-2, which we are now developing with a target of commercialization around 2026. In DCI-2, we aim to increase power efficiency eightfold by using photonics-electronics convergence devices to connect CDI servers, which subdivide computer resources into units of boards, to optical switches, and by controlling them with a DCI controller.
The IOWN Global Forum was launched in 2019, and since then, the number of members has been increasing steadily. At present, its members consist of 154 organizations and associations, and even Google has recently joined and begun participating in discussions.
Since the announcement of tsuzumi in November 2023,
we have provided consultation about its
implementation to many companies, and after one
year, this has come to more than 900
companies.
In addition, tsuzumi was
the first LLM in Japan to be adopted in
Microsoft’s Models-as-a-Service lineup,
which was announced at Microsoft’s Ignite
conference held in Chicago in the United States.
There are also plans to adopt tsuzumi in the Salesforce
LLM Open Connector for actual use in the future.
(1) Issues in LLM scaling up
LLMs are appearing in models of various sizes,
and a trend toward large-scale models can also be
seen, but training cost is huge. For example,
the training cost of GPT-3 when ChatGPT first
appeared was about 500 million yen per session,
while the training cost of GPT-4 and Gemini
comes to 15 to 20 billion yen per session.
Power consumption is also massive.
One training session on the scale of GPT-3 requires
1300 MWh, which corresponds to roughly one hour of
output from one nuclear power plant.
Going forward, the need for upgrading GPUs is
expected to become particularly intense,
so there is a need to consider environmental
issues with the aim of achieving the Sustainable
Development Goals (SDGs).
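The scale of these figures can be put in perspective with simple arithmetic. The sketch below assumes a reactor with an electrical output of about 1000 MW, which is an assumption for illustration and not a figure from the talk. The energy and cost figures are the ones quoted above.

```python
TRAINING_ENERGY_MWH = 1300.0  # figure quoted for one GPT-3-scale training session
REACTOR_OUTPUT_MW = 1000.0    # assumed electrical output of one reactor (illustrative)

# Hours a single reactor would need to run to supply one training session.
hours_of_reactor_output = TRAINING_ENERGY_MWH / REACTOR_OUTPUT_MW

GPT3_COST_BILLION_YEN = 0.5           # about 500 million yen per session
GPT4_COST_BILLION_YEN = (15.0, 20.0)  # quoted range per session

# How many times more expensive the larger models are to train.
cost_ratio = tuple(c / GPT3_COST_BILLION_YEN for c in GPT4_COST_BILLION_YEN)

print(f"reactor hours: {hours_of_reactor_output:.1f}")
print(f"cost increase: {cost_ratio[0]:.0f}x to {cost_ratio[1]:.0f}x")
```

Under this assumption, one training session consumes a little over an hour of a reactor's output, and the quoted costs imply a 30- to 40-fold increase from GPT-3 to GPT-4-class models.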
(2) tsuzumi Features
Against this background, we researched and
developed “tsuzumi”
with the aim of creating a “small and lightweight
LLM.” tsuzumi
has five main features as follows.
The reasons for developing foundational models
from scratch revolve around issues such as copyright
problems, development freedom, and economic security.
We are conducting research and development here with
the aim of achieving a detailed, well-thought-out
model in Japan.
In 2023, we commercialized version 1.0 of
tsuzumi with 7
billion (7B) parameters. At present,
versions 1.1 and 1.2 represent an evolution toward
more supported languages and multimodal support.
Additionally, in a version that is still in beta, we have
scaled the model from 7B to 13B parameters, thereby achieving
a level of accuracy comparable to world-class LLMs
of the same scale, namely, Llama 2 and Llama 3.
In summarization and Q&A, this beta version
outperforms Llama (Fig. 6).
(1) AI agent: Operates the PC for the user
An AI agent operates a personal computer on behalf of
the user and executes the target task. For example,
if the user gives the instruction “purchase product
A listed in this catalog,” the language model visits
the product purchasing site or creates an in-house
purchasing site and automates all procedures up to
the actual purchase of that product.
In daily work, it is rare for one task to be completed
on a single page. With tsuzumi, however, simply chatting
with the system will automatically open up the pages
needed and even input the information required.
Moreover, with respect to many input fields, tsuzumi can refer to company
manuals and use its language comprehension ability
to determine what information should be entered where
and to then enter that information. This series of
operations can also be completely automated, but it
is designed so that human checking can be performed
along the way to prevent any errors from occurring.
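The combination of full automation with human checking along the way can be sketched as a step runner with approval checkpoints. This is a minimal illustration under stated assumptions, not tsuzumi's actual agent interface: the function `run_with_checkpoints`, the step descriptions, and the `approve` callback are all hypothetical, and the actual screen operations are reduced to log entries.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, bool]  # (description, whether a human check is required first)


def run_with_checkpoints(steps: List[Step], approve: Callable[[str], bool]) -> List[str]:
    """Run planned steps in order, pausing for human approval where flagged."""
    log = []
    for description, needs_check in steps:
        if needs_check and not approve(description):
            log.append(f"stopped before: {description}")
            break
        # A real agent would operate the browser or application here.
        log.append(f"done: {description}")
    return log


# Hypothetical plan for the "purchase product A" instruction.
plan = [
    ("open the purchasing page", False),
    ("fill in the order form for product A", False),
    ("submit the order", True),
]

# Stand-in for a human reviewer who rejects the final submission.
log = run_with_checkpoints(plan, approve=lambda step: False)
print(log)
```

Placing the checkpoint only on the irreversible step keeps routine page navigation and form filling automatic while leaving the final decision with a person.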
(2) AI agent: Digital human that behaves naturally
like a human
Unlike past digital humans that only make mechanical
responses, we aim to develop a digital human that is
capable of more human-like, smooth exchanges called
“synlogue.” The idea here is to have the speaker and
listener create utterances together. In other words,
one speaker would not necessarily have to complete an
utterance before the other speaker begins to talk.
To this end, we are researching and developing new
dialogue architecture that creates a series of utterances
while multiple LLMs having different processing speeds
and expertise collaborate in generating that
conversation. In this way, we will achieve a digital
human capable of more natural dialogue that can freely
speak and easily be spoken to. Such a digital human will
utter responses in agreement, create pauses in generating
utterances by deliberately hesitating, and let the
conversation partner talk if interrupted while
talking.
This digital human makes abundant use of NTT technologies
such as image recognition, situational awareness,
and voice recognition. However, portions involved
in slow-paced thinking and topic selection make use
of ChatGPT.
(3) Multimodality: Understands voice features
and content and replies in natural language
It is possible to extend the ability of LLMs to
understand and analyze not only language but to also
understand the content of speech and information unique
to speech such as intonation.
If age, gender, or other attributes can be predicted from
the pitch or intonation of a speaker’s voice, it
should be possible to analyze what the speaker needs
and the urgency of that need. As a direct application,
this technology could be applied to automatic call
distribution at a call center to reduce customer wait
times. In the future, by handling voice not only in input
but in output too, we aim to develop AI operators and
AI automatic replies for call centers, actual shops,
and other applications.
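The call-distribution application can be sketched as a priority queue keyed on an urgency score. In the example below, the `CallQueue` class and the urgency values are hypothetical. In a real system the scores would come from the speech-understanding model, which is not modeled here.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class CallQueue:
    """Serve the most urgent caller first; equal urgency keeps arrival order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def add(self, caller_id: str, urgency: float) -> None:
        # heapq is a min-heap, so urgency is negated to pop the most urgent call first.
        heapq.heappush(self._heap, (-urgency, next(self._counter), caller_id))

    def next_call(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = CallQueue()
queue.add("caller-a", urgency=0.2)  # hypothetical scores estimated from voice features
queue.add("caller-b", urgency=0.9)
queue.add("caller-c", urgency=0.9)

order = [queue.next_call() for _ in range(3)]
print(order)
```

Callers judged urgent from their voice are connected ahead of earlier but less urgent callers, which is how such a scheme could shorten waits where it matters most.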
(4) Utterance-unit speech summary: quickly
summarizes spoken words
We have developed technology that provides ease-of-reading
as in a full-text summary while maintaining the real-time
characteristics of speech recognition. This technology
enables real-time summarizing of a long meeting or
presentation so that participants who join a meeting
midway through can quickly grasp the main points that
have so far been made. It also lets users quickly grasp,
in real time, information that could not be obtained by
conventional speech recognition and full-text summaries,
thereby making work more efficient and speeding up
decision making.
(5) Multimodality: Gives guidance on how to run
in place of a sports trainer
Another application of multimodality is to reproduce
the perspectives and judgments unique to a sports
trainer using generative AI. In the case of running,
for example, this extension simply observes a runner
in action from the viewpoint of a sports trainer to
identify key points in the runner’s way of running
and analyze differences between those movements and
those of a role model. It can also provide
easy-to-understand coaching tips just like those of a
sports trainer and guide the runner to run in a way
closer to that of the role model.
(1) AI Network operation × generative AI
Our goal is self-evolving zero-touch operation (ZTO) to
prevent and minimize the impact of failures and quality
drops in network services on customers. This means the
ability to automatically detect and analyze any kind of
failure that might occur and to take appropriate
measures without human intervention. Specifically,
we will apply generative AI to the research and
development of an AI/network technology group consisting
of operation tasks and to the development of an
AI/network training platform. Here, we will simulate
pseudo failures using AI and network digital twins to
learn diverse and unknown failures.
(2) Security operations × generative AI
Generative AI can be extremely effective not only in
network operations but also in security operations.
In the work of preparing an in-house security report,
for example, the conventional approach has been to
have any report prepared by a new security head checked
and brushed up by a veteran security head to produce
a good report. However, this kind of know-how is
tacit knowledge developed through years of experience
that is not easily acquired or inherited.
This creates a problem in that the quality of reports
depends on the individual.
However, NTT has accumulated the result of these
tasks up to the present, which means that it possesses
a large quantity of reports prepared by new security
heads and reports checked and brushed up by superiors.
This data can therefore be used to train an LLM and
formalize this tacit knowledge so that perfect security
reports can be prepared by simply supplying
information. In addition, linking this technology
with databases will enable the creation of even more
valuable security reports for in-house use.
We came up with the concept of an AI Constellation by asking whether, instead of creating a large monolithic LLM, it would be possible to solve social problems by creating small, specialized, and diverse LLMs that can behave either in an autonomous, decentralized manner or in coordination with each other. As a use case of an AI Constellation, NTT recently held a workshop in Omuta City, Fukuoka Prefecture, in which AI agents discussed local social problems with each other. Specifically, the agents grasped local conditions, presented ideas from diverse perspectives, and discussed the issues among themselves. This activity drew out human ideas and opinions, thereby stimulating further discussion.
As to why generative AI behaves the way it does,
there are still many unknowns. For example, there are
questions like “How is it that generative AI trained
only in English can also handle Japanese?” Developing
and controlling generative AI whose inner workings can
be understood is said to be difficult. At NTT Research,
research into understanding the inner workings of AI
has begun by launching a new research field called
“Physics of Intelligence” in collaboration with Harvard
University’s Center for Brain Science (CBS).
To give a typical research case, a relatively accurate
picture can be produced when entering a prompt like
“Draw a lizard (or goldfish) with the color specified.”
However, if the animal specified is a panda, a less
than accurate picture will be produced. These
experiments concern the essence of imagination in AI
and generative AI, and the difference between the two,
which can be stated as “a lizard can be imagined but
a panda cannot,” is being mathematically proven and
research results are being presented.
“Do research by drawing from the fountain of knowledge and provide specific benefits to society through commercial development.” Goro Yoshida, the first director of the Electrical Communication Laboratory, spoke these words on the founding of what was to become NTT Laboratories. These words still live on as our DNA, and at NTT, we attach great importance to the flow of research, development, and social implementation. NTT aims to become an R&D Center Of Excellence (COE) having responsibility for all of these steps, and to this end, we will repeat the cycle of research, development, and social implementation.
(1) Research
With regard to number of papers, NTT ranked 11th in
the world in the 2017–2021 tabulation but moved up to
9th in the world in the 2019–2023 tabulation.
We hope to become 5th in the world in the
near future.
However, on narrowing down the fields, there are many
in which NTT has been 1st or 2nd in the world.
For example, in optical communications, the basis of
IOWN, and in information security, neurological
function analysis, and quantum computers, NTT has
reached 1st and 2nd globally. We hope to expand our
involvement in world-class research fields from
here on.
Additionally, with regard to number of patent
applications, NTT ranks 13th in the world and 1st
in Japan. However, the numbers of patent applications
from countries like the United States and China are
increasing and are expected to keep increasing in the
years to come, so at NTT, we plan to step on the
accelerator and make every effort to increase our
number of patent applications. At the same time,
NTT Research presented 110 research papers in FY2023,
which accounted for 14% of the world’s most advanced
papers in cryptography, some of which have received
international awards.
(2) Development
In development, we will accelerate our R&D
efforts in IOWN and tsuzumi that I
previously introduced.
(3) Social implementation
In 2023, the Research and Development Planning
Department, Market Planning & Analysis Department,
and Alliances Department linked up under
the Research and Development Market Strategy
Division to form a new system with the goal of
getting research results into society not only
in terms of technology but also from a market
perspective.
In this new system, the Research and Development
Planning Department works closely with the Market
Planning & Analysis Department and Alliances
Department to implement R&D results into
society. A number of companies have also been launched
as spin-offs. These include NTT sonority, which
develops and sells the open-ear headphones with no
sound leakage that I introduced earlier,
Space Compass, which aims to construct space data
centers, and NTT Green & Food, which is involved in
land-based aquaculture.
Another spin-off from NTT Laboratories is NTT
AI-CIX that aims to contribute to further advances
in AI. Its founding reflects the intensification of
data use to promote the digital transformation of
society and industry as part of new data-driven
value creation promoted by NTT, and it arrives as
domestic AI businesses mutually expand with global
AI businesses (Fig. 7).
The original role of NTT AI-CIX in R&D was to develop AI models, but going forward, it looks to provide end-to-end solutions from consulting to AI model development plus platform services by focusing on two inseparable issues: what kind of problems are present in the customer’s industry and how can these problems be solved.
By repeating the cycle of research, development, and social implementation, we aim to produce research and development results that are useful to everyone. In this endeavor, we look forward to your continued support.