2024 REPORT

IOWN INTEGRAL

Senior Vice President
Head of Research and Development Planning
NTT Corporation

Shingo Kinoshita

This article introduces the initiatives, practical examples, and future outlook of NTT’s Large Language Model (LLM) “tsuzumi” and Innovative Optical & Wireless Network (IOWN). It is based on the keynote speech given by Shingo Kinoshita, Head of Research and Development Planning, at NTT R&D FORUM 2024 on November 26, 2024.

NTT R&D FORUM 2024 Overview

In the title of my talk, “IOWN INTEGRAL,” INTEGRAL has two meanings: “integration” and “indispensable.” “Integration” refers to the application and integration of IOWN across a wide range of areas, and “indispensable” means that IOWN will become indispensable to the earth and mankind.
At this year’s Forum, the exhibition areas are divided into RESEARCH, DEVELOPMENT, and BUSINESS.

  • RESEARCH: About 50 exhibits were presented on diverse themes undertaken by NTT R&D including network, sustainability, security, bio/medical, and quantum.
  • DEVELOPMENT: This area presented advanced research and practical examples from NTT R&D and NTT Group companies in IOWN, tsuzumi generative AI, and space development.
  • BUSINESS: This area exhibited practical examples of technologies from NTT R&D and NTT Group companies.

■ RESEARCH area: recommended exhibits

(1) Personal Sound Zone, Noninvasive Glucose Sensor
Let me first introduce active noise cancelling (ANC) technology for the Personal Sound Zone.
While open-ear headphones that leak no sound and do not cover the ear are now available, the problem remains that users can still hear peripheral noise in trains and other noisy places. This exhibit introduced “active noise cancelling technology” that eliminates this noise so that users can hear the music they want to hear.
It also presented technology that blocks peripheral noise when the user enters a dome-like space.
We also introduced a “noninvasive glucose sensor.” This is technology for measuring blood sugar level without having to prick one’s body with a needle. It uses a device that is even smaller than the prototype presented last year, so we are getting even closer to its commercialization (Fig. 1).

RESEARCH area: recommended exhibits
Fig. 1. Recommended exhibits in RESEARCH area.

(2) Drug Ingredients Penetration Promotion Technology, Visualization of Hand and Foot Dexterity
The first of these exhibits introduces drug ingredients penetration promotion technology, which uses ionization powered by NTT’s low-environmental-impact battery technology to help drug ingredients, for example, those in a beauty facial mask, penetrate more readily. The other exhibit introduces technology for visualizing hand and foot dexterity using a smartphone.

(3) Quantum Computer
NTT has undertaken the development of quantum computers. Today’s mainstream quantum computers use the superconducting method or the neutral-atom method. These methods, however, must operate at extremely low temperatures, so the equipment needed for continuous cooling is invariably large. In contrast, the quantum computer targeted by NTT uses the same kind of optical pulses used in optical communications to create the quantum states, called “quantum bits” or “qubits,” that serve as the basis of computation. Our method enables large-scale computation as long as there is equipment for generating optical pulses, and because it does not require cooling to extremely low temperatures, it also eliminates the need for large-scale cooling equipment. With this method, we would like to use the optical communications technology nurtured by NTT to accelerate the current trend in quantum computing toward a large-scale quantum computer capable of practical general-purpose computation.

■ DEVELOPMENT area: recommended exhibits

(1) C89 Space Business Brand, Wireless Energy Transmission Technology
The “NTT C89” brand in the space business field was officially launched on June 13, 2024.
NTT C89 defines the businesses, services, and R&D activities of NTT Group companies in the space field as “stars.” It expresses the idea of “creating an 89th constellation” by organically linking these stars.¹
By organically linking the businesses and services of NTT Group companies in the space field and proposing solutions that meet customer needs, we aim to strengthen the business of NTT Group companies in the space sector while creating synergies and opening up new markets in space.
Turning to wireless energy transmission technology, rovers (small vehicles) on the lunar surface could, for example, travel long distances using built-in batteries charged by solar cells. However, temperature differences in the lunar environment are extreme, batteries may not operate well, and solar cells cannot be used when the vehicle enters a shaded area. How to supply energy in the lunar environment is therefore an issue.
In response to this problem, we developed technology for supplying power remotely by using “electric field surface waves,” which pass electromagnetic waves along the sand covering the lunar surface. Furthermore, by switching the communication format used from observation satellites to ground stations from conventional radio-frequency (RF) communication to optical communication, we aim to create businesses generating tens of billions of yen in annual revenue.
¹ At present, the International Astronomical Union (IAU) officially recognizes 88 constellations.

IOWN

■ IOWN Roadmap

To begin with, I would like to introduce the IOWN roadmap, from IOWN 1.0 to IOWN 4.0.
IOWN 1.0 is networking, in other words, technology for establishing end-to-end photonic connections between data centers.
IOWN 2.0 is board-to-board photonics for devices housed in data-center racks.
IOWN 3.0 is package-to-package photonics, and IOWN 4.0 is intra-chip, or die-to-die, photonics. In this way, IOWN will evolve step by step.
The All-Photonics Network (APN) is an underlying technology for configuring each of the above versions of IOWN. Within IOWN 1.0, APN will offer wider bandwidth and lower power consumption.
Similarly, photonics-electronics convergence (PEC) technology will evolve with each IOWN generation as PEC-2, PEC-3, and PEC-4, and the next-generation computing platform known as the Data Centric Infrastructure (DCI) will evolve in step with PEC (Fig. 2).

IOWN Roadmap Details
Fig. 2. IOWN roadmap.

■ All-Photonic Connect Powered by IOWN

In March 2023, NTT EAST and NTT WEST launched an APN IOWN 1.0 service. From December 1, 2024, they plan to provide a series of new services that expand and enhance bandwidth, coverage areas, and interface types.
There are three key features of these new services.

  1. A maximum bandwidth guarantee of 800 Gbit/s meeting the world’s highest standards
  2. Wide-area service provision with connections between major cities
  3. Enhanced service configurations/interfaces and low power consumption

In the past, we provided only the OTU4 optical interface, but after hearing from corporate customers that “Ethernet interfaces are easier to use,” we decided to provide Ethernet interfaces as well. Providing Ethernet interfaces made terminating equipment at customer sites unnecessary, thereby saving space and reducing power consumption (a maximum reduction of 940 W across both sites).

■ IOWN APN Step 1 and Step 2 for Enterprise

On August 29, 2024, a world-first international APN connection was established between Japan and Taiwan, and a variety of video-based demonstrations have been set up between the two. Although Japan and Taiwan are about 3000 km apart, a delay of approximately 17 ms has been achieved. Since the propagation delay of optical fiber over this distance is said to be about 15 ms, this means that we have achieved stable, low-latency communications with no jitter (Fig. 3). Several experiments are also being conducted using APN between Japan and Taiwan.
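The roughly 15 ms figure follows from the speed of light in glass. The sketch below is a back-of-the-envelope check, assuming a group refractive index of about 1.47 for standard single-mode fiber and a route length of exactly 3000 km; both are illustrative assumptions, not measured properties of the Japan–Taiwan link.

```python
# Speed of light in vacuum (km/s).
C_KM_PER_S = 299_792.458

def fiber_delay_ms(distance_km: float, group_index: float = 1.47) -> float:
    """One-way propagation delay over optical fiber, in milliseconds.

    Light in fiber travels at roughly c / group_index; 1.47 is a typical
    value for standard single-mode fiber (an assumption, not a measured
    property of the Japan-Taiwan route).
    """
    speed_km_per_s = C_KM_PER_S / group_index
    return distance_km / speed_km_per_s * 1000.0

# Assumes the fiber route is exactly 3000 km; real routes are usually longer.
propagation_ms = fiber_delay_ms(3000.0)
measured_ms = 17.0  # one-way delay quoted in the text
print(f"propagation: {propagation_ms:.1f} ms, remaining: {measured_ms - propagation_ms:.1f} ms")
```

Under these assumptions the pure propagation delay is about 14.7 ms, so only a couple of milliseconds of the quoted 17 ms would be attributable to equipment and routing.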

IOWN APN Step 1, 2 for Enterprise
Fig. 3. IOWN APN Step 1 and Step 2 for Enterprise.

(1) Ultra-high-speed data backup by APN
In the event of a major disaster, for example, plant data could be backed up simultaneously not only to a data center in Japan but also to one in Taiwan. However, a long transfer distance alone can slow transfer speed and significantly increase backup time. APN, in contrast, can greatly speed up data transfer, minimizing system restoration time when a disaster occurs.
Effective transfer speeds can differ even at the same transmission speed of 10 Gbit/s. For example, over a comparable dedicated line, I-WAN achieved an effective transfer speed of only 2.81 Gbit/s, while APN achieved almost 5 Gbit/s, nearly double. As a result, backup time could also be reduced from three minutes to one minute, enabling highly efficient data backup.
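One generic reason distance can pull effective throughput below the line rate is that a window-based protocol such as TCP can keep at most one window of unacknowledged data in flight per round trip, so a single flow is capped at window / RTT. The sketch assumes a hypothetical 16 MiB window and uses 34 ms as a round-trip time (twice the 17 ms one-way delay above); it does not model how the I-WAN and APN figures were actually measured.

```python
def window_limited_gbps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for one window-based flow: one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e9

def effective_throughput_gbps(line_rate_gbps: float, window_bytes: float, rtt_s: float) -> float:
    """The flow can go no faster than the line rate or the window limit."""
    return min(line_rate_gbps, window_limited_gbps(window_bytes, rtt_s))

def transfer_time_s(data_bytes: float, throughput_gbps: float) -> float:
    """Time to move data_bytes at a steady effective throughput."""
    return data_bytes * 8 / (throughput_gbps * 1e9)

WINDOW = 16 * 2**20  # hypothetical 16 MiB window
for rtt_ms in (1, 10, 34):
    rate = effective_throughput_gbps(10.0, WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms -> {rate:.2f} Gbit/s")
```

With these inputs, short round trips saturate the 10 Gbit/s line, while a 34 ms round trip caps the flow at roughly 4 Gbit/s, and backup time grows in inverse proportion to that throughput.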

(2) High-efficiency remote production by APN
At present, live broadcasting of soccer, baseball, and other sports events requires a large broadcast van and more than 50 people for each match to provide on-site support over a long period of time. This requires a huge amount of resources for a broadcast station, so making program production more efficient has become an urgent issue.
In response to this problem, we can connect a studio, stadium, or other site to APN and send all data via the cloud. We can also store software for editing on the cloud to enable remote editing from a production base. This scheme would enable the production of high-quality programs with one-third the staff required in the past.

■ IOWN APN Step 1 and Step 2 for DCX (Digital Customer Experience)

(1) APN connections between data centers overseas
We are also undertaking APN connections between data centers located in other countries. In India, for example, we set up connections between three data centers in Mumbai in September 2024, and last year, we conducted verification experiments of APN connections between data centers in the United States and in the United Kingdom. In this way, we are working to establish use cases of distributed data centers by APN even overseas to support NTT’s global business.

(2) Watt/bit linking by APN
APN is also expected to be used for watt/bit linking, part of a government plan to develop the power grid and communications infrastructure in an integrated manner. By connecting regionally distributed data centers through DCX, computer processing can be performed at data centers located in areas where green energy is most plentiful relative to demand. This promotes local production and local consumption of green energy, and it improves the usage efficiency of renewable energy by dynamically shifting workloads according to renewable-energy supply and demand.
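The idea of dynamically shifting workloads toward green-energy surplus can be illustrated with a simple greedy placement rule. This is a minimal sketch under stated assumptions: the site names, surplus figures, and job loads are hypothetical, and the rule is a generic heuristic, not NTT’s scheduling method.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    green_surplus_kw: float  # hypothetical: renewable supply minus local demand

def place_jobs(jobs_kw: dict[str, float], centers: list[DataCenter]) -> dict[str, str]:
    """Greedy watt/bit-style placement (illustrative only).

    Jobs are handled largest first; each goes to the data center with the
    most remaining green-energy surplus, which is then reduced by the job's load.
    """
    remaining = {dc.name: dc.green_surplus_kw for dc in centers}
    placement: dict[str, str] = {}
    for job, load_kw in sorted(jobs_kw.items(), key=lambda kv: kv[1], reverse=True):
        best = max(remaining, key=remaining.get)
        placement[job] = best
        remaining[best] -= load_kw
    return placement

# Hypothetical sites and loads, not real NTT data centers.
centers = [DataCenter("site_a", 500.0), DataCenter("site_b", 50.0), DataCenter("site_c", 300.0)]
jobs = {"training": 400.0, "batch": 200.0, "etl": 100.0}
print(place_jobs(jobs, centers))
```

A production scheduler would also respect compute capacity, latency, and data-locality constraints and would re-run as renewable output changes; the greedy rule here can even drive a site's surplus negative.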

(3) Distributed GPU cloud by APN
Consideration is now being given to whether APN could also be used in AI machine learning, which has recently become a hot topic. This is because rack space at data centers concentrated in urban centers is becoming scarce, making it difficult to expand graphics processing unit (GPU) clusters. For cases where GPU expansion is desired but there is no space for it, we are conducting experiments on using GPUs at different data centers as a single GPU cloud.
For example, in an experiment measuring the drop in performance when using distributed data centers compared with a single data center, training took 29 times longer over the Internet but only 1.006 times longer over APN. These results showed that distributed GPUs could be used almost as if they were located in the same data center.
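Why link quality dominates distributed training can be seen in a toy step-time model: each step computes locally and then exchanges gradients across sites. All inputs below are hypothetical, the model assumes no overlap between computation and communication, and it is not the setup used in NTT’s experiment.

```python
def step_time_s(compute_s: float, sync_bytes: float,
                bandwidth_gbps: float, rtt_s: float) -> float:
    """One training step: local compute, then one gradient exchange (no overlap)."""
    transfer_s = sync_bytes * 8 / (bandwidth_gbps * 1e9)
    return compute_s + rtt_s + transfer_s

def slowdown(compute_s: float, sync_bytes: float,
             local_link: tuple[float, float], remote_link: tuple[float, float]) -> float:
    """Step time across sites divided by step time within one data center."""
    local = step_time_s(compute_s, sync_bytes, *local_link)
    remote = step_time_s(compute_s, sync_bytes, *remote_link)
    return remote / local

# Hypothetical links: (bandwidth in Gbit/s, round-trip time in s).
LOCAL = (400.0, 10e-6)
DEDICATED_OPTICAL = (400.0, 2e-3)
SHARED_INTERNET = (1.0, 20e-3)

for name, link in [("dedicated optical", DEDICATED_OPTICAL), ("shared internet", SHARED_INTERNET)]:
    print(f"{name}: {slowdown(1.0, 1e9, LOCAL, link):.3f}x")
```

In this model, a high-bandwidth link adds only its round-trip time to each step, while a low-bandwidth path adds a transfer time that can dwarf the computation itself.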

■ IOWN APN vs. Dark Fiber

We are sometimes asked whether dark fiber would be better, so here we compare APN with dark fiber and explain why APN is superior. Since APN network services are already being provided, only the access portion needs to be configured, so the lead time to launch is short, and connection points can be changed on demand.
Management cost is also very low. For long-distance transmission, APN is superior because it provides any relay equipment that users would otherwise have to prepare on their own. APN also offers high reliability and redundancy, and because a single fiber can be shared, it is more economical than dark fiber (Fig. 4).

IOWN APN vs. dark fiber
Fig. 4. IOWN APN vs. dark fiber.

■ APN Step 3

APN Step 3 increases transmission capacity 125-fold compared with when the IOWN concept was announced, a dramatic increase over Step 2. We aim to raise power efficiency even further by promoting an all-optical network so that APN coverage areas can be expanded economically.
In Step 3, we will also enhance APN to further expand its use. One of these enhancements is “on-demand connections.” These require avoiding wavelength collisions and controlling wavelength paths, and the technologies for doing so are “optical path design technology” and “wavelength conversion and wavelength-band conversion technology.”
A separate problem in APN is that there are constraints on providing low-latency, large-capacity on-demand services when two points are connected end to end by an optical path. For example, which wavelength bands can pass depends on the optical fiber, so a mechanism is needed to flexibly control large-capacity optical paths. The system that provides this mechanism is the Photonic Exchange (Ph-EX) (Fig. 5).

IOWN APN Step 3
Fig. 5. Photonic exchange.

A “wavelength-band conversion function” makes it possible to use optical fiber already laid in the existing network by converting light into the wavelength band best suited to that fiber, thereby achieving an end-to-end optical connection. Because NTT possesses technology for bundling wavelengths and converting them collectively in a single device, wavelength-band conversion can be performed efficiently without adding delay. NTT also has a “wavelength conversion function” that converts individual wavelengths without adding delay, which helps reduce total delay time.

■ PEC-3/PEC-4

I will now talk about the third-generation and fourth-generation PEC devices. Our goal is to apply an optical engine to board-to-board connections as PEC-2 from FY2025, apply photonics to package-to-package connections as PEC-3 from 2028, and apply photonics to intra-chip die-to-die connections as PEC-4 from 2032.
For PEC-3 and PEC-4 devices, NTT is advancing both silicon photonics and membrane (thin-film) technology. In IOWN PEC, we would like to implement an ultra-small optical transceiver within a package, and to this end we have fabricated a 16-channel prototype transceiver measuring only 1.11 mm × 2.75 mm. In particular, to achieve a small, high-speed directly modulated laser, the key problems are how to make the laser smaller, how to confine light, and how to suppress heat generation.
However, in the conventional fabrication method, the active layer is thick because layers are stacked vertically, and heat is easily generated as the height increases. To achieve a thin active layer, NTT radically changed the structure of existing optical devices, devised a horizontal fabrication method, and formed indium phosphide (InP) as a membrane on a silicon carbide (SiC) substrate. With this technology, NTT Laboratories is at a world-class level.

■ DCI-2

Finally, for IOWN 2.0, I would like to introduce DCI-2, which we are now developing with a target of commercialization around 2026. In DCI-2, we aim to increase power efficiency eightfold by using photonics-electronics convergence devices to connect composable disaggregated infrastructure (CDI) servers, which subdivide computer resources into board-level units, to optical switches and controlling them with a DCI controller.

■ IOWN Global Forum Member Status

The IOWN Global Forum was launched in 2019, and its membership has grown steadily since then. It currently has 154 member organizations and associations, and Google has recently joined and begun participating in discussions.

Generative AI/tsuzumi

■ tsuzumi Evolution

Since announcing tsuzumi in November 2023, we have provided consultation on its implementation to many companies, and after one year, the number has exceeded 900.
In addition, tsuzumi became the first Japanese LLM to be adopted in Microsoft’s Models-as-a-Service lineup, as announced at Microsoft’s Ignite conference held in Chicago in the United States. There are also plans to adopt tsuzumi in the Salesforce LLM Open Connector for actual use in the future.

tsuzumi Evolution
Fig. 6. tsuzumi evolution: High degree of Japanese knowledge and strong fundamental generative capabilities.

(1) Issues in LLM scaling up
LLMs come in various sizes, and while there is a trend toward ever-larger models, their training costs are huge. For example, the training cost of GPT-3, around the time ChatGPT first appeared, was about 500 million yen per training run, while the training cost of GPT-4 and Gemini reaches 15 to 20 billion yen per run.
Power consumption is also massive. A single training run at the scale of GPT-3 requires about 1300 MWh, roughly one hour of output from a single nuclear power plant. Going forward, demand for upgraded GPUs is expected to become particularly intense, so environmental issues must be considered with the aim of achieving the Sustainable Development Goals (SDGs).

(2) tsuzumi Features
Against this background, we researched and developed “tsuzumi” with the aim of creating a “small and lightweight LLM.” tsuzumi has five main features as follows.

  1. Lightweight: Can run on a single GPU or CPU
  2. Flexible customization: Easy to incorporate specialized knowledge of industries and organizations
  3. Multimodality: Supports reading comprehension of graphs, tables, etc. in addition to text
  4. Proficiency in Japanese: World-class linguistic proficiency, especially in Japanese
  5. Developed from scratch: Foundational models are developed from scratch.

We develop the foundational models from scratch for reasons including copyright, freedom of development, and economic security, and we are conducting this research and development with the aim of building a carefully designed model in Japan.
In 2023, we commercialized version 1.0 of tsuzumi with 7 billion (7B) parameters. Versions 1.1 and 1.2 have since evolved to support more languages and multimodal input. Additionally, in a version still in beta, we have scaled the model from 7B to 13B parameters, achieving accuracy comparable to world-class LLMs of the same scale, namely, Llama 2 and Llama 3. In summarization and Q&A, this beta version outperforms Llama (Fig. 6).
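The “lightweight” claim can be sanity-checked with simple arithmetic: the memory needed just to hold the weights is the parameter count times the bytes per parameter. The precisions below are common illustrative choices, not tsuzumi’s documented deployment formats, and the estimate excludes activations and the key-value cache.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold model weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Illustrative precisions; tsuzumi's actual deployment formats are not stated here.
PRECISIONS = {"fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

for params in (7e9, 13e9):
    for label, nbytes in PRECISIONS.items():
        print(f"{params / 1e9:.0f}B params, {label}: {weight_memory_gb(params, nbytes):.1f} GB")
```

At 16-bit precision, 7B parameters need about 14 GB and 13B parameters about 26 GB of weights, both within the memory of a single data-center GPU with 40 GB or more, whereas models with hundreds of billions of parameters must be split across many GPUs.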

■ tsuzumi Extensions

(1) AI agent: Operates the PC for the user
An AI agent operates a personal computer on behalf of the user to execute a target task. For example, if the user gives the instruction “purchase product A listed in this catalog,” the language model visits the product purchasing site or the in-house purchasing site and automates all procedures up to the actual purchase of that product.
In daily work, a task is rarely completed on a single page. With tsuzumi, however, simply chatting with the system automatically opens the pages needed and even enters the information required. Moreover, for many input fields, tsuzumi can refer to company manuals and use its language comprehension ability to determine what information should be entered where, and then enter it. This series of operations can be fully automated, but the system is designed so that a human can check the work along the way to prevent errors.
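The human check described above can be sketched as an agent loop that pauses for approval before any irreversible step. The step names, the `irreversible` flag, and the `approve` callback are hypothetical; the sketch only logs actions where a real agent would operate the browser or forms.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    description: str
    irreversible: bool = False  # e.g., placing an order cannot be undone

def run_agent(plan: list[Step], approve: Callable[[Step], bool]) -> list[str]:
    """Execute planned steps, pausing for human approval before irreversible ones."""
    log: list[str] = []
    for step in plan:
        if step.irreversible and not approve(step):
            log.append(f"stopped before: {step.description}")
            break
        # A real agent would drive the browser or form here; this sketch only logs.
        log.append(f"done: {step.description}")
    return log

plan = [
    Step("open catalog page for product A"),
    Step("fill in purchase form using company manual"),
    Step("submit purchase order", irreversible=True),
]
print(run_agent(plan, approve=lambda step: False))  # reviewer rejects the final step
```

Gating only the irreversible steps keeps the routine page navigation and form filling automatic while leaving the consequential decision with a person.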

(2) AI agent: Digital human that behaves naturally like a human
Unlike past digital humans that give only mechanical responses, we aim to develop a digital human capable of more human-like, smooth exchanges called “synlogue,” in which the speaker and listener create utterances together. In other words, one speaker does not necessarily have to finish an utterance before the other begins to talk.
To this end, we are researching and developing a new dialogue architecture in which multiple LLMs with different processing speeds and areas of expertise collaborate to generate a series of utterances. In this way, we will achieve a digital human capable of more natural dialogue that can speak freely and is easy to talk to. Such a digital human will give responses of agreement, deliberately hesitate to create pauses while generating utterances, and let the conversation partner talk when interrupted.
This digital human makes extensive use of NTT technologies such as image recognition, situational awareness, and voice recognition, while the portions involving slower, deliberate thinking and topic selection use ChatGPT.
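The behaviors listed above (yielding when interrupted, acknowledging at short pauses, and hesitating while a slower model is still thinking) can be expressed as a toy turn-taking policy. The thresholds and action names are hypothetical, and this is a rule-based stand-in, not NTT’s multi-LLM dialogue architecture.

```python
BACKCHANNEL_PAUSE_S = 0.2  # hypothetical: pause long enough for "uh-huh"
END_OF_TURN_S = 0.8        # hypothetical: pause treated as the partner yielding the turn

def next_move(agent_speaking: bool, partner_voice_active: bool,
              pause_s: float, slow_reply_ready: bool) -> str:
    """Toy turn-taking policy for a fast responder paired with a slower LLM."""
    if agent_speaking:
        # Let the partner talk if they interrupt; otherwise keep going.
        return "yield" if partner_voice_active else "continue"
    if partner_voice_active or pause_s < BACKCHANNEL_PAUSE_S:
        return "listen"
    if pause_s < END_OF_TURN_S:
        return "backchannel"  # short acknowledgement from the fast model
    # Partner has finished: answer if the slow model is ready, else hesitate.
    return "reply" if slow_reply_ready else "filler"
```

Splitting the work this way lets a fast component keep the conversation flowing with acknowledgements and hesitations while the slower, more capable model prepares the substantive reply.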

(3) Multimodality: Understands voice features and content and replies in natural language
LLMs can be extended to understand and analyze not only language but also the content of speech and information unique to speech, such as intonation.
If age, gender, or other attributes can be predicted from the pitch or intonation of a speaker’s voice, it should be possible to analyze what the speaker needs and how urgent that need is. One direct application is automatic call distribution at call centers to reduce customer wait times. In the future, by handling voice in output as well as input, we aim to develop AI operators and AI automatic replies for call centers, physical stores, and other applications.
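Automatic call distribution by urgency amounts to a priority queue. The sketch assumes a hypothetical integer urgency score produced by a voice-analysis model and serves higher scores first, keeping arrival order among calls with the same score.

```python
import heapq
import itertools

class CallQueue:
    """Routes waiting calls by predicted urgency (higher first), FIFO within ties."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def push(self, caller_id: str, urgency: int) -> None:
        # heapq is a min-heap, so urgency is negated; arrival order breaks ties.
        heapq.heappush(self._heap, (-urgency, next(self._arrival), caller_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)

calls = CallQueue()
calls.push("caller-1", urgency=2)  # urgency scores are hypothetical model outputs
calls.push("caller-2", urgency=9)
calls.push("caller-3", urgency=9)
print([calls.pop() for _ in range(len(calls))])
```

The arrival counter keeps the ordering stable and avoids comparing caller IDs when two calls have equal urgency.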

(4) Utterance-unit speech summary: quickly summarizes spoken words
We have developed technology that offers the readability of a full-text summary while preserving the real-time nature of speech recognition. It summarizes a long meeting or presentation in real time so that participants who join midway can quickly grasp the main points made so far. It also lets users grasp, in real time, information that conventional speech recognition transcripts and full-text summaries could not provide, making work more efficient and speeding up decision making.
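The utterance-unit idea, condensing each recognized utterance as soon as it arrives and appending it to a running summary, can be sketched as follows. A trivial filler-removal and truncation rule stands in for the language model here; the filler list and word limit are hypothetical.

```python
FILLERS = {"um", "uh", "er", "well", "so"}

def condense(utterance: str, max_words: int = 8) -> str:
    """Stand-in for an LLM summarizer: drop filler words and truncate."""
    words = [w for w in utterance.split() if w.lower().strip(",.!?") not in FILLERS]
    return " ".join(words[:max_words])

class RunningSummary:
    """Summarizes each utterance as soon as it is recognized."""

    def __init__(self) -> None:
        self.points: list[str] = []

    def add(self, utterance: str) -> str:
        point = condense(utterance)
        if point:  # skip utterances that were nothing but fillers
            self.points.append(point)
        return point

    def catch_up(self, last_n: int = 3) -> list[str]:
        """What a late joiner reads: the most recent summarized points."""
        return self.points[-last_n:]

summary = RunningSummary()
for u in ["Um, so the budget review moves to Friday.", "uh",
          "Well, marketing needs the final numbers by Wednesday noon at the latest."]:
    summary.add(u)
print(summary.catch_up())
```

Because each utterance is processed independently as it arrives, the summary stays current without waiting for the whole meeting to end.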

(5) Multimodality: Gives guidance on how to run in place of a sports trainer
Another application of multimodality is to reproduce the perspectives and judgments unique to a sports trainer using generative AI. In the case of running, for example, simply by observing a runner in action, this extension identifies key points in the runner’s form from a sports trainer’s viewpoint and analyzes how those movements differ from those of a role model. It can also provide easy-to-understand coaching tips, just as a sports trainer would, and guide the runner toward a running style closer to that of the role model.
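The comparison with a role model can be illustrated as ranking movement features by how far they deviate from the role model’s values, scaled by a tolerance. The feature names, measurements, and tolerances are hypothetical (in practice they might be estimated from video), and this ranking rule is an illustrative stand-in, not the method behind the exhibit.

```python
def coaching_points(runner: dict[str, float], role_model: dict[str, float],
                    tolerance: dict[str, float], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the features that deviate most from the role model, scaled by tolerance."""
    gaps = []
    for feature, target in role_model.items():
        diff = runner[feature] - target
        if abs(diff) > tolerance[feature]:
            gaps.append((abs(diff) / tolerance[feature], feature, diff))
    gaps.sort(reverse=True)  # largest relative deviation first
    return [(feature, diff) for _, feature, diff in gaps[:top_k]]

# Hypothetical measurements for one runner and one role model.
runner = {"cadence_spm": 160.0, "knee_angle_deg": 150.0, "torso_lean_deg": 2.0}
role_model = {"cadence_spm": 180.0, "knee_angle_deg": 160.0, "torso_lean_deg": 8.0}
tolerance = {"cadence_spm": 5.0, "knee_angle_deg": 5.0, "torso_lean_deg": 2.0}
for feature, diff in coaching_points(runner, role_model, tolerance):
    direction = "increase" if diff < 0 else "decrease"
    print(f"{direction} {feature} by about {abs(diff):g}")
```

Scaling by tolerance lets features measured in different units (steps per minute, degrees) be ranked against each other, so the tips focus on the few changes that matter most.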

■ tsuzumi Applications

(1) AI Network operation × generative AI
Our goal is self-evolving zero-touch operation (ZTO) that prevents and minimizes the impact of network-service failures and quality drops on customers. This means automatically detecting and analyzing any kind of failure that might occur and taking appropriate measures without human intervention. Specifically, we will apply generative AI to the research and development of a group of AI/network technologies covering operation tasks and to the development of an AI/network training platform. On this platform, we will simulate pseudo failures using AI and network digital twins so that the system can learn diverse and previously unknown failures.

(2) Security operations × generative AI
Generative AI can be extremely effective not only in network operations but also in security operations. When preparing an in-house security report, for example, the conventional approach has been to have a report drafted by a new security head checked and refined by a veteran security head. However, this kind of know-how is tacit knowledge developed through years of experience that is not easily acquired or passed on, so the quality of reports depends on the individual.
NTT, however, has accumulated the results of this work over the years, so it possesses a large number of reports prepared by new security heads together with the versions checked and refined by their superiors. This data can be used to train an LLM and formalize this tacit knowledge so that high-quality security reports can be prepared simply by supplying the relevant information. In addition, linking this technology with databases will enable the creation of even more valuable security reports for in-house use.

■ AI Constellation

We came up with the concept of an AI Constellation by asking whether, instead of building one large monolithic LLM, social problems could be solved by creating small, specialized, and diverse LLMs that behave either autonomously in a decentralized manner or in coordination with each other. As a use case of an AI Constellation, NTT recently held a workshop in Omuta City, Fukuoka Prefecture, in which AI agents discussed local social problems with each other. Specifically, the agents grasped local conditions, presented ideas from diverse perspectives, and discussed the issues among themselves. This activity drew out ideas and opinions from the human participants, stimulating further discussion.

■ AI Basic Research

There are still many unknowns as to why generative AI behaves the way it does. For example, how is it that generative AI trained only in English can also handle Japanese? Developing and controlling generative AI without understanding its inner workings is said to be difficult. NTT Research has therefore begun research into the inner workings of AI by launching a new research field called “Physics of Intelligence” in collaboration with Harvard University's Center for Brain Science (CBS).
To give a typical research case, a relatively accurate picture is produced for a prompt like “Draw a lizard (or goldfish) in the specified color.” However, if the animal specified is a panda, the picture is less accurate. These experiments concern the essence of imagination in generative AI, and the difference between the two cases, which can be summed up as “a lizard can be imagined but a panda cannot,” is being proven mathematically, with research results now being presented.

Becoming a COE

“Do research by drawing from the fountain of knowledge and provide specific benefits to society through commercial development.” Goro Yoshida, the first director of the Electrical Communication Laboratory, spoke these words at the founding of what was to become NTT Laboratories. These words still live on as our DNA, and at NTT, we attach great importance to the flow of research, development, and social implementation. NTT aims to become an R&D Center of Excellence (COE) responsible for all of these steps, and to this end, we will repeat the cycle of research, development, and social implementation.

Spin-offs from NTT Laboratories
Fig. 7. NTT Laboratories spin-off: Overall supply chain optimization through “chain-type AI” across businesses and industries.

(1) Research
In terms of the number of papers, NTT ranked 11th in the world in the 2017–2021 tabulation and moved up to 9th in the 2019–2023 tabulation. We hope to rank 5th in the world in the near future.
Narrowing down by field, however, NTT ranks 1st or 2nd in the world in many fields, for example, optical communications, which is the basis of IOWN, as well as information security, neurological function analysis, and quantum computers. We hope to expand our involvement in world-class research fields from here on.
In addition, NTT ranks 13th in the world and 1st in Japan in the number of patent applications. However, the number of patent applications from countries like the United States and China is increasing and is expected to keep increasing in the years to come, so NTT plans to step on the accelerator and make every effort to increase its number of patent applications. Meanwhile, NTT Research presented 110 research papers in FY2023, which accounted for 14% of the world’s most advanced papers in cryptography, and some of these papers have received international awards.

(2) Development
In development, we will accelerate our R&D efforts in IOWN and tsuzumi that I previously introduced.

(3) Social implementation
In 2023, the Research and Development Planning Department, Market Planning & Analysis Department, and Alliances Department linked up under the Research and Development Market Strategy Division to form a new system with the goal of getting research results into society not only in terms of technology but also from a market perspective.
In this new system, the Research and Development Planning Department works closely with the Market Planning & Analysis Department and Alliances Department to implement R&D results in society. A number of companies have also been launched as spin-offs. These include NTT sonority, which develops and sells the open-ear headphones with no sound leakage that I introduced first; Space Compass, which aims to construct space data centers; and NTT Green & Food, which is involved in land-based aquaculture.
Another spin-off from NTT Laboratories is NTT AI-CIX, which aims to contribute to further advances in AI. Its founding reflects the increasingly intensive use of data to promote the digital transformation of society and industry as part of the new data-driven value creation promoted by NTT, and it comes at a time when domestic and global AI businesses are expanding together (Fig. 7).

The original role of NTT AI-CIX in R&D was to develop AI models, but going forward, it looks to provide end-to-end solutions from consulting to AI model development plus platform services by focusing on two inseparable issues: what kind of problems are present in the customer’s industry and how can these problems be solved.

In Conclusion

By repeating the cycle of research, development, and social implementation, we aim to produce research and development results that are useful to everyone. In this endeavor, we look forward to your continued support.