NTT Human Informatics Laboratories
The first Olympic Games to be broadcast were the Olympic Games Berlin 1936. Although the receivers were televisions (TVs), they had only 180 scanning lines and a screen size equivalent to 19 inches. Images of the Games were shown on such screens at 28 venues around Berlin and watched by 100 spectators. For the Olympic Games scheduled for Tokyo in 1940, the plan was to broadcast the events on screens with 441 scanning lines, a frame rate of 25 frames per second, and an aspect ratio of 4:5, specifications that would become the basis for today’s TVs. However, the Tokyo Games were canceled due to World War II, so the planned transmission became a phantom TV broadcast. In Rome in 1960, the Olympic Games were broadcast live for the first time to 18 European countries and, with a delay of one hour, to the United States, Canada, and Japan. At the Olympic Games Tokyo 1964, NTT developed the wireless relay technology used for the first satellite relay of the Olympic Games. Since then, NTT has provided technical support for the color broadcast of all events at the Olympic Winter Games Sapporo 1972, the first 2K high-definition broadcast of the Olympic Games Seoul 1988, the 2K high-definition broadcast of all events at the Olympic Winter Games Nagano 1998, and the 8K super-high-definition public viewing at the Olympic Games London 2012.
Although video-relay technology has evolved over 80 years in the manner described above, that evolution has remained a story about one kind of device, namely, the TV, a screen surrounded by a square frame. In other words, it has not fundamentally changed the experience of watching sports. In preparation for the Olympic and Paralympic Games Tokyo 2020, NTT began research and development (R&D) in 2015 of an ultra-realistic communication technology called “Kirari!” for creating a new sports-viewing experience. Kirari! is an ambitious project that aims to deliver the experience of being at the venue to those who cannot attend the actual event by removing the square frame of conventional TV and public viewing and transmitting the event space itself. Since 2015, we have experimentally demonstrated Kirari! for live viewing of sports, such as the Japanese professional soccer league and U.S. professional baseball league, as well as a variety of other events. Examples of the latter include kabuki events in Japan and abroad, a global technology event in Austin, Texas, USA, a synchronized performance by a three-female-member techno-pop unit in three cities around the world, and the opening event at the new National Stadium in Tokyo (Fig. 1).
Initially, Kirari! was intended to provide viewing for people who lived far away from the event venue, people who could not afford tickets, and those unable to go out due to illness. However, with the spread of the novel coronavirus (COVID-19) in 2020, people’s mobility became ever more restricted, and the vision of Kirari! grew ever more important. The feeling of being unable to go to an event venue even if you wanted to, or of something being missing when watching on a TV or smartphone, became even more pronounced in 2020, when events such as concerts and sports were canceled due to the COVID-19 pandemic and online distribution suddenly expanded. In the beginning, many people tried online streaming to relieve the stress of being unable to get to the venue, out of curiosity, or because of its ease of use. However, according to the “Awareness Survey on Live Music Distribution” carried out by SKIYAKI Inc. in September 2020, many people who had experienced real live performances answered that they are better than online-streamed ones because they convey senses of presence, unity, and specialness. Specifically, 77% of the survey’s respondents answered, “Real live is better,” citing “a sense of presence” (92.9%), “sense of unity” (93.6%), and “sense of specialness” (69.6%) as reasons. Given those results, we considered the question, “How can we recreate those sensations?” Recognizing once again the importance of the vision of Kirari!, we decided to take on that challenge at the Olympic and Paralympic Games Tokyo 2020.
The “sense of presence” concerning a sports event means that you feel as if you are actually there watching the event. In the actual venue, the distance between a spectator and the action is often so great that the athletes appear as little more than specks in the distance. Although it is easier to see the action online, the feeling of “being there” is what makes the real thing so appealing. This sense of being there is based on two main factors: the feeling that “you are there” (at the venue) and the feeling that the “athletes are there” (in front of you). TVs and smartphones confine the image to a small square frame, so we perceive it as a separate space; in other words, we do not feel that we are sharing the event space with the athletes, nor do we feel the space that opens up when we enter the venue and sit down.
To create a sense that “you are there” (at the venue), as if you were sitting in the stadium, we attempted to reproduce an expanse of space that covers the full viewing angle. To achieve this, we use the ultra-wide image-synthesis technology of Kirari!. Multiple 4K cameras are installed, and the videos from them are seamlessly stitched together in real time to produce a single ultra-high-resolution video image. As a result, video images with a resolution of over 20K can be shown on a display several tens of meters wide. Although it is possible to enlarge the video image from a single 4K or 8K camera and project it on a huge display that covers the entire viewing angle, the enlarged image will inevitably look coarse, making it difficult to reproduce the sense of presence of the scene. It is also difficult to freely change the size of the image (in terms of height and width) in accordance with the event to be presented. NTT has conducted experiments with this technology at events including the U.S. professional baseball league, the Japanese professional soccer league, a tennis tournament in France, the windsurfing world championships, and a large-scale fashion show for women in Tokyo.
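As a rough illustration of the arithmetic behind the stitched resolution, the following Python sketch estimates the width of a panorama built from horizontally adjacent 4K cameras. The 3840-pixel frame width is the standard UHD 4K width; reading “20K” as about 20,000 horizontal pixels, the 10% overlap between neighboring cameras, and the function names are all assumptions made for illustration, not parameters of the Kirari! system.

```python
UHD_4K_WIDTH = 3840  # horizontal pixels in one UHD 4K frame


def stitched_width(num_cameras: int,
                   overlap_fraction: float = 0.1,
                   frame_width: int = UHD_4K_WIDTH) -> int:
    """Width in pixels of a panorama made from side-by-side cameras.

    Each pair of neighboring cameras is assumed to share
    `overlap_fraction` of one frame, which is counted only once.
    """
    if num_cameras < 1:
        raise ValueError("at least one camera is required")
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    overlap_pixels = round(frame_width * overlap_fraction)
    return num_cameras * frame_width - (num_cameras - 1) * overlap_pixels


def cameras_needed(target_width: int,
                   overlap_fraction: float = 0.1,
                   frame_width: int = UHD_4K_WIDTH) -> int:
    """Smallest number of cameras whose stitched width reaches the target."""
    n = 1
    while stitched_width(n, overlap_fraction, frame_width) < target_width:
        n += 1
    return n


if __name__ == "__main__":
    target = 20_000  # assumed reading of "20K"
    n = cameras_needed(target)
    print(f"{n} cameras give {stitched_width(n)} pixels of width")
```

Under these assumptions, six cameras are enough to exceed 20,000 horizontal pixels, whereas a single 4K frame enlarged to the same display would spread 3840 pixels over the full width.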
For the Olympic and Paralympic Games Tokyo 2020, we used this technology at the sailing competition, at which the spectators were seated far from the sailing course. By using Kirari!, we transmitted images of the sailing course directly to an offshore display near the spectators’ seats to reproduce the sensation that the races were being held right in front of them (Fig. 2). The details are introduced in the article “Sailing × Ultra-realistic Communication Technology Kirari!” in this issue.
To create a sense that the “athletes are there” (in front of you), we attempted to create three-dimensional images of the athletes at the Tokyo 2020 Games. To achieve this, we used “real-time extraction of objects with arbitrary backgrounds,” a component technology of Kirari!. Even without a green screen, the athlete’s image is extracted in real time and displayed in a holographic manner. Even with the same source image, displaying it as a hologram gives the audience a stronger sense of the athlete’s presence. The sense of presence was further enhanced by setting up real objects alongside the images of the athletes, such as the badminton court or table tennis table. We conducted experiments with this technology at karate, judo, and badminton tournaments as well as at kabuki performances and a large-scale music event in Texas, USA.
For the Olympic Games Tokyo 2020, we used this technology at the badminton competition. Only video images of the athletes were extracted from 8K camera images of the badminton matches held at Musashino Forest Sport Plaza, and those images were transmitted to the National Museum of Emerging Science and Innovation (Miraikan), the remote-viewing venue. At this remote-viewing venue, courts, nets, and spectators’ seats were set up just as at the main venue. The holographic images of the athletes were then displayed in conjunction with those objects in a way that created a sense of the athletes’ presence just like that at the main venue (Fig. 3). The details are introduced in the article “Badminton × Ultra-realistic Communication Technology Kirari!” in this issue.
A “sense of unity” means a connection between the athletes and spectators as well as among spectators. In sports, cheering from the spectators is the most powerful source of support for the athletes. Spectators can also connect with each other through cheering, thereby creating even more excitement as they become one. For online streaming, which expanded in 2020, efforts were made to enhance the sense of unity by, for example, displaying images of the remote audience on the stage and initiating call-and-response. In reality, however, the timing of the shouts and cheers was too disjointed to produce a sense of unity.
The reason for that disjointedness is mainly the latency and variability of the communication. Delays of more than a few dozen milliseconds generally make music sessions difficult, and delays of more than a few hundred milliseconds make call-and-response uncomfortable. Such a delay has the following three components: the propagation delay of light, which is about 0.5 ms per 100 km; the transmission-processing delay, which is several milliseconds to several dozen milliseconds; and the video-coding delay, which is several hundred milliseconds. With the addition of video editing and other factors, it is not uncommon for the final delay to be up to ten seconds in the case of digital terrestrial broadcasting and live streaming over the Internet.
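The way these three components add up can be sketched with a small latency budget. The propagation figure of 0.5 ms per 100 km comes from the text; the 800 km link length and the specific processing and coding values are placeholder assumptions picked from within the ranges quoted above, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

PROPAGATION_MS_PER_100_KM = 0.5  # figure quoted in the text


@dataclass
class LatencyBudget:
    """One-way delay split into the three components named in the text."""
    distance_km: float
    transmission_processing_ms: float
    video_coding_ms: float

    @property
    def propagation_ms(self) -> float:
        # Delay of light over the link, scaled from 0.5 ms per 100 km.
        return self.distance_km / 100.0 * PROPAGATION_MS_PER_100_KM

    @property
    def total_ms(self) -> float:
        return (self.propagation_ms
                + self.transmission_processing_ms
                + self.video_coding_ms)

    def dominant_component(self) -> str:
        # Name of the component that contributes the most delay.
        parts = {
            "propagation": self.propagation_ms,
            "transmission_processing": self.transmission_processing_ms,
            "video_coding": self.video_coding_ms,
        }
        return max(parts, key=parts.get)


# Illustrative values only (assumptions, not measurements).
budget = LatencyBudget(distance_km=800,
                       transmission_processing_ms=20,
                       video_coding_ms=300)

if __name__ == "__main__":
    print(f"propagation: {budget.propagation_ms:.1f} ms")
    print(f"total:       {budget.total_ms:.1f} ms")
    print(f"dominant:    {budget.dominant_component()}")
```

With these assumed values, propagation accounts for only a few milliseconds of the total, and the video-coding step dominates, which suggests that coding is where most of the delay would have to be cut.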
For the Olympic Games Tokyo 2020, ultra-low-latency communication technology, which significantly reduces latency, was used at the marathon. Spectators’ cheers from Tokyo were delivered with almost no delay to the athletes in Sapporo, who were running at 5 m/s, in an attempt to create a sense of unity that transcended distance (Fig. 4). The details are introduced in the article “Marathon × Ultra-low-latency Communication Technology” in this issue.
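To get a feel for why delay matters to an athlete moving at 5 m/s, the short sketch below converts a delay into the distance a runner covers before a cheer arrives. The running speed is taken from the text; the delay values are illustrative only (ten seconds is the upper figure mentioned for broadcasting and streaming, while 100 ms and 1 s are hypothetical points of comparison, not measured figures for the system used at the marathon).

```python
RUNNER_SPEED_M_PER_S = 5.0  # running speed quoted in the text


def distance_during_delay(delay_ms: float,
                          speed_m_per_s: float = RUNNER_SPEED_M_PER_S) -> float:
    """Meters a runner covers while a cheer is still in transit."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return speed_m_per_s * delay_ms / 1000.0


if __name__ == "__main__":
    for delay_ms in (100, 1_000, 10_000):  # illustrative delays
        meters = distance_during_delay(delay_ms)
        print(f"{delay_ms:>6} ms -> {meters:.1f} m")
```

At a ten-second delay, the runner has moved 50 m past the point the spectators were reacting to, whereas at 100 ms the offset is half a meter.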