Introducing advanced security technology for "protecting" and "creating" a Smart World.
As introduced in the first installment, NTT's basic concept for security comprises two viewpoints: the <security to protect> and the <security to create> the Smart World. Of the two, the <security to create> the Smart World aims at a <world where all the processes of creating value can be implemented securely>. This installment introduces secure computation, a technology related to data encryption, as one element that NTT considers necessary to realize that world.
In recent years, as corporate activities become global and cloud computing and the IoT (Internet of Things) spread, a wide variety of terminals and other devices are being interconnected, and data transmission and sharing of many kinds is advancing rapidly. For example, Information and Communications in Japan, White Paper 2019, issued by the Ministry of Internal Affairs and Communications, estimated that the total number of devices connected to the IoT worldwide would reach about 44.8 billion in 2021. As DX (digital transformation) progresses and humans, things, processes, and so on are increasingly turned into data, the circulation of such data will accelerate further. Under these circumstances, high-level data analysis such as deep learning and AI (artificial intelligence) is expected to create value, make business more efficient, and thereby help solve various problems.
However, as the value of data rises with these expectations, data security risks and privacy concerns are also increasing rapidly. In particular, when analyzing data that contains personal or corporate information, legal restrictions such as the Personal Information Protection Act and the GDPR (General Data Protection Regulation) and concerns about privacy have to be considered in addition to the risk of information leaks.
To remove these risks and concerns, so that data about humans and things can circulate and a world that leads to the solution of problems can be realized, NTT Secure Platform Laboratories has been focusing on secure computation. Secure computation is an encryption technology that can process data without decrypting it; that is, it "executes processing with the data encrypted and never turned back into the original data."
Normally, data that is encrypted in transit or in storage has to be turned back into the original data when it is processed for analysis. Data owners therefore perceive a risk of information leaks, and a considerable number of users and organizations are reluctant to use data related to corporate secrets or individual privacy (Fig. 1).
This is a major barrier, in particular, when a data owner wants to provide data to another party, or even to someone in the same organization, for active use. To help remove such barriers, SC Laboratories has been working on the research and development of secure computation for many years. As one result, in 2017 we developed "Sanshi (R)," a commercial system based on secret-sharing schemes that can compute statistics securely and at a practical speed with the data kept encrypted.
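To make the idea of computing statistics on data that is never reconstructed more concrete, the following is a minimal sketch of additive secret sharing, one well-known family of secret-sharing schemes. It is an illustration only: the article does not describe the scheme or protocol used inside the commercial system, and the modulus, the number of parties, and all function names here are assumptions chosen for the example.

```python
import secrets

# Hypothetical modulus for this sketch; all arithmetic on shares is done mod this value.
MODULUS = 2**61 - 1


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; only done for the final, agreed-upon result."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, without seeing a or b."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def secure_sum(values, n_parties=3):
    """Accumulate a sum share by share; individual inputs are never rebuilt."""
    total = [0] * n_parties
    for v in values:
        total = add_shared(total, share(v, n_parties))
    return total


inputs = [420, 515, 390]
sum_shares = secure_sum(inputs)
print(reconstruct(sum_shares))  # 1325
```

In this sketch each party holds only random-looking numbers, yet the shares of the sum can be computed locally and recombined at the end. Operations such as multiplication need interaction between the parties and are omitted here.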
As a security and privacy measure for data used with AI, SC Laboratories has developed secure computation deep learning, a technology that performs both the training and the prediction of deep learning with the data encrypted and never turned back into the original data. Conventional technologies had unsolved performance problems and therefore replaced the training with processing simpler than that of normal deep learning. A feature of SC Laboratories' secure computation deep learning is that, by drawing on world-class performance, it fully reproduces, for the first time in the world, training that uses the standard optimization processing executed in normal deep learning(1).
With this technology, when data related to corporate secrets or personal privacy is used for deep learning, the server that performs the training or prediction can process the data while it remains encrypted, without ever turning it back into the original data. In other words, every step needed to use data for deep learning can be carried out on encrypted data: 1) distributing data, 2) storing data, 3) training, and 4) prediction. NTT considers this to be part of the above-mentioned concept: the <world where all the processes of creating value can be implemented securely>.
Secure computation deep learning can be divided into two processes, both performed with the data encrypted: the training of the AI and the prediction after training (Fig. 2). On the server, the data is always kept encrypted and is never turned back into the original data; users and organizations can therefore provide data with fewer worries than before, which increases the amount and variety of data available for training. It is precisely this enrichment of data that makes possible more accurate AI capable of high-level analyses (Fig. 3). Enabling analyses with enhanced security in this way is what NTT means by the <security to create>.
In secure computation, data is processed while encrypted and never turned back into the original data, so the processing methods differ considerably from those that do not involve encryption. As a result, there have been operations that are easy to perform on unencrypted data but difficult to realize with secure computation. Generally speaking, in the training stage of deep learning, training data is fed in and multiple layers are processed in sequence to obtain intermediate results. If the intermediate results show that the model has learned sufficiently, they are output as the final results; if not, the model's parameters are updated (optimization processing) and the processing is repeated from the first layer.
Among these layers, the final layer is called the output layer, where a formula called the softmax function is normally calculated. For optimization processing, SGD (stochastic gradient descent) is known as a basic method; however, because it requires a large number of iterations, improvements on SGD such as Adam (adaptive moment estimation) are mainly used. The softmax function and Adam, which are used in the standard training process of deep learning, combine divisions, exponentials, reciprocals, and square roots (Fig. 4).
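To show where the divisions, exponentials, reciprocals, and roots appear, here is a plain (unencrypted) sketch of the softmax function and of one Adam update for a single scalar parameter, written with Python's standard library. The function names are chosen for this example, and the default hyperparameters are the values commonly used for Adam rather than values taken from this article.

```python
import math


def softmax(logits):
    """Standard softmax; subtracting the maximum keeps the exponentials bounded."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exponential
    total = sum(exps)
    return [e / total for e in exps]             # division by the sum


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction (division)
    v_hat = v / (1 - beta2 ** t)
    # Square root and reciprocal: the step is scaled by 1 / (sqrt(v_hat) + eps).
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


probs = softmax([2.0, 1.0, 0.1])
new_param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

Every commented operation is trivial on ordinary floating-point numbers, but each has to be realized as a dedicated protocol when the values are encrypted, which is why these two functions were a stumbling block for training with secure computation.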
When deep learning training was performed with secure computation, conventional technologies had difficulty processing divisions, exponentials, reciprocals, and square roots, and so had difficulty calculating the softmax function and Adam. Many prior studies therefore focused on prediction, which does not require such calculations, rather than training. In addition, some prior studies that did address training approximated the softmax function quite roughly and used only basic SGD for optimization processing.
NTT's secure computation technology calculates the softmax function, which was conventionally difficult to calculate, quickly and accurately, and also makes it possible to use Adam, the mainstream optimization method. There are two approaches to realizing this. One is to prepare, in advance, correspondence tables listing pairs of inputs and outputs for the calculations in the softmax function and Adam, encrypt both the inputs and the tables, and use NTT's own technology called hidden map to obtain the output corresponding to each input. The other is to develop high-speed algorithms dedicated to each of the divisions, exponentials, reciprocals, and square roots that make up the softmax function and Adam. With these technologies, it has become possible to perform training with standard deep learning algorithms while the data remains encrypted.
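The first approach, replacing a hard function with a precomputed table of inputs and outputs, can be pictured with the plaintext sketch below. This is not the hidden map protocol, whose details the article does not give; it only illustrates the table idea on unencrypted numbers. The table range, the number of entries, and every name in the code are assumptions made for the example. The selection is written as an inner product with a one-hot vector because that form touches every table entry in the same way regardless of the input, which is the kind of data-independent access a secure lookup needs.

```python
import math


def build_table(func, lo, hi, steps):
    """Tabulate func at `steps` evenly spaced points in [lo, hi]."""
    width = (hi - lo) / (steps - 1)
    return [func(lo + i * width) for i in range(steps)], width


def quantize(x, lo, width, steps):
    """Map x to the index of the nearest tabulated point, clamped to the table."""
    idx = round((x - lo) / width)
    return min(max(idx, 0), steps - 1)


def one_hot(index, size):
    return [1 if i == index else 0 for i in range(size)]


def select(table, selector):
    """Inner product with a one-hot selector; every entry is read exactly once."""
    return sum(t * s for t, s in zip(table, selector))


# Hypothetical table for exp(x) on [-8, 0], the range that occurs in softmax
# after the maximum has been subtracted from the inputs.
LO, HI, STEPS = -8.0, 0.0, 1025
EXP_TABLE, WIDTH = build_table(math.exp, LO, HI, STEPS)


def table_exp(x):
    idx = quantize(x, LO, WIDTH, STEPS)
    return select(EXP_TABLE, one_hot(idx, STEPS))


print(table_exp(-1.0), math.exp(-1.0))
```

In a real secure computation setting both the selector and the table would be encrypted or secret-shared, and the accuracy would be governed by the table resolution, here a spacing of 8/1024 between entries.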
For instance, with the method that uses the dedicated algorithms for divisions, exponentials, reciprocals, and square roots, one epoch of training for a model that classifies 60,000 handwritten characters can now be executed in approximately two minutes.
The secure computation technology that SC Laboratories is working on conducts analyses without turning encrypted data back into the original data, using secret-sharing schemes specified in an ISO standard. It is therefore expected to contribute to a society in which information such as corporate secrets and personal private information can be provided and used safely and securely.
So that appropriate secret-sharing schemes can be chosen, NTT has led the creation of the standard, serving as an editor for the standardization of secret-sharing schemes at ISO. As a result, an ISO international standard for secret-sharing schemes has been issued, allowing users to choose safe and secure secret-sharing schemes recognized by the international organization.
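As an illustration of what a secret-sharing scheme does, the sketch below implements Shamir's threshold scheme, a classic textbook example, over a prime field. The article does not say which scheme NTT's systems use, so this choice, the prime, and the function names are all assumptions for the example. It requires Python 3.8 or later for the modular inverse via pow.

```python
import secrets

# Hypothetical field size for this sketch: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split(secret, threshold, n_shares):
    """Create n_shares points on a random polynomial of degree threshold - 1
    whose constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n_shares + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 using at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split(123456789, threshold=2, n_shares=3)
print(recover(shares[:2]))  # 123456789
```

With a threshold of 2 out of 3, any two shares recover the secret, while a single share on its own reveals nothing about it.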
As mentioned above, the sending, receiving, and sharing of data across the boundaries of corporations and business fields is now advancing, and the implementation of AI to make use of that data is advancing rapidly as well. We believe that, by contributing to the secure analysis of the ever-increasing volume of data and by stimulating the use of data, we can invigorate the circulation of data itself and give rise to new services and applications that have never been seen before.
For example, training on weather and corporate event information together with encrypted personal location information and schedules is expected to make it possible to predict optimum restaurant supplies and staff assignments. Likewise, training on medical data such as X-ray photos, MRI images, CT scan images, and microscopic photos while keeping them secret is expected to make it possible to accurately determine whether a test result shows a malignant tumor or the like. Beyond such applications in the medical field, it is also expected to improve the accuracy of credit examinations for loan applicants in the financial field.
NTT intends to cooperate with partners that have AI expertise to conduct demonstration experiments and the like, and thus to verify the effects of deep learning using secure computation.
*The names of the laboratories mentioned in the article may have changed since the time of writing/interview.