Data mining, prediction, and inference technologies that can analyze the vast amount of data in the world and present useful information will be helpful not only for making business decisions but also in everyday situations. The statistical machine learning approach is well suited to data that is noisy and can only be partially observed.
We are conducting research on a variety of topics. These include technology for asynchronously learning and inferring from statistically biased data that is distributed and accumulated across many locations, technology for learning and searching for optimal solutions for events that are difficult to model, learning technology that accounts for causal relationships, technology for learning and inference with complicated model structures in a realistic computation time, and technology for efficiently searching for the portions of data related to user interests and queries.
Advances in deep learning technology have made it possible to build high-precision machine learning models for tasks where high-quality, comprehensive data can be collected on a large scale, such as voice recognition and image recognition. However, many types of data are difficult to collect comprehensively, or to analyze comprehensively once collected, including medical data, abnormality data, and data collected by various businesses. We are researching and developing algorithms that perform learning, inference, and data analysis on statistically biased, incompletely observed data with high precision and in practical computation time, for the many situations where comprehensive large-scale data cannot be used.
We are working to develop technology for efficient model learning and inference from data that is managed in a distributed manner and is statistically biased, including medical data, factory and office data, and various sensor data. Distributed learning trains a global model without taking data out of the individual terminals and servers where it is stored, so it can be expected to reduce data traffic and protect privacy. Asynchronous distributed learning avoids concentrating traffic and computation at one point, so it can be expected to reduce the time needed for learning.
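The idea of learning a global model without moving the data can be illustrated with a minimal sketch. This is plain synchronous federated averaging on a toy one-parameter model, not the asynchronous method described above; the function names, the data ranges, and the true slope of 3.0 are all assumptions made for illustration.

```python
import random

def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient steps for y ~ w * x on one client's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(clients, rounds=50):
    """Only model weights travel between clients and server; raw data stays put."""
    w = 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        local_ws = [local_update(w, data) for data in clients]
        # The server averages the local weights, weighted by client sample count.
        w = sum(len(data) * lw for data, lw in zip(clients, local_ws)) / total
    return w

rng = random.Random(0)

def make_client(lo, hi, n):
    """Hypothetical client whose inputs cover only [lo, hi] (statistically biased)."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0, 0.1)) for x in xs]

# Three clients, each observing a different slice of the input range.
clients = [make_client(0, 1, 20), make_client(1, 2, 30), make_client(-1, 0, 10)]
w_global = federated_average(clients)
print(f"global slope estimate: {w_global:.3f}")
```

Each client sees only part of the input range, yet the averaged model recovers a slope close to the one used to generate the data. An asynchronous variant would let the server apply each client's update as it arrives rather than waiting for every client in each round.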
We are researching black box optimization technology, which efficiently searches for superior solutions to problems that are difficult to model because the relationship between the control variables and the obtained results is unknown. Examples include exploring traffic guidance measures that avoid congestion in flows of people and cars, and searching for optimal conditions for producing substances with good characteristics. We are applying Bayesian optimization, a type of black box optimization, to develop search technology for human flow guidance and to advance optimization technology in general.
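As a rough illustration of how Bayesian optimization chooses where to evaluate next, the following sketch fits a Gaussian process to the points seen so far and picks the candidate with the highest upper confidence bound. It is a toy under stated assumptions: a one-dimensional search space on [0, 1], an RBF kernel with a fixed length scale, and a stand-in objective `toy_objective` that is purely hypothetical. A real guidance problem would replace it with a simulation or measurement.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel; the length scale is an assumed constant."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=1e-4):
    """Factor the kernel matrix once per iteration (zero-mean GP prior)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper(L, solve_lower(L, ys))

def predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation at x_new."""
    k = [rbf(x_new, a) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    return mean, math.sqrt(max(1.0 - sum(vi * vi for vi in v), 0.0))

def bayes_opt(f, n_iter=15, kappa=2.0):
    """Maximize f over a grid on [0, 1] using the UCB acquisition function."""
    grid = [i / 100 for i in range(101)]
    xs = [0.0, 0.5, 1.0]              # initial design points (an assumption)
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        L, alpha = fit_gp(xs, ys)

        def ucb(x):
            mean, std = predict(xs, L, alpha, x)
            return mean + kappa * std

        x_next = max((x for x in grid if x not in xs), key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def toy_objective(x):
    """Hypothetical black box with its maximum at x = 0.3."""
    return -(x - 0.3) ** 2

best_x, best_y = bayes_opt(toy_objective)
print(best_x, best_y)
```

The acquisition function trades off the predicted value against uncertainty, so the search spends its limited evaluations both near promising points and in unexplored regions. This is what makes the approach attractive when each evaluation is expensive.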
We are developing technology that uses machine learning to automatically infer which combinations of variables have a cause-and-effect (causal) relationship, such as rainfall and river flow. Correctly inferring causality has conventionally required model design by data analysis experts, but by using machine learning we aim to build technology that can precisely infer causal relationships without expert knowledge.
We are also researching learning methods that consider causal relationships between variables, in order to obtain fair classifiers that do not discriminate unfairly.
Statistical machine learning and statistical hypothesis testing require probability calculations using models. In models with complicated structures, such as graph relationships between variables, calculating probabilities precisely takes a great deal of time. We are working to develop technology for precisely calculating probabilities in a realistic amount of time by applying decision graphs, which are used in combinatorial optimization, to models of events that have a graph structure.
We are researching high-speed similarity search methods that use neighborhood graphs as their index structure and can work with a variety of similarity measures on various kinds of data, including documents, images, audio/acoustic signals, and symbol strings.
Using neighborhood graphs achieves faster searches and makes it possible to use diverse data and similarity measures.
Neighborhood graphs also make it possible to easily visualize search results.
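To make the role of the neighborhood graph concrete, here is a small sketch of one common pattern: build a k-nearest-neighbor graph offline, then answer a query by greedily walking to whichever neighbor is closer to the query. It is an illustration under assumptions, not the method described above. The function names are hypothetical, the graph is built by brute force, and Euclidean distance on random 2D points stands in for whatever dissimilarity measure the data calls for.

```python
import math
import random

def build_knn_graph(points, k, dist):
    """Brute-force k-NN graph: node i links to its k closest other nodes."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        graph.append(others[:k])
    return graph

def greedy_search(points, graph, query, start, dist):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops at a node none of whose neighbors is closer, so the result is an
    approximate (locally optimal) nearest neighbor, not a guaranteed exact one.
    """
    current = start
    current_d = dist(query, points[current])
    while True:
        best = min(graph[current], key=lambda j: dist(query, points[j]))
        best_d = dist(query, points[best])
        if best_d >= current_d:
            return current, current_d
        current, current_d = best, best_d

def euclid(a, b):
    """One possible dissimilarity; any function of two items can be passed in."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
graph = build_knn_graph(points, k=8, dist=euclid)

query = (0.5, 0.5)
found, found_d = greedy_search(points, graph, query, start=0, dist=euclid)
print(found, found_d)
```

Because the search only compares the query with the neighbors of the nodes it visits, it evaluates far fewer distances than a full scan. The `dist` argument is an ordinary function, so the same index structure can be reused with a different measure, for example an edit distance for symbol strings.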