Zero-shot knowledge distillation



We developed a zero-shot method for knowledge distillation that requires no real data for training. In knowledge distillation, the predictions of a teacher model are used as the targets for training a student model. This method enables few-shot learning, that is, learning a high-performance neural network from only a small amount of training data, without using a model that has been pre-trained on large-scale data.
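The distillation target can be written as a worked objective. The formulation below is one common choice, shown for illustration rather than as the exact loss used in this work. The student parameters $\theta$ are trained to minimize the cross-entropy between the teacher's predicted class distribution $p_T(c \mid x)$ and the student's $p_S(c \mid x; \theta)$ over $N$ training inputs and $C$ classes:

```latex
$$
\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}\Big(-\sum_{c=1}^{C} p_T(c \mid x_i)\,\log p_S(c \mid x_i;\theta)\Big)
$$
```

Because the targets come from the teacher rather than from human labels, the inputs $x_i$ do not have to be real examples, which is what makes a zero-shot setting possible.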


The proposed method uses artificially generated pseudo examples as training data and trains the student model so that its outputs for the pseudo examples mimic those of the teacher model. We also use pseudo example optimization, inspired by adversarial example generation [Goodfellow+ICLR2015], to move the pseudo examples toward regions where the student model has not yet been sufficiently trained. By alternating between pseudo example optimization and student model training, we achieve zero-shot knowledge distillation, which yields a student model that behaves similarly to the teacher model.
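The alternating procedure can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration, not the authors' implementation. It uses a hypothetical fixed teacher with a circular decision boundary in 2-D and a logistic-regression student on quadratic features. The disagreement measure is the squared probability gap. Pseudo examples are moved by FGSM-style sign steps (in the spirit of [Goodfellow+ICLR2015]) computed with finite differences. Fresh uniform pseudo examples are drawn each round inside a box assumed to contain the real data, roughly the unlimited-pseudo-example setting. All function names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def features(x):
    # Quadratic feature map (n, 6): expressive enough to represent the teacher exactly.
    x0, x1 = x[:, 0], x[:, 1]
    return np.stack([x0, x1, x0**2, x1**2, x0 * x1, np.ones_like(x0)], axis=1)

def teacher(x):
    # Hypothetical fixed teacher: P(class 1) rises sharply outside the unit circle.
    return sigmoid(4.0 * (np.sum(x**2, axis=1) - 1.0))

def student(w, x):
    # Student: logistic model on the quadratic features.
    return sigmoid(features(x) @ w)

def discrepancy(w, x):
    # Per-example squared gap between student and teacher probabilities.
    return (student(w, x) - teacher(x)) ** 2

def input_grad(w, x, h=1e-4):
    # Central finite differences of the discrepancy w.r.t. each input coordinate.
    g = np.zeros_like(x)
    for j in range(x.shape[1]):
        step = np.zeros_like(x)
        step[:, j] = h
        g[:, j] = (discrepancy(w, x + step) - discrepancy(w, x - step)) / (2 * h)
    return g

def distill(rounds=400, n_pseudo=256, lr=0.5, eps=0.05, k=3, box=2.0):
    w = np.zeros(6)  # student parameters
    for _ in range(rounds):
        # Draw fresh pseudo examples in the box where real data are assumed to lie.
        x = rng.uniform(-box, box, size=(n_pseudo, 2))
        # Pseudo example optimization: k FGSM-style sign steps that increase
        # student/teacher disagreement, clipped back into the box.
        for _ in range(k):
            x = np.clip(x + eps * np.sign(input_grad(w, x)), -box, box)
        # Student step: gradient of cross-entropy against the teacher's soft targets.
        w -= lr * features(x).T @ (student(w, x) - teacher(x)) / n_pseudo
    return w

w = distill()
x_test = rng.uniform(-2.0, 2.0, size=(2000, 2))
agree = np.mean((student(w, x_test) > 0.5) == (teacher(x_test) > 0.5))
print(f"student/teacher label agreement: {agree:.3f}")
```

In this sketch, drawing fresh pseudo examples every round keeps them spread over the box. Keeping a small fixed pool and only moving it would be closer to the limited-pseudo-example case, at the cost of less coverage.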

Experimental results

If an unlimited number of pseudo examples can be generated, as shown in (b) (green), we can construct a student model (d) that almost exactly replicates the teacher model (c); in these panels the classification boundary is drawn as a black line. Even when the number of pseudo examples is limited (e), we can obtain a model (h) that approximately reproduces the teacher's behavior in the region where real examples are assumed to exist.

Future work

We plan to develop technologies that take into account the essential nature of real data. We expect this to lead to highly efficient, robust, and low-cost media-information processing, as well as the machine learning infrastructure that supports it.


  1. Kimura, Ghahramani, Takeuchi, Iwata, Ueda, “Few-shot learning from scratch by pseudo example optimization,” Proc. British Machine Vision Conference (BMVC), 2018.


Akisato Kimura
Recognition Research Group, Media Information Laboratory, NTT Communication Science Laboratories
