Shrinking and accelerating deep neural networks
Song Han on compression techniques and inference engines to optimize deep learning in production.
This is a highlight from a talk by Song Han, “Deep Neural Network Model Compression and an Efficient Inference Engine.” Visit Safari to view the full session from the 2016 Artificial Intelligence Conference in New York.
Deep neural networks have proven powerful for a variety of applications, but their sheer size imposes steep costs in speed, memory, and power consumption. These costs matter all the more as deep learning moves onto mobile devices, with their limited compute, memory, and battery.
In this talk, Song Han shows how compression techniques can alleviate these challenges by greatly reducing the size of deep neural nets. He also demonstrates an energy-efficient inference engine that runs directly on the compressed models to greatly accelerate computation, making deep learning more practical as it moves from the research lab into production.
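Han's compression pipeline centers on pruning away low-magnitude weights (followed by quantization and weight sharing). As a minimal sketch of the pruning step only, and not code from the talk, the snippet below zeroes out the smallest-magnitude entries of a NumPy weight matrix to hit a target sparsity; the `prune_by_magnitude` helper and the layer shape are illustrative assumptions:

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity`
    (the fraction of zeros) is reached. Illustrative sketch only."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Toy fully connected layer: remove 90% of the weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w_pruned.size:.2f}")
```

In practice, pruning is interleaved with retraining so the surviving weights can recover the lost accuracy; this sketch shows only the masking step.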