Abstract

Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible. Such loss of spatial acuity can limit image classification accuracy and complicate the transfer of the model to downstream applications that require detailed scene understanding. These problems can be alleviated by dilation, which increases the resolution of output feature maps without reducing the receptive field of individual neurons. We show that dilated residual networks outperform their non-dilated counterparts in image classification, yielding higher accuracy on ImageNet classification without increasing the model’s depth or complexity. The sharper localization abilities developed by dilated ResNets during training also improve performance on downstream applications. Without any modifications, dilated ResNets yield state-of-the-art accuracy on weakly supervised object localization in ImageNet and outperform baseline models when transferred to pixelwise prediction.
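The mechanism the abstract refers to can be illustrated with a minimal 1-D sketch (not the paper's implementation): a dilated convolution applies the same k-tap filter to input samples spaced `dilation` apart, so the receptive field grows to (k − 1)·dilation + 1 while the output keeps the input's resolution. The function name and padding scheme below are illustrative assumptions.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """1-D dilated convolution with zero padding chosen so the
    output has the same number of samples as the input.
    (Illustrative sketch, not the paper's implementation.)"""
    k = len(w)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        for j in range(k):
            # sample the input at stride `dilation` under the filter
            out[i] += w[j] * xp[i + j * dilation]
    return out

x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0])

# dilation=1: ordinary 3-tap filter, receptive field 3
# dilation=2: same 3 weights, receptive field 5,
#             yet the output still has 8 samples -- no downsampling
y1 = dilated_conv1d(x, w, dilation=1)
y2 = dilated_conv1d(x, w, dilation=2)
assert y1.shape == x.shape and y2.shape == x.shape
```

This is the sense in which dilation raises the resolution of output feature maps: a strided or pooled layer would halve the output length to enlarge receptive fields, whereas the dilated layer enlarges them while keeping every spatial position.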

Materials