In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off between them. A common approach is to minimize a proxy objective: a weighted linear combination of per-task losses. However, this proxy is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto-optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. Although these algorithms have desirable theoretical guarantees, they are not directly applicable to large-scale learning problems. We therefore propose efficient and accurate approximations. We apply our method to a variety of multi-task deep learning problems, including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method yields higher-performing models than either recent multi-task learning formulations or per-task training.
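
To make the terminology concrete, the following is a minimal illustrative sketch, not the method proposed in this paper. It contrasts the weighted-sum proxy with Pareto optimality on a toy set of candidate models. The loss values, the weights, and the helper names (`weighted_sum`, `dominates`, `pareto_optimal`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Losses = Tuple[float, ...]


def weighted_sum(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Proxy objective: weighted linear combination of per-task losses."""
    return sum(w * l for w, l in zip(weights, losses))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every task and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_optimal(candidates: List[Losses]) -> List[Losses]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other != c)
    ]


# Hypothetical (task 1 loss, task 2 loss) pairs for four candidate models.
candidates = [(0.10, 0.90), (0.40, 0.40), (0.90, 0.10), (0.50, 0.50)]

# Three candidates are mutually non-dominated; (0.50, 0.50) is dominated
# by (0.40, 0.40) and is therefore not Pareto optimal.
front = pareto_optimal(candidates)

# When tasks compete, the minimizer of the proxy depends on the weights:
# each weighting selects a different point on the Pareto front.
best_favoring_task1 = min(candidates, key=lambda c: weighted_sum(c, (0.9, 0.1)))
best_favoring_task2 = min(candidates, key=lambda c: weighted_sum(c, (0.1, 0.9)))

print(front)
print(best_favoring_task1, best_favoring_task2)
```

In this toy setting no single candidate is best on both tasks, so the weights effectively decide the outcome. This is the trade-off that a multi-objective formulation treats explicitly.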