Visual Perception

Project Overview

We are working on models and algorithms for visual perception. Our work has produced widely used network architectures and random field models for images. We have also created large-scale datasets and benchmarks for training and evaluating broad-competence visual perception systems. One interest is robust visual perception: models that work reliably in the real world, across many environmental conditions, and can be rapidly deployed in new environments. A second line of inquiry asks whether alternatives to convolutional networks can be more effective than these standard models for images. A third, long-term interest is video and how it can be used to enhance visual perception.

Publications

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun
International Conference on Learning Representations (ICLR), 2025

CoMotion: Concurrent Multi-person 3D Motion

Alejandro Newell, Peiyun Hu, Lahav Lipson, Stephan Richter, and Vladlen Koltun
International Conference on Learning Representations (ICLR), 2025

Non-deep Networks

Ankit Goyal, Alexey Bochkovskiy, Jia Deng, and Vladlen Koltun
Advances in Neural Information Processing Systems (NeurIPS), 2022

Simple Multi-dataset Detection

Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl
Computer Vision and Pattern Recognition (CVPR), 2022

Global Tracking Transformers

Xingyi Zhou, Tianwei Yin, Vladlen Koltun, and Philipp Krähenbühl
Computer Vision and Pattern Recognition (CVPR), 2022

Language-driven Semantic Segmentation

Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, and René Ranftl
International Conference on Learning Representations (ICLR), 2022

Vision Transformers for Dense Prediction

René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun
International Conference on Computer Vision (ICCV), 2021

Multiscale Deep Equilibrium Models

Shaojie Bai, Vladlen Koltun, and J. Zico Kolter
Advances in Neural Information Processing Systems (NeurIPS), 2020
(Selected for full oral presentation)

Tracking Objects as Points

Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl
European Conference on Computer Vision (ECCV), 2020
(Selected for spotlight oral presentation)

MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

John Lambert, Zhuang Liu, Ozan Sener, James Hays, and Vladlen Koltun
Computer Vision and Pattern Recognition (CVPR), 2020

Exploring Self-attention for Image Recognition

Hengshuang Zhao, Jiaya Jia, and Vladlen Koltun
Computer Vision and Pattern Recognition (CVPR), 2020

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer

René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3), 2022

Does Computer Vision Matter for Action?

Brady Zhou, Philipp Krähenbühl, and Vladlen Koltun
Science Robotics, 4(30), 2019

Playing for Benchmarks

Stephan R. Richter, Zeeshan Hayder, and Vladlen Koltun
International Conference on Computer Vision (ICCV), 2017
(Selected for spotlight oral presentation)

Dilated Residual Networks

Fisher Yu, Vladlen Koltun, and Thomas Funkhouser
Computer Vision and Pattern Recognition (CVPR), 2017

Playing for Data: Ground Truth from Computer Games

Stephan R. Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun
European Conference on Computer Vision (ECCV), 2016

Feature Space Optimization for Semantic Video Segmentation

Abhijit Kundu, Vibhav Vineet, and Vladlen Koltun
Computer Vision and Pattern Recognition (CVPR), 2016
(Selected for full oral presentation)

Dense Monocular Depth Estimation in Complex Dynamic Scenes

René Ranftl, Vibhav Vineet, Qifeng Chen, and Vladlen Koltun
Computer Vision and Pattern Recognition (CVPR), 2016

Multi-Scale Context Aggregation by Dilated Convolutions

Fisher Yu and Vladlen Koltun
International Conference on Learning Representations (ICLR), 2016

Parameter Learning and Convergent Inference for Dense Random Fields

Philipp Krähenbühl and Vladlen Koltun
International Conference on Machine Learning (ICML), 2013

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Philipp Krähenbühl and Vladlen Koltun
Advances in Neural Information Processing Systems, 2011
(Outstanding Student Paper Award)