Visual Perception

Project Overview

We work on models and algorithms for visual perception. This work has produced widely used network architectures and random field models for images, as well as large-scale datasets and benchmarks for training and evaluating broad-competence visual perception systems. One interest is robust visual perception: models that work reliably in the real world, across many environmental conditions, and can be deployed rapidly in new environments. A second line of inquiry asks whether alternatives to convolutional networks can be even more effective than these standard models for images. A third, long-term interest is video and how it can be used to enhance visual perception.


Non-deep Networks

Simple Multi-dataset Detection

Global Tracking Transformers

Language-driven Semantic Segmentation

Vision Transformers for Dense Prediction

Multiscale Deep Equilibrium Models

Tracking Objects as Points

MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

Exploring Self-attention for Image Recognition

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer

Does Computer Vision Matter for Action?

Playing for Benchmarks

Dilated Residual Networks

Playing for Data: Ground Truth from Computer Games

Feature Space Optimization for Semantic Video Segmentation

Dense Monocular Depth Estimation in Complex Dynamic Scenes

Multi-Scale Context Aggregation by Dilated Convolutions

Parameter Learning and Convergent Inference for Dense Random Fields

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials