Matt Deitke
AI Researcher at Vercept

matt@vercept.com
Seattle, WA


Hi! I'm an AI researcher and co-founder of Vercept (in stealth).

Previously, I worked with the PRIOR team at the Allen Institute for AI and was a Ph.D. student at the Allen School at the University of Washington, Seattle. I was fortunate to be advised by Ali Farhadi in the RAIVN lab, and have had the pleasure of collaborating with Ani Kembhavi, Roozbeh Mottaghi, Ludwig Schmidt, and Rick Szeliski. My research interests are in computer vision, deep learning, and embodied AI. I am interested in building AI systems that are broadly useful and robust. I led the development of Molmo, ProcTHOR, Objaverse, and Phone2Proc.

Starting in 2019, I conducted research at the Allen Institute for AI while completing my undergraduate degree at the University of Washington, Seattle. Before that, I grew up near Chicago and spent several teenage years working on computer graphics, interface design, and visualization for the Department of Athletics at The Ohio State University, the University of Cincinnati, and a variety of other organizations. Later in high school, I studied machine learning and deep learning at Georgia Tech.

News

Thrilled to announce that I'm co-founding Vercept! So excited to build what's next. 🚀
Dec, 2024
ProcTHOR won the Outstanding Paper Award at NeurIPS 2022!
Nov, 2022
Excited to announce Objaverse, a massive dataset of 3D objects with broad applications across AI.
Dec, 2022
Thrilled to continue at UW for my Ph.D. working with Ali Farhadi and at AI2.
Apr, 2023
Received an Outstanding Reviewer Award from CVPR 2023!
Jun, 2023
Giving invited talks at RSS 2023, ICCV 2023, and Shanghai AI Lab!
Jun, 2023
The 2nd edition of Rick Szeliski's Computer Vision textbook was published! Ecstatic to have contributed!
Jan, 2022
Grateful to have received Ph.D. offers at CMU, MIT, Oxford, Stanford, UC Berkeley, and UW.
Mar, 2023
Giving an invited talk at UW's Vision Lunch titled, "Scaling Embodied AI with ProcTHOR: Where We Are and What's Next."
Jun, 2022
Extremely excited to release ProcTHOR! Using procedural generation to scale up the diversity of data leads to remarkable generalization.
Jun, 2022
Excited to release a retrospective on the Embodied AI workshops! We discuss the scope of the field, common approaches, and future directions.
Oct, 2022
Co-organizing the Embodied AI Workshop and the AI2-THOR Rearrangement Challenge at CVPR 2022 in New Orleans.
Jun, 2022
We released an updated revision of the AI2-THOR paper covering its impact and new features!
Aug, 2022

Publications


Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Matt Deitke*, Christopher Clark*, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi

ArXiv 2024
TLDR
We release Molmo, a new family of state-of-the-art open vision-language models built entirely from scratch, featuring our novel PixMo datasets, which include detailed image captions, Q&A pairs, and 2D pointing data. Our 72B model outperforms proprietary models such as Claude 3.5 Sonnet and Gemini 1.5, ranking second only to GPT-4o in both academic benchmarks and human evaluations.

Objaverse-XL: A Universe of 10M+ 3D Objects

Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt*, Ali Farhadi*

NeurIPS 2023
TLDR
We introduce Objaverse-XL, an open dataset of over 10 million 3D objects. With it, we train Zero123-XL, a foundation model for 3D, and observe remarkable 3D generalization abilities. The Zero123-XL base model can then be used for image-to-3D and text-to-3D generation.

Objaverse: A Universe of Annotated 3D Objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi

CVPR 2023
TLDR
Objaverse is a massive dataset of 800K+ (and growing) 3D models with descriptive captions, tags, and animations. We demonstrate its potential by training generative models, improving 2D instance segmentation, training open-vocabulary object navigation models, and creating a benchmark for testing the robustness of vision models.

Phone2Proc: Bringing Robust Robots Into Our Chaotic World

Matt Deitke*, Rose Hendrix*, Luca Weihs, Ali Farhadi, Kiana Ehsani, Aniruddha Kembhavi

CVPR 2023
TLDR
From a 10-minute iPhone scan of any environment, we generate simulated training scenes that semantically match that environment. Training a robot to perform ObjectNav in these scenes dramatically improves its sim-to-real success rate from 35% to 71% and yields an agent that is remarkably robust to human movement, lighting variations, added clutter, and rearranged objects.

🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

NeurIPS 2022 Outstanding Paper Award
TLDR
We built a platform to procedurally generate realistic, interactive, simulated 3D environments to dramatically scale up the diversity and size of training data in Embodied AI. We find that training on these environments significantly improves performance across many embodied AI tasks.

Retrospectives on the Embodied AI Workshop

Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

ArXiv 2022
TLDR
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges in visual navigation, rearrangement, and embodied vision-and-language. We discuss the scope of embodied AI research, performance of state-of-the-art models, common modeling approaches, and future directions.

AI2-THOR: An Interactive 3D Environment for Visual AI

Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi

ArXiv 2022
TLDR
We introduce The House Of inteRactions (THOR), a framework for visual AI research. AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks. It has enabled research in many areas of AI.

Visual Room Rearrangement

Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi

CVPR 2021 Oral Presentation
TLDR
We built a pre-training task in which the agent's goal is to interactively rearrange objects in a room from one state to another. For instance, the agent may have to open the Fridge and move the Lettuce to the CounterTop. Modern deep RL methods struggle with this task.

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

Matt Deitke*, Winson Han*, Alvaro Herrasti*, Aniruddha Kembhavi*, Eric Kolve*, Roozbeh Mottaghi*, Jordi Salvador*, Dustin Schwenk*, Eli VanderBilt*, Matthew Wallingford*, Luca Weihs*, Mark Yatskar*, Ali Farhadi

CVPR 2020
TLDR
We rent office buildings in Seattle and turn them into apartment studios with many possible furniture and wall layouts. Each apartment layout is then digitally recreated by hand so that a simulated robot can interact with it in a video-game-like setting. We study how well a robot trained purely in the simulated environments transfers to reality.

Software

AI2-THOR

AI2-THOR consists of real and simulated environments for interactive robot learning.

Scripts for Downloading and Processing Objaverse-XL (Python)
allenai/ai2thor (Framework)
Interactive Simulated Environments for Embodied AI (Unity)
Procedurally Generate Houses for Embodied AI Training (Python)
Scripts for Rendering Objaverse (Python)
A Framework for Training Embodied-AI Agents (PyTorch)
Code for Running the Visual Room Rearrangement Task (PyTorch)
Python Package for Distributing Datasets and Models (Python)
Run AI2-THOR with Google Colab (Colab)
The ProcTHOR-10K Houses Dataset (Python)
Evaluation Tasks for ObjectNav Models (Python)
The Website for the Embodied AI Workshop at CVPR (React)
Website for the UW RAIVN Lab (React)
Explore Trending Papers at CVPR 2021 (React)
Explore and Parse Papers at CVPR 2019 (Python)

Workshops

Embodied AI

I've co-organized the Embodied AI workshops at CVPR. Our goal is to bring together researchers to share and discuss the current state of intelligent agents that can see, talk, act, and reason.