Meta Pseudo Labels
Hieu Pham, Zihang Dai, Qizhe Xie, Quoc V. Le
We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art.
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Bjorn Ommer
Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks.
Real-Time High-Resolution Background Matting
Shanchuan Lin, Andrey Ryabtsev, Soumyadip Sengupta, Brian L. Curless, Steven M. Seitz, Ira Kemelmacher-Shlizerman
We introduce a real-time, high-resolution background replacement technique which operates at 30fps in 4K resolution, and 60fps for HD on a modern GPU.
RepVGG: Making VGG-Style ConvNets Great Again
Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology.
VirTex: Learning Visual Representations From Textual Annotations
Karan Desai, Justin Johnson
The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet.
Learning Continuous Image Representation With Local Implicit Image Function
Yinbo Chen, Sifei Liu, Xiaolong Wang
How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels.
Bottleneck Transformers for Visual Recognition
Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathan Shlens, Pieter Abbeel, Ashish Vaswani
We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation.
Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph
Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision.
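The copy-paste idea named in the title is simple enough to sketch. Below is a minimal, hypothetical version that composites one source instance into a target image with a binary mask; the function name and mask format are illustrative assumptions, not the paper's code.

import numpy as np

def copy_paste(src_img, src_mask, dst_img, dst_masks):
    # src_img, dst_img: (H, W, 3) uint8 images; src_mask: (H, W) bool mask
    # of one source instance; dst_masks: list of (H, W) bool masks of
    # instances already present in dst_img.
    alpha = src_mask[..., None]                  # broadcast over channels
    out = np.where(alpha, src_img, dst_img)      # pasted object pixels win
    # Existing instances lose any pixels now covered by the pasted object.
    updated = [m & ~src_mask for m in dst_masks]
    return out, updated + [src_mask]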
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs.
NeX: Real-Time View Synthesis With Neural Basis Expansion
We present NeX, a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce next-level view-dependent effects--in real time.
Omnimatte: Associating Objects and Their Effects in Video
Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T. Freeman, Michael Rubinstein
Computer vision has become increasingly better at segmenting objects in images and videos; however, scene effects related to the objects -- shadows, reflections, generated smoke, etc. -- are typically overlooked.
Closed-Form Factorization of Latent Semantics in GANs
Yujun Shen, Bolei Zhou
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
Jiayan Qiu, Yiding Yang, Xinchao Wang, Dacheng Tao
What scene elements, if any, are indispensable for recognizing a scene? We strive to answer this question through the lens of an end-to-end learning scheme.
Back to the Feature: Learning Robust Camera Localization From Pixels To Pose
Paul-Edouard Sarlin, Ajaykumar Unagar, Mans Larsson, Hugo Germain, Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, Torsten Sattler
Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms.
Holistic 3D Scene Understanding From a Single Image With Implicit Representation
Cheng Zhang, Zhaopeng Cui, Yinda Zhang, Bing Zeng, Marc Pollefeys, Shuaicheng Liu
We present a new pipeline for holistic 3D scene understanding from a single image, which predicts object shape, object pose, and scene layout.
As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods.
We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
DatasetGAN: Efficient Labeled Data Factory With Minimal Human Effort
Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler
We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort.
CutPaste: Self-Supervised Learning for Anomaly Detection and Localization
Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, Tomas Pfister
We aim to construct a high-performance model for defect detection that detects unknown anomalous patterns in an image without anomalous data.
Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
Zhengqi Li, Simon Niklaus, Noah Snavely, Oliver Wang
We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input.
Image Generators With Conditionally-Independent Pixel Synthesis
Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, Denis Korzhenkov
Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner.
Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
Daiqing Li, Junlin Yang, Karsten Kreis, Antonio Torralba, Sanja Fidler
Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts.
Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling and content-independent interactions of convolutions.
MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments From a Single Moving Camera
Felix Wimbauer, Nan Yang, Lukas von Stumberg, Niclas Zeller, Daniel Cremers
In this paper, we propose MonoRec, a semi-supervised monocular dense reconstruction architecture that predicts depth maps from a single moving camera in dynamic environments.
Information-Theoretic Segmentation by Inpainting Error Maximization
Pedro Savarese, Sunnie S. Y. Kim, Michael Maire, Greg Shakhnarovich, David McAllester
We study image segmentation from an information-theoretic perspective, proposing a novel adversarial method that performs unsupervised segmentation by partitioning images into maximally independent sets.
IBRNet: Learning Multi-View Image-Based Rendering
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P. Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
On Robustness and Transferability of Convolutional Neural Networks
Josip Djolonga, Jessica Yung, Michael Tschannen, Rob Romijnders, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Matthias Minderer, Alexander D'Amour, Dan Moldovan, Sylvain Gelly, Neil Houlsby, Xiaohua Zhai, Mario Lucic
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
Enriching ImageNet With Human Similarity Judgments and Psychological Embeddings
Brett D. Roads, Bradley C. Love
Advances in supervised learning approaches to object recognition flourished in part because of the availability of high-quality datasets and associated benchmarks.
Daniel Lichy, Jiaye Wu, Soumyadip Sengupta, David W. Jacobs
In this paper, we present a technique for estimating the geometry and reflectance of objects using only a camera, flashlight, and optionally a tripod.
Li Siyao, Shiyu Zhao, Weijiang Yu, Wenxiu Sun, Dimitris Metaxas, Chen Change Loy, Ziwei Liu
In the animation industry, cartoon videos are usually produced at a low frame rate, since hand-drawing such frames is costly and time-consuming.
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu
The canonical approach to video-and-language learning (e.g., video question answering) dictates that a neural model learn from offline-extracted dense video features from vision models and text features from language models.
We aim to infer 3D shape and pose of objects from a single image and propose a learning-based approach that can train from unstructured image collections, using only segmentation outputs from off-the-shelf recognition systems as supervisory signal.
Navigating the GAN Parameter Space for Semantic Image Editing
Anton Cherepkov, Andrey Voynov, Artem Babenko
Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines.
The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth
Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel Brostow, Michael Firman
Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training.
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang
We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance.
Self-Supervised Geometric Perception
We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations).
Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Bjorn Ommer
Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame.
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
Yasamin Jafarian, Hyun Soo Park
A key challenge of learning the geometry of dressed humans lies in the limited availability of ground truth data (e.g., 3D scanned models), which results in the performance degradation of 3D human reconstruction when applied to real-world imagery.
Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
Aditya Prakash, Kashyap Chitta, Andreas Geiger
How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting.
Line Segment Detection Using Transformers Without Edges
Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
In this paper, we present a joint end-to-end line segment detection algorithm using Transformers that is free of post-processing and of heuristics-guided intermediate processing (edge/junction/region detection).
Spatiotemporal Contrastive Video Representation Learning
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to learn spatiotemporal visual representations from unlabeled videos.
Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks.
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus
Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains.
Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu
We present a new generative autoencoder model with dual contradistinctive losses that performs simultaneous inference (reconstruction) and synthesis (sampling).
D-NeRF: Neural Radiance Fields for Dynamic Scenes
Albert Pumarola, Enric Corona, Gerard Pons-Moll, Francesc Moreno-Noguer
Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images.
The moments (a.k.a., mean and standard deviation) of latent features are often removed as noise when training image recognition models, to increase stability and reduce training time.
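For readers unfamiliar with the phrasing, "removing the moments" amounts to instance-normalizing the feature map. A minimal sketch, assuming per-sample, per-channel statistics (illustrative, not the paper's code):

import torch

def remove_moments(h, eps=1e-5):
    # h: (N, C, H, W) latent features. Strip the per-sample, per-channel
    # mean/std and return them alongside the normalized features.
    mu = h.mean(dim=(2, 3), keepdim=True)
    sigma = h.std(dim=(2, 3), keepdim=True)
    return (h - mu) / (sigma + eps), (mu, sigma)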
This paper revisits feature pyramid networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion.
Mandy Lu, Qingyu Zhao, Jiequan Zhang, Kilian M. Pohl, Li Fei-Fei, Juan Carlos Niebles, Ehsan Adeli
Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods.
Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video.
Atul Ingle, Trevor Seets, Mauro Buttafava, Shantanu Gupta, Alberto Tosi, Mohit Gupta, Andreas Velten
Digital camera pixels measure image intensities by converting incident light energy into an analog electrical current, and then digitizing it into a fixed-width binary representation.
Plan2Scene: Converting Floorplans to 3D Scenes
We address the task of converting a floorplan and a set of associated photos of a residence into a textured 3D mesh model, a task which we call Plan2Scene.
Task Programming: Learning Data Efficient Behavior Representations
Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Yue, Pietro Perona
Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts.
Deep Occlusion-Aware Instance Segmentation With Overlapping BiLayers
Lei Ke, Yu-Wing Tai, Chi-Keung Tang
Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries.
Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging
Alvaro Parra, Shin-Fang Chng, Tat-Jun Chin, Anders Eriksson, Ian Reid
Under mild conditions on the noise level of the measurements, rotation averaging satisfies strong duality, which enables global solutions to be obtained via semidefinite programming (SDP) relaxation.
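For context, chordal rotation averaging is commonly posed as the following nonconvex program over absolute rotations R_i, given noisy relative measurements (sign and composition conventions vary across papers):

    \min_{R_1,\dots,R_n \in \mathrm{SO}(3)} \sum_{(i,j) \in \mathcal{E}} \big\| \tilde{R}_{ij} - R_j R_i^{\top} \big\|_F^2

Strong duality means the SDP relaxation of this program is tight under the stated noise conditions, so solving the relaxation recovers the globally optimal rotations.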
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
Zongze Wu, Dani Lischinski, Eli Shechtman
We explore and analyze the latent style space of StyleGAN2, a state-of-the-art architecture for image generation, using models pretrained on several different datasets.
Student-Teacher Learning From Clean Inputs to Noisy Inputs
Guanzhe Hong, Zhiyuan Mao, Xiaojun Lin, Stanley H. Chan
Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring the knowledge from a pre-trained teacher network to the student network.
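The mimicry objective is typically just a distance between matched hidden features. A minimal sketch, assuming paired feature maps from matched layers and a frozen teacher (illustrative, not the paper's code):

import torch.nn.functional as F

def feature_mimic_loss(student_feats, teacher_feats):
    # Lists of (N, C, H, W) tensors from matched layers; detach the
    # teacher so no gradient flows into the pre-trained network.
    return sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_feats, teacher_feats))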
Scaled-YOLOv4: Scaling Cross Stage Partial Network
Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao
We show that the YOLOv4 object detection neural network, based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy.
NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis
Pratul P. Srinivasan, Boyang Deng, Xiuming Zhang, Matthew Tancik, Ben Mildenhall, Jonathan T. Barron
We present a method that takes as input a set of images of a scene illuminated by unconstrained known lighting, and produces as output a 3D representation that can be rendered from novel viewpoints under arbitrary lighting conditions.
Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder
Tal Daniel, Aviv Tamar
The recently introduced introspective variational autoencoder (IntroVAE) exhibits outstanding image generation, and allows for amortized inference using an image encoder.
Learning To Recover 3D Scene Shape From a Single Image
Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.
Cross-Modal Contrastive Learning for Text-to-Image Generation
Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
The output of text-to-image synthesis systems should be coherent, clear, photo-realistic scenes with high semantic fidelity to their conditioned text descriptions.
MoViNets: Mobile Video Networks for Efficient Video Recognition
Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong
We present Mobile Video Networks (MoViNets), a family of computation- and memory-efficient video networks that can operate on streaming video for online inference.
Pulsar: Efficient Sphere-Based Neural Rendering
We propose Pulsar, an efficient sphere-based differentiable rendering module that is orders of magnitude faster than competing techniques, modular, and easy to use due to its tight integration with PyTorch.
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Ning Wang, Wengang Zhou, Jie Wang, Houqiang Li
In video object tracking, there exist rich temporal contexts among successive frames, which have been largely overlooked in existing trackers.
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
Kehong Gong, Jianfeng Zhang, Jiashi Feng
Existing 3D human pose estimators suffer poor generalization performance to new datasets, largely due to the limited diversity of 2D-3D pose pairs in the training data.
Large-Scale Localization Datasets in Crowded Indoor Spaces
Donghwan Lee, Soohyun Ryu, Suyong Yeon, Yonghan Lee, Deokhwa Kim, Cheolho Han, Yohann Cabon, Philippe Weinzaepfel, Nicolas Guerin, Gabriela Csurka, Martin Humenberger
Estimating the precise location of a camera using visual localization enables interesting applications such as augmented reality or robot navigation.
Unsupervised Learning of 3D Object Categories From Videos in the Wild
Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny
Recently, numerous works have attempted to learn reconstructors of textured 3D models of visual categories, given a training set of annotated static images of objects.
In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror.
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven talking face generation, the problem of how to efficiently drive the head pose remains.
Recent advances have shown that symmetry, a structural prior that most objects exhibit, can support a variety of single-view 3D understanding tasks.
This paper deals with the scarcity of data for training optical flow networks, highlighting the limitations of existing sources such as labeled synthetic datasets or unlabeled real videos.
Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation
Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Koring, Suman Saha, Luc Van Gool
Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process.
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language.
Current video retrieval efforts all base their evaluation on an instance-based assumption, that only a single caption is relevant to a query video and vice versa.
In this work we analyze strategies for convolutional neural network scaling; that is, the process of scaling a base convolutional network to endow it with greater computational complexity and, consequently, representational power.
Understanding Failures of Deep Networks via Robust Feature Extraction
Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz
Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances.
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Jie Liang, Hui Zeng, Lei Zhang
Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference times due to the heavy computational burden of convolutions on high-resolution feature maps.
Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, Felix Heide
Recent implicit neural rendering methods have demonstrated that it is possible to learn accurate view synthesis for complex scenes by predicting their volumetric density and color supervised solely by a set of RGB images.
Representation Learning via Global Temporal Alignment and Cycle-Consistency
Isma Hadji, Konstantinos G. Derpanis, Allan D. Jepson
We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., human action).
A Sliced Wasserstein Loss for Neural Texture Synthesis
Eric Heitz, Kenneth Vanhoey, Thomas Chambon, Laurent Belcour
We address the problem of computing a textural loss based on the statistics extracted from the feature activations of a convolutional neural network optimized for object recognition.
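The sliced Wasserstein distance of the title compares feature distributions through random 1D projections: project both activation sets onto shared random directions, sort, and take an L2 difference. A minimal sketch under the assumption of equally sized activation sets (names are illustrative):

import torch

def sliced_wasserstein(feat_a, feat_b, n_proj=32):
    # feat_a, feat_b: (N, C) flattened feature activations with equal N.
    proj = torch.randn(feat_a.shape[1], n_proj, device=feat_a.device)
    proj = proj / proj.norm(dim=0, keepdim=True)   # unit-norm directions
    a = torch.sort(feat_a @ proj, dim=0).values    # sorted 1D projections
    b = torch.sort(feat_b @ proj, dim=0).values
    return ((a - b) ** 2).mean()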
We show that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR).
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
Stephen Hausler, Sourav Garg, Ming Xu, Michael Milford, Tobias Fischer
Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an ever-changing world.
Seeing Out of the Box: End-to-End Pre-Training for Vision-Language Representation Learning
Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu
We study the joint learning of Convolutional Neural Network (CNN) and Transformer for vision-language pre-training (VLPT), which aims to learn cross-modal alignments from millions of image-text pairs.
MobileDets: Searching for Object Detection Architectures for Mobile Accelerators
Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen
Inverted bottleneck layers, which are built upon depthwise convolutions, have been the predominant building blocks in state-of-the-art object detection models on mobile devices.
Olivia Wiles, Sebastien Ehrhardt, Andrew Zisserman
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material.
Learning Monocular 3D Reconstruction of Articulated Categories From Motion
Filippos Kokkinos, Iasonas Kokkinos
Monocular 3D reconstruction of articulated object categories is challenging due to the lack of training data and the inherent ill-posedness of the problem.
Learned Initializations for Optimizing Coordinate-Based Neural Representations
Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P. Srinivasan, Jonathan T. Barron, Ren Ng
Coordinate-based neural representations have shown significant promise as an alternative to discrete, array-based representations for complex low-dimensional signals.
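A coordinate-based representation is just a small network mapping coordinates to signal values, with one set of weights per signal; the paper meta-learns the initialization of such networks. A minimal sketch (architecture details are illustrative assumptions):

import torch.nn as nn

class CoordMLP(nn.Module):
    # Maps a 2D coordinate (x, y) to an RGB value; fitting the weights to
    # one image makes the network a continuous representation of that image.
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xy):
        return self.net(xy)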
Rethinking and Improving the Robustness of Image Style Transfer
Pei Wang, Yijun Li, Nuno Vasconcelos
Extensive research in neural style transfer methods has shown that the correlation between features extracted by a pre-trained VGG network has remarkable ability to capture the visual style of an image.
Robust and Accurate Object Detection via Adversarial Learning
Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong
Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.
Estimating 3D scene flow from a sequence of monocular images has been gaining increased attention due to the simple, economical capture setup.
PPR10K: A Large-Scale Portrait Photo Retouching Dataset With Human-Region Mask and Group-Level Consistency
Jie Liang, Hui Zeng, Miaomiao Cui, Xuansong Xie, Lei Zhang
Different from general photo retouching tasks, portrait photo retouching (PPR), which aims to enhance the visual quality of a collection of flat-looking portrait photos, has its special and practical requirements such as human-region priority (HRP) and group-level consistency (GLC).
We present a novel attention mechanism: Causal Attention (CATT), to remove the ever-elusive confounding effect in existing attention-based vision-language models.
This paper presents a detailed study of improving vision features and develops an improved object detection model for vision language (VL) tasks.
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta
We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives.
Daniel Rebain, Wei Jiang, Soroosh Yazdani, Ke Li, Kwang Moo Yi, Andrea Tagliasacchi
With the advent of Neural Radiance Fields (NeRF), neural networks can now render novel views of a 3D scene with quality that fools the human eye.
TextOCR: Towards Large-Scale End-to-End Reasoning for Arbitrary-Shaped Scene Text
Amanpreet Singh, Guan Pang, Mandy Toh, Jing Huang, Wojciech Galuba, Tal Hassner
A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets is detecting and recognizing the text present in images using an optical character recognition (OCR) system.
We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set.
STaR: Self-Supervised Tracking and Reconstruction of Rigid Objects in Motion With Neural Rendering
Wentao Yuan, Zhaoyang Lv, Tanner Schmidt, Steven Lovegrove
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation.
Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J. Guibas, Tolga Birdal
We propose a data-driven scene flow estimation algorithm exploiting the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies.
Reference-based Super-Resolution (Ref-SR) has recently emerged as a promising paradigm to enhance a low-resolution (LR) input image by introducing an additional high-resolution (HR) reference image.
SwiftNet: Real-Time Video Object Segmentation
Haochen Wang, Xiaolong Jiang, Haibing Ren, Yao Hu, Song Bai
In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation set, leading all present solutions in overall accuracy and speed performance.
Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte
While single-image super-resolution (SISR) has attracted substantial interest in recent years, the proposed approaches are limited to learning image priors in order to add high frequency details.
DNN-based frame interpolation--that generates the intermediate frames given two consecutive frames--typically relies on heavy model architectures with a huge number of features, preventing them from being deployed on systems with limited resources, e.g., mobile devices.
Amit Raj, Michael Zollhofer, Tomas Simon, Jason Saragih, Shunsuke Saito, James Hays, Stephen Lombardi
Acquisition and rendering of photo-realistic human heads is a highly challenging research problem of particular importance for virtual telepresence.
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei, Chelsea Finn
A video prediction model that generalizes to diverse scenes would enable intelligent agents such as robots to perform a variety of tasks via planning with the model.
Differentiable Patch Selection for Image Recognition
Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner
Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand.
PLOP: Learning Without Forgetting for Continual Semantic Segmentation
Arthur Douillard, Yifu Chen, Arnaud Dapogny, Matthieu Cord
Deep learning approaches are nowadays ubiquitously used to tackle computer vision tasks such as semantic segmentation, requiring large datasets and substantial computational power.
Recent work has demonstrated that volumetric scene representations combined with differentiable volume rendering can enable photo-realistic rendering for challenging scenes that mesh reconstruction fails on.
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, Rogerio Feris
Conversational interfaces for the detail-oriented retail fashion domain are more natural, expressive, and user friendly than classical keyword-based search interfaces.
SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
Qianli Ma, Shunsuke Saito, Jinlong Yang, Siyu Tang, Michael J. Black
Learning to model and reconstruct humans in clothing is challenging due to articulation, non-rigid deformation, and varying clothing types and topologies.
Baptiste Angles, Yuhe Jin, Simon Kornblith, Andrea Tagliasacchi, Kwang Moo Yi
We propose a deep network that can be trained to tackle image reconstruction and classification problems that involve detection of multiple object instances, without any supervision regarding their whereabouts.
Learning Compositional Radiance Fields of Dynamic Human Heads
Ziyan Wang, Timur Bagautdinov, Stephen Lombardi, Tomas Simon, Jason Saragih, Jessica Hodgins, Michael Zollhofer
Photorealistic rendering of dynamic humans is an important ability for telepresence systems, virtual shopping, synthetic data generation, and more.
Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De la Torre, Yaser Sheikh
Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances.
Benchmarking Representation Learning for Natural World Image Collections
Grant Van Horn, Elijah Cole, Sara Beery, Kimberly Wilber, Serge Belongie, Oisin Mac Aodha
Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision.
The Spatially-Correlative Loss for Various Image Translation Tasks
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
We propose a novel spatially-correlative loss that is simple, efficient, and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.
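The loss compares self-similarity patterns of features rather than the features themselves, so appearance is factored out and only spatial structure is penalized. A simplified global-similarity sketch (the paper's loss operates on local patches; names are illustrative):

import torch.nn.functional as F

def self_similarity(feat):
    # feat: (N, C, H, W) -> (N, HW, HW) map of pairwise cosine
    # similarities between spatial locations.
    f = F.normalize(feat.flatten(2), dim=1)
    return f.transpose(1, 2) @ f

def structure_loss(feat_src, feat_out):
    # Structure is preserved when input and translated output share the
    # same self-similarity pattern, whatever their appearance.
    return F.l1_loss(self_similarity(feat_src), self_similarity(feat_out))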
Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhang
Recent generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose, simply by learning from unlabeled image collections.
VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization
Seunghwan Choi, Sunghyun Park, Minsoo Lee, Jaegul Choo
The task of image-based virtual try-on aims to transfer a target clothing item onto the corresponding region of a person, which is commonly tackled by fitting the item to the desired body part and fusing the warped item with the person.
SPSG: Self-Supervised Photometric Scene Generation From RGB-D Scans
We present SPSG, a novel approach to generate high-quality, colored 3D models of scenes from RGB-D scan observations by learning to infer unobserved scene geometry and color in a self-supervised fashion.
Corentin Kervadec, Theo Jaunet, Grigory Antipov, Moez Baccouche, Romain Vuillemot, Christian Wolf
Since its inception, Visual Question Answering (VQA) has been notorious as a task where models are prone to exploiting biases in datasets to find shortcuts instead of performing high-level reasoning.
Learning Decision Trees Recurrently Through Communication
Stephan Alaniz, Diego Marcos, Bernt Schiele, Zeynep Akata
Integrating interpretability without sacrificing the prediction accuracy of decision-making algorithms has the potential to greatly improve their value to the user.
Teachers Do More Than Teach: Compressing Image-to-Image Models
Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan, Yanzhi Wang, Sergey Tulyakov
Generative Adversarial Networks (GANs) have achieved huge success in generating high-fidelity images; however, they suffer from low efficiency due to tremendous computational cost and bulky memory usage.
Image-to-Image Translation via Hierarchical Style Disentanglement
Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji
Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks.
3D CNNs With Adaptive Temporal Feature Resolutions
Mohsen Fayyaz, Emad Bahrami, Ali Diba, Mehdi Noroozi, Ehsan Adeli, Luc Van Gool, Jurgen Gall
While state-of-the-art 3D Convolutional Neural Networks (CNN) achieve very good results on action recognition datasets, they are computationally very expensive and require many GFLOPs.
Recent works have shown exciting results in unsupervised image de-rendering--learning to decompose 3D shape, appearance, and lighting from single-image collections without explicit supervision.
Pixel-Wise Anomaly Detection in Complex Driving Scenes
Giancarlo Di Biase, Hermann Blum, Roland Siegwart, Cesar Cadena
The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving.
Learning the Superpixel in a Non-Iterative and Lifelong Manner
Lei Zhu, Qi She, Bin Zhang, Yanye Lu, Zhilin Lu, Duo Li, Jie Hu
Superpixels are generated by automatically clustering pixels in an image into hundreds of compact partitions, and are widely used to perceive object contours for their excellent contour adherence.
Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?
Malik Boudiaf, Hoel Kervadec, Ziko Imtiaz Masud, Pablo Piantanida, Ismail Ben Ayed, Jose Dolz
We show that the way inference is performed in few-shot segmentation tasks has a substantial effect on performance--an aspect often overlooked in the literature in favor of the meta-learning paradigm.
MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
Jiahui Huang, He Wang, Tolga Birdal, Minhyuk Sung, Federica Arrigoni, Shi-Min Hu, Leonidas J. Guibas
We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds.
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham
An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of 3D scene understanding is the availability of large-scale and richly annotated datasets.
Conventional stereo suffers from a fundamental trade-off between imaging volume and signal-to-noise ratio (SNR) -- due to the conflicting impact of aperture size on both these variables.
CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models
Mengyue Yang, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, Jun Wang
Learning disentanglement aims at finding a low-dimensional representation which consists of multiple explanatory and generative factors of the observational data.
CoCoNets: Continuous Contrastive 3D Scene Representations
Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.
While recent studies on semi-supervised learning have shown remarkable progress in leveraging both labeled and unlabeled data, most of them presume a basic setting in which the model is randomly initialized.
PhySG: Inverse Rendering With Spherical Gaussians for Physics-Based Material Editing and Relighting
Kai Zhang, Fujun Luan, Qianqian Wang, Kavita Bala, Noah Snavely
We present an end-to-end inverse rendering pipeline that includes a fully differentiable renderer, and can reconstruct geometry, materials, and illumination from scratch from a set of images.
Mohammed Suhail, Abhay Mittal, Behjat Siddiquie, Chris Broaddus, Jayan Eledath, Gerard Medioni, Leonid Sigal
Traditional scene graph generation methods are trained using cross-entropy losses that treat objects and relationships as independent entities.
Beyond Static Features for Temporally Consistent 3D Human Pose and Shape From a Video
Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
Despite the recent success of single image-based 3D human pose and shape estimation methods, recovering temporally consistent and smooth 3D human motion from a video is still challenging.
Inspired by the fact that human eyes continue to develop tracking ability in early and middle childhood, we propose to use tracking as a proxy task for a computer vision system to learn the visual representations.
We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting, where the images have little or no overlap.
This paper addresses the problem of learning to estimate the depth of detected objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry).
Coordinate Attention for Efficient Mobile Network Design
Qibin Hou, Daquan Zhou, Jiashi Feng
Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps.
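The channel attention being contrasted here, Squeeze-and-Excitation, pools each channel to a single number before reweighting, which is exactly where positional information is lost. A minimal sketch of that baseline (illustrative, not the paper's code):

import torch.nn as nn

class SqueezeExcite(nn.Module):
    # Global-average "squeeze" (discards all spatial layout), then a
    # bottleneck MLP "excitation" producing per-channel weights.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))    # (N, C) channel weights
        return x * w[:, :, None, None]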
Machine learning models are known to be vulnerable to adversarial attacks, namely perturbations of the data that lead to wrong predictions despite being imperceptible.
Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation
Hugo Germain, Vincent Lepetit, Guillaume Bourmaud
Absolute camera pose estimation is usually addressed by sequentially solving two distinct subproblems: first a feature matching problem that seeks to establish putative 2D-3D correspondences, and then a Perspective-n-Point problem that minimizes the reprojection error with respect to the camera pose.
Appearance-based detectors achieve remarkable performance on common scenes, benefiting from high-capacity models and massive annotated data, but tend to fail for scenarios that lack training data.
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
Mingi Ji, Seungjae Shin, Seunghyun Hwang, Gibeom Park, Il-Chul Moon
Knowledge distillation is a method of transferring the knowledge from a pretrained complex teacher model to a student model, so a smaller network can replace a large teacher network at the deployment stage.
Constructing and animating humans is an important component for building virtual worlds in a wide variety of applications such as virtual reality or robotics testing in simulation.
The Lottery Tickets Hypothesis for Supervised and Self-Supervised Pre-Training in Computer Vision Models
Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang
The computer vision world has been re-gaining enthusiasm for various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as SimCLR and MoCo.
Continual Adaptation of Visual Representations via Domain Randomization and Meta-Learning
Riccardo Volpi, Diane Larlus, Gregory Rogez
Most standard learning approaches lead to fragile models which are prone to drift when sequentially trained on samples of a different nature -- the well-known "catastrophic forgetting" issue.
Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.
Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?
Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
Models for Visual Question Answering (VQA) are notorious for their tendency to rely on dataset biases, as the large and unbalanced diversity of questions and concepts involved tends to prevent models from learning to "reason", leading them to perform "educated guesses" instead.
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu
The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur.
This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task.
Monocular Real-Time Full Body Capture With Inter-Part Correlations
Yuxiao Zhou, Marc Habermann, Ikhsanul Habibie, Ayush Tewari, Christian Theobalt, Feng Xu
We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image.
The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection.
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Localization Quality Estimation (LQE) is crucial and popular in the recent advancement of dense object detectors since it can provide accurate ranking scores that benefit the Non-Maximum Suppression processing and improve detection performance.
DeepVideoMVS: Multi-View Stereo on Video With Recurrent Spatio-Temporal Fusion
Arda Duzceker, Silvano Galliani, Christoph Vogel, Pablo Speciale, Mihai Dusmanu, Marc Pollefeys
We propose an online multi-view depth prediction approach on posed video streams, where the scene geometry information computed in the previous time steps is propagated to the current time step in an efficient and geometrically plausible way.
UnsupervisedR&R: Unsupervised Point Cloud Registration via Differentiable Rendering
Mohamed El Banani, Luya Gao, Justin Johnson
Aligning partial views of a scene into a single whole is essential to understanding one's environment and is a key component of numerous robotics tasks such as SLAM and SfM.
Binary TTC: A Temporal Geofence for Autonomous Navigation
Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen
Time-to-contact (TTC), the time for an object to collide with the observer's plane, is a powerful tool for path planning: it is potentially more informative than the depth, velocity, and acceleration of objects in the scene---even for humans.
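As a reminder of the underlying quantity: for a point at depth Z approaching the observer's plane,

    \mathrm{TTC} = \frac{Z}{-\,dZ/dt},

and the "binary" formulation of the title reduces estimation to per-pixel decisions of the form TTC < \tau versus TTC >= \tau for a chosen time threshold \tau (our reading of the premise; the paper's exact definitions may differ).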
NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration
Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin
With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed.
Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE
Jialun Peng, Dong Liu, Songcen Xu, Houqiang Li
Given an incomplete image without additional constraints, image inpainting naturally allows for multiple solutions as long as they appear plausible.
Temporal Query Networks for Fine-Grained Video Understanding
Chuhan Zhang, Ankush Gupta, Andrew Zisserman
Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video.
Points As Queries: Weakly Semi-Supervised Object Detection by Points
Liangyu Chen, Tong Yang, Xiangyu Zhang, Wei Zhang, Jian Sun
We propose a novel point-annotated setting for the weakly semi-supervised object detection task, in which the dataset comprises a small set of fully annotated images and a large set of images weakly annotated with points.
Self-supervised visual representation learning has seen huge progress recently, but no large-scale evaluation has compared the many models now available.
Learning To Relate Depth and Semantics for Unsupervised Domain Adaptation
Suman Saha, Anton Obukhov, Danda Pani Paudel, Menelaos Kanakis, Yuhua Chen, Stamatios Georgoulis, Luc Van Gool
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi
There has been significant recent progress in the field of Embodied AI, with researchers developing models and algorithms enabling embodied agents to navigate and interact within completely unseen environments.
SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification
Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia
A common classification task situation is where one has a large amount of data available for training, but only a small portion is annotated with class labels.
SMURF: Self-Teaching Multi-Frame Unsupervised RAFT With Full-Image Warping
Austin Stone, Daniel Maurer, Alper Ayvaci, Anelia Angelova, Rico Jonschkowski
We present SMURF, a method for unsupervised learning of optical flow that improves state of the art on all benchmarks by 36% to 40% and even outperforms several supervised approaches such as PWC-Net and FlowNet2.
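The core unsupervised signal in this family of methods is photometric: warp the second frame back with the predicted flow and compare against the first frame. A minimal sketch of that objective (SMURF adds self-supervision, occlusion handling, and full-image warping on top; names are illustrative):

import torch
import torch.nn.functional as F

def warp(img, flow):
    # Backward-warp img (N, C, H, W) by flow (N, 2, H, W) given in pixels.
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(img.device)   # (2, H, W)
    tgt = base.unsqueeze(0) + flow                         # sampling locations
    gx = 2 * tgt[:, 0] / (w - 1) - 1                       # normalize to [-1, 1]
    gy = 2 * tgt[:, 1] / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)                   # (N, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def photometric_loss(img1, img2, flow12):
    # Frame 1 should match frame 2 warped back by the forward flow.
    return (img1 - warp(img2, flow12)).abs().mean()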
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, Junjie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou
In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol.
Recent neural view synthesis methods have achieved impressive quality and realism, surpassing classical pipelines which rely on multi-view reconstruction.
Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
Zhichao Huang, Xintong Han, Jia Xu, Tong Zhang
We present a new method for few-shot human motion transfer that achieves realistic human image generation with only a small number of appearance inputs.
HOTR: End-to-End Human-Object Interaction Detection With Transformers
Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim
Human-Object Interaction (HOI) detection is a task of identifying "a set of interactions" in an image, which involves the i) localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels.
A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification
Jong-Chyi Su, Zezhou Cheng, Subhransu Maji
We evaluate the effectiveness of semi-supervised learning (SSL) on a realistic benchmark where data exhibits considerable class imbalance and contains images from novel classes.
Mesoscopic Photogrammetry With an Unstabilized Phone Camera
Kevin C. Zhou, Colin Cooke, Jaehee Park, Ruobing Qian, Roarke Horstmeyer, Joseph A. Izatt, Sina Farsiu
We present a feature-free photogrammetric technique that enables quantitative 3D mesoscopic (mm-scale height variation) imaging with tens-of-micron accuracy from sequences of images acquired by a smartphone at close range (several cm) under freehand motion without additional hardware.
RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction
Yinyu Nie, Ji Hou, Xiaoguang Han, Matthias Niessner
Semantic scene understanding from point clouds is particularly challenging as the points reflect only a sparse set of the underlying 3D geometry.
Yifan Wang, Andrew Liu, Richard Tucker, Jiajun Wu, Brian L. Curless, Steven M. Seitz, Noah Snavely
We present a framework for automatically reconfiguring images of street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles.
Towards High Fidelity Face Relighting With Realistic Shadows
Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu
Existing face relighting methods often struggle with two problems: maintaining the local facial details of the subject and accurately removing and synthesizing shadows in the relit image, especially hard shadows.
Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds
Li Yi, Boqing Gong, Thomas Funkhouser
We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors.
IIRC: Incremental Implicitly-Refined Classification
We introduce the 'Incremental Implicitly-Refined Classification (IIRC)' setup, an extension to the class incremental learning setup where the incoming batches of classes have two granularity levels.
Towards Real-World Blind Face Restoration With Generative Facial Prior
Xintao Wang, Yu Li, Honglun Zhang, Ying Shan
Blind face restoration usually relies on facial priors, such as facial geometry prior or reference prior, to restore realistic and faithful details.
Instance Localization for Self-Supervised Detection Pretraining
Ceyuan Yang, Zhirong Wu, Bolei Zhou, Stephen Lin
Prior research on self-supervised learning has led to considerable progress on image classification, but often with degraded transfer performance on object detection.
Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks.
DexYCB: A Benchmark for Capturing Hand Grasping of Objects
Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox
We introduce DexYCB, a new dataset for capturing hand grasping of objects.
Wide-Baseline Multi-Camera Calibration Using Person Re-Identification
Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani
We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios, e.g., cameras for construction sites, sports stadiums, and public spaces.
One of the factors that have hindered progress in the areas of sign language recognition, translation, and production is the absence of large annotated datasets.
AGORA: Avatars in Geography Optimized for Regression Analysis
Priyanka Patel, Chun-Hao P. Huang, Joachim Tesch, David T. Hoffmann, Shashank Tripathi, Michael J. Black
While the accuracy of 3D human pose estimation from images has steadily improved on benchmark datasets, the best methods still fail in many real-world scenarios.
ViP-DeepLab: Learning Visual Perception With Depth-Aware Video Panoptic Segmentation
Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In this paper, we present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring the point clouds from perspective image sequences while providing each point with instance-level semantic interpretations.
FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
Bo Sun, Banghuai Li, Shengcai Cai, Ye Yuan, Chi Zhang
There is emerging interest in recognizing previously unseen objects given very few training examples, a task known as few-shot object detection (FSOD).
Unsupervised Human Pose Estimation Through Transforming Shape Templates
Luca Schmidtke, Athanasios Vlontzos, Simon Ellershaw, Anna Lukens, Tomoki Arichi, Bernhard Kainz
Human pose estimation is a major computer vision problem with applications ranging from augmented reality and video capture to surveillance and movement tracking.
Generative adversarial networks (GANs), e.g., StyleGAN2, play a vital role in various image generation and synthesis tasks, yet their notoriously high computational cost hinders their efficient deployment on edge devices.
StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision
Yang Hong, Juyong Zhang, Boyi Jiang, Yudong Guo, Ligang Liu, Hujun Bao
In this paper, we propose StereoPIFu, which integrates the geometric constraints of stereo vision with implicit function representation of PIFu, to recover the 3D shape of the clothed human from a pair of low-cost rectified images.
Self-Supervised Motion Learning From Static Images
Ziyuan Huang, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Rong Jin, Marcelo H. Ang
Motions are reflected in videos as the movement of pixels, and actions are essentially patterns of inconsistent motions between the foreground and the background.
KOALAnet: Blind Super-Resolution Using Kernel-Oriented Adaptive Local Adjustment
Soo Ye Kim, Hyeonjun Sim, Munchurl Kim
Blind super-resolution (SR) methods aim to generate a high-quality, high-resolution image from a low-resolution image containing unknown degradations.
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation
Saquib Sarfraz, Naila Murray, Vivek Sharma, Ali Diba, Luc Van Gool, Rainer Stiefelhagen
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos and is an important requirement for many video understanding tasks.
HyperSeg: Patch-Wise Hypernetwork for Real-Time Semantic Segmentation
Yuval Nirkin, Lior Wolf, Tal Hassner
We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder.
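The core idea, an encoder output generating decoder weights, is a hypernetwork. A minimal sketch of that mechanism, with illustrative sizes and a 1x1 decoding convolution standing in for HyperSeg's actual decoder:

```python
# Minimal hypernetwork sketch: a per-image context vector from the encoder
# generates the weights of a small decoding convolution. All names and sizes
# here are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class HyperDecoder(nn.Module):
    def __init__(self, ctx_dim=64, in_ch=32, out_ch=8):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        # The "hypernetwork": maps the context vector to the flattened
        # weights and bias of a 1x1 decoding convolution.
        self.weight_gen = nn.Linear(ctx_dim, out_ch * in_ch + out_ch)

    def forward(self, feat, ctx):
        # feat: (B, in_ch, H, W) decoder input; ctx: (B, ctx_dim) from encoder
        theta = self.weight_gen(ctx)
        w = theta[:, : self.out_ch * self.in_ch].view(-1, self.out_ch, self.in_ch)
        b = theta[:, self.out_ch * self.in_ch :]
        # Apply the generated per-sample 1x1 conv as a per-pixel matmul.
        return torch.einsum("boi,bihw->bohw", w, feat) + b[:, :, None, None]

feat, ctx = torch.randn(2, 32, 16, 16), torch.randn(2, 64)
print(HyperDecoder()(feat, ctx).shape)  # torch.Size([2, 8, 16, 16])
```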
Categorical Depth Distribution Network for Monocular 3D Object Detection
Cody Reading, Ali Harakeh, Julia Chae, Steven L. Waslander
Monocular 3D object detection is a key problem for autonomous vehicles, as it provides a solution with simple configuration compared to typical multi-sensor systems.
Monocular Reconstruction of Neural Face Reflectance Fields
Mallikarjun B R, Ayush Tewari, Tae-Hyun Oh, Tim Weyrich, Bernd Bickel, Hans-Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt
The reflectance field of a face describes the reflectance properties responsible for complex lighting effects including diffuse, specular, inter-reflection and self shadowing.
Synthesizing variations of a specific reference image with semantically valid content is an important task in terms of personalized generation as well as for data augmentation.
We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers.
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-Training
Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Dongdong Zhang, Nan Duan
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.
(AF)2-S3Net: Attentive Feature Fusion With Adaptive Feature Selection for Sparse Semantic Segmentation Network
Ran Cheng, Ryan Razani, Ehsan Taghavi, Enxu Li, Bingbing Liu
Autonomous robotic systems and self-driving cars rely on accurate perception of their surroundings, as the safety of passengers and pedestrians is the top priority.
Roof-GAN: Learning To Generate Roof Geometry and Relations for Residential Houses
Yiming Qian, Hao Zhang, Yasutaka Furukawa
This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships.
ReMix: Towards Image-to-Image Translation With Limited Data
Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun
Image-to-image (I2I) translation methods based on generative adversarial networks (GANs) typically suffer from overfitting when limited training data is available.
VaB-AL: Incorporating Class Imbalance and Difficulty With Variational Bayes for Active Learning
Jongwon Choi, Kwang Moo Yi, Jihoon Kim, Jinho Choo, Byoungjip Kim, Jinyeop Chang, Youngjune Gwon, Hyung Jin Chang
Active Learning for discriminative models has largely been studied with the focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with.
Adversarial Robustness Under Long-Tailed Distribution
Tong Wu, Ziwei Liu, Qingqiu Huang, Yu Wang, Dahua Lin
Adversarial robustness has attracted extensive studies recently by revealing the vulnerability and intrinsic characteristics of deep networks.
Camouflaged Object Segmentation With Distraction Mining
Haiyang Mei, Ge-Peng Ji, Ziqi Wei, Xin Yang, Xiaopeng Wei, Deng-Ping Fan
Camouflaged object segmentation (COS) aims to identify objects that are "perfectly" assimilated into their surroundings, and has a wide range of valuable applications.
Learning Complete 3D Morphable Face Models From Images and Videos
Mallikarjun B R, Ayush Tewari, Hans-Peter Seidel, Mohamed Elgharib, Christian Theobalt
Most 3D face reconstruction methods rely on 3D morphable models, which disentangle the space of facial deformations into identity and expression geometry, and skin reflectance.
Topological Planning With Transformers for Vision-and-Language Navigation
Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vazquez, Silvio Savarese
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments.
CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild
Bastian Wandt, Marco Rudolph, Petrissa Zell, Helge Rhodin, Bodo Rosenhahn
Human pose estimation from single images is a challenging problem in computer vision that requires large amounts of labeled training data to be solved accurately.
Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation
Bin Yan, Xinyu Zhang, Dong Wang, Huchuan Lu, Xiaoyun Yang
Visual object tracking aims to precisely estimate the bounding box for the given target, which is a challenging problem due to factors such as deformation and occlusion.
DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation
Xiong Zhang, Hongmin Xu, Hong Mo, Jianchao Tan, Cheng Yang, Lei Wang, Wenqi Ren
Existing NAS methods for dense image prediction tasks usually compromise on a restricted search space or search on a proxy task to meet achievable computational demands.
Video super-resolution (VSR) approaches tend to have more components than their image counterparts as they need to exploit the additional temporal dimension.
The objective of this work is to segment high-resolution images without overloading GPU memory usage or losing the fine details in the output segmentation map.
AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations From Self-Trained Negative Adversaries
Qianjiang Hu, Xiao Wang, Wei Hu, Guo-Jun Qi
Contrastive learning relies on constructing a collection of negative examples that are sufficiently hard to discriminate against positive queries when their representations are self-trained.
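A toy sketch of the AdCo-style idea of self-trained negatives: the negative bank is itself a parameter, updated by gradient ascent on the same InfoNCE loss the encoder descends. The encoder, sizes, and learning rates below are illustrative stand-ins:

```python
# Learnable adversarial negatives for contrastive learning (toy sketch).
import torch
import torch.nn.functional as F

dim, n_neg, tau = 128, 1024, 0.1
encoder = torch.nn.Linear(256, dim)  # stand-in encoder
negatives = torch.nn.Parameter(F.normalize(torch.randn(n_neg, dim), dim=1))
opt_enc = torch.optim.SGD(encoder.parameters(), lr=0.1)
opt_neg = torch.optim.SGD([negatives], lr=1.0)

x_q, x_k = torch.randn(32, 256), torch.randn(32, 256)   # two augmented views
q = F.normalize(encoder(x_q), dim=1)
k = F.normalize(encoder(x_k), dim=1).detach()            # positive key
logits = torch.cat([(q * k).sum(1, keepdim=True),        # positive logit
                    q @ F.normalize(negatives, dim=1).t()], dim=1) / tau
loss = F.cross_entropy(logits, torch.zeros(len(q), dtype=torch.long))

opt_enc.zero_grad(); opt_neg.zero_grad()
loss.backward()
opt_enc.step()            # encoder minimizes the contrastive loss
negatives.grad.neg_()     # flip the gradient: negatives maximize it
opt_neg.step()
```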
Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz
A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios.
ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks
Xinshao Wang, Yang Hua, Elyor Kodirov, David A. Clifton, Neil M. Robertson
To train robust deep neural networks (DNNs), we systematically study several target modification approaches, which include output regularisation, self and non-self label correction (LC).
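Self label correction of this kind can be sketched as a convex combination of the annotated label and the model's own prediction, with trust in the prediction growing over training. The schedule and epsilon form below are illustrative, not ProSelfLC's exact rule:

```python
# Progressive self label correction (illustrative sketch).
import torch
import torch.nn.functional as F

def corrected_target(logits, labels, step, total_steps, num_classes):
    eps = min(1.0, step / total_steps) * 0.5    # trust in self-prediction grows
    one_hot = F.one_hot(labels, num_classes).float()
    p = F.softmax(logits.detach(), dim=1)       # model's current belief
    return (1 - eps) * one_hot + eps * p

logits = torch.randn(4, 10)
labels = torch.tensor([1, 3, 5, 7])
target = corrected_target(logits, labels, step=500, total_steps=1000, num_classes=10)
loss = -(target * F.log_softmax(logits, dim=1)).sum(1).mean()  # soft-target CE
```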
Learning to Track Instances without Video Annotations
Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, Jan Kautz
Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches.
Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination
Xudong Wang, Ziwei Liu, Stella X. Yu
Unsupervised feature learning has made great strides with contrastive learning based on instance discrimination and invariant mapping, as benchmarked on curated class-balanced datasets.
Pose-Guided Human Animation From a Single Image in the Wild
Jae Shin Yoon, Lingjie Liu, Vladislav Golyanik, Kripasindhu Sarkar, Hyun Soo Park, Christian Theobalt
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Fatemeh Saleh, Sadegh Aliakbarian, Hamid Rezatofighi, Mathieu Salzmann, Stephen Gould
Despite the recent advances in multiple object tracking (MOT), achieved by joint detection and tracking, dealing with long occlusions remains a challenge.
Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He
We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video.
Counterfactual Zero-Shot and Open-Set Visual Recognition
Zhongqi Yue, Tan Wang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang
We present a novel counterfactual framework for both Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR), whose common challenge is generalizing to the unseen-classes by only training on the seen-classes.
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu
Despite previous success in generating audio-driven talking heads, most previous studies focus on the correlation between speech content and the mouth shape.
We propose a generative model of unordered point sets, such as point clouds, in the forms of an energy-based model, where the energy function is parameterized by an input-permutation-invariant bottom-up neural network.
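Permutation invariance of the energy can be obtained with a shared per-point network followed by a symmetric pooling, in the style of PointNet. A minimal sketch with made-up sizes (not the paper's exact parameterization):

```python
# An input-permutation-invariant energy function for point sets.
import torch
import torch.nn as nn

class PointSetEnergy(nn.Module):
    def __init__(self, point_dim=3, hidden=128):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(point_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden))
        self.head = nn.Linear(hidden, 1)

    def forward(self, pts):              # pts: (B, N, 3)
        h = self.point_mlp(pts)          # shared per-point features
        h = h.max(dim=1).values          # symmetric pooling over points
        return self.head(h).squeeze(-1)  # one scalar energy per set

model = PointSetEnergy()
pts = torch.randn(2, 256, 3)
perm = torch.randperm(256)
# Reordering the points leaves the energy exactly unchanged.
assert torch.allclose(model(pts), model(pts[:, perm]))
```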
SimPoE: Simulated Character Control for 3D Human Pose Estimation
Ye Yuan, Shih-En Wei, Tomas Simon, Kris Kitani, Jason Saragih
Accurate estimation of 3D human motion from monocular video requires modeling both kinematics (body motion without physical forces) and dynamics (motion with physical forces).
Knowledge transfer from large teacher models to smaller student models has recently been studied for metric learning, focusing on fine-grained classification.
Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia
Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network.
Revamping Cross-Modal Recipe Retrieval With Hierarchical Transformers and Self-Supervised Learning
Amaia Salvador, Erhan Gundogdu, Loris Bazzani, Michael Donoser
Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives, as well as the availability of vast amounts of digital cooking recipes and food images to train machine learning models.
AutoFlow: Learning a Better Training Set for Optical Flow
Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu
Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.
MP3: A Unified Model To Map, Perceive, Predict and Plan
Sergio Casas, Abbas Sadat, Raquel Urtasun
High-definition maps (HD maps) are a key component of most modern self-driving systems due to their valuable semantic and geometric information.
Pedro Savarese, David McAllester, Sudarshan Babu, Michael Maire
From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned.
Improving Unsupervised Image Clustering With Robust Learning
Sungwon Park, Sungwon Han, Sundong Kim, Danu Kim, Sungkyu Park, Seunghoon Hong, Meeyoung Cha
Unsupervised image clustering methods often introduce alternative objectives to indirectly train the model and are subject to faulty predictions and overconfident results.
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
Subhankar Roy, Evgeny Krivosheev, Zhun Zhong, Nicu Sebe, Elisa Ricci
In this paper we address multi-target domain adaptation (MTDA), where given one labeled source dataset and multiple unlabeled target datasets that differ in data distributions, the task is to learn a robust predictor for all the target domains.
To address the problem of long-tail distribution for the large vocabulary object detection task, existing methods usually divide all categories into several groups and treat each group with different strategies.
Interpolation-Based Semi-Supervised Learning for Object Detection
Jisoo Jeong, Vikas Verma, Minsung Hyun, Juho Kannala, Nojun Kwak
Although the data labeling cost for object detection tasks is substantially higher than for classification tasks, semi-supervised learning methods for object detection have not been studied much.
Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation
Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Kurt Keutzer, Trevor Darrell, Han Zhao
The success of supervised learning crucially hinges on the assumption that training data matches test data, which rarely holds in practice due to potential distribution shift.
We present a controllable camera simulator based on deep neural networks to synthesize raw image data under different camera settings, including exposure time, ISO, and aperture.
Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health.
Jean-Baptiste Truong, Pratyush Maini, Robert J. Walls, Nicolas Papernot
Current model extraction attacks assume that the adversary has access to a surrogate dataset with characteristics similar to the proprietary data used to train the victim model.
The Multi-Temporal Urban Development SpaceNet Dataset
Adam Van Etten, Daniel Hogan, Jesus Martinez Manso, Jacob Shermeyer, Nicholas Weir, Ryan Lewis
Satellite imagery analytics have numerous human development and disaster response applications, particularly when time series methods are involved.
Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions.
Fair Attribute Classification Through Latent Space De-Biasing
Vikram V. Ramaswamy, Sunnie S. Y. Kim, Olga Russakovsky
Fairness in visual recognition is becoming a prominent and critical topic of discussion as recognition systems are deployed at scale in the real world.
DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
Xinyi Wu, Zhenyao Wu, Hao Guo, Lili Ju, Song Wang
Semantic segmentation of nighttime images plays an equally important role as that of daytime images in autonomous driving, but the former is much more challenging due to poor illumination and the difficulty of human annotation.
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang, Fang Wen
Self-training is a competitive approach in domain adaptive segmentation, which trains the network with the pseudo labels on the target domain.
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
Zhizhong Huang, Junping Zhang, Hongming Shan
To minimize the effects of age variation in face recognition, previous work either extracts identity-related discriminative features by minimizing the correlation between identity- and age-related features, called age-invariant face recognition (AIFR), or removes age variation by transforming the faces of different age groups into the same age group, called face age synthesis (FAS). However, the former lacks visual results for model interpretation, while the latter suffers from artifacts that compromise downstream recognition.
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya, Michael S. Ryoo
In this paper, we introduce 'Coarse-Fine Networks', a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
Fabio Tosi, Yiyi Liao, Carolin Schmitt, Andreas Geiger
Although stereo matching accuracy has greatly improved with deep learning in the last few years, recovering sharp boundaries and high-resolution outputs efficiently remains challenging.
Hierarchical and Partially Observable Goal-Driven Policy Learning With Goals Relational Graph
Xin Ye, Yezhou Yang
We present a novel two-layer hierarchical reinforcement learning approach equipped with a Goals Relational Graph (GRG) for tackling the partially observable goal-driven task, such as goal-driven visual navigation.
Differentiable SLAM-Net: Learning Particle SLAM for Visual Navigation
Peter Karkus, Shaojun Cai, David Hsu
Simultaneous localization and mapping (SLAM) remains challenging for a number of downstream applications, such as visual robot navigation, because of rapid turns, featureless walls, and poor camera quality.
Predictions of certifiably robust classifiers remain constant in a neighborhood of a point, making them resilient to test-time attacks with a guarantee.
Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder
Crop-based training strategies decouple training resolution from GPU memory consumption, allowing the use of large-capacity panoptic segmentation networks on multi-megapixel images.
Jin Chen, Xijun Wang, Zichao Guo, Xiangyu Zhang, Jian Sun
We propose a new convolution called Dynamic Region-Aware Convolution (DRConv), which can automatically assign multiple filters to corresponding spatial regions where features have similar representation.
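A simplified stand-in for this operator: a small guide branch softly assigns each spatial location to one of m candidate filters (DRConv itself uses a hard assignment with a gradient trick; the soft mixture below is an assumption made for brevity):

```python
# Region-aware convolution sketch: per-pixel soft assignment over m filters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAwareConv1x1(nn.Module):
    def __init__(self, in_ch, out_ch, m=4):
        super().__init__()
        self.guide = nn.Conv2d(in_ch, m, kernel_size=3, padding=1)
        self.filters = nn.ModuleList(nn.Conv2d(in_ch, out_ch, 1) for _ in range(m))

    def forward(self, x):
        assign = F.softmax(self.guide(x), dim=1)              # (B, m, H, W)
        outs = torch.stack([f(x) for f in self.filters], 1)   # (B, m, C', H, W)
        return (assign.unsqueeze(2) * outs).sum(dim=1)        # region-weighted mix

y = RegionAwareConv1x1(16, 32)(torch.randn(2, 16, 8, 8))
print(y.shape)  # torch.Size([2, 32, 8, 8])
```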
Predicting future trajectories of traffic agents in highly interactive environments is an essential and challenging problem for the safe operation of autonomous driving systems.
SuperMix: Supervising the Mixing Data Augmentation
Ali Dabouei, Sobhan Soleymani, Fariborz Taherkhani, Nasser M. Nasrabadi
This paper presents a supervised mixing augmentation method termed SuperMix, which exploits the salient regions within input images to construct mixed training samples.
Back to Event Basics: Self-Supervised Learning of Image Reconstruction for Event Cameras via Photometric Constancy
Federico Paredes-Valles, Guido C. H. E. de Croon
Event cameras are novel vision sensors that sample, in an asynchronous fashion, brightness increments with low latency and high temporal resolution.
PU-GCN: Point Cloud Upsampling Using Graph Convolutional Networks
Guocheng Qian, Abdulellah Abualshour, Guohao Li, Ali Thabet, Bernard Ghanem
The effectiveness of learning-based point cloud upsampling pipelines heavily relies on the upsampling modules and feature extractors used therein.
Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu
We present a deep neural network methodology to reconstruct the 3D pose and shape of people, including hand gestures and facial expression, given an input RGB image.
On Learning the Geodesic Path for Incremental Learning
Christian Simon, Piotr Koniusz, Mehrtash Harandi
Neural networks notoriously suffer from the problem of catastrophic forgetting, the phenomenon of forgetting the past knowledge when acquiring new knowledge.
Semi-Supervised Action Recognition With Temporal Contrastive Learning
Ankit Singh, Omprakash Chakraborty, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das
Learning to recognize actions from only a handful of labeled videos is a challenging problem due to the scarcity of tediously collected activity labels.
Multiple Instance Active Learning for Object Detection
Tianning Yuan, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, Qixiang Ye
Despite the substantial progress of active learning for image recognition, an instance-level active learning method tailored to object detection is still lacking.
Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.
Source-Free Domain Adaptation for Semantic Segmentation
Yuang Liu, Wei Zhang, Jun Wang
Unsupervised Domain Adaptation (UDA) can tackle the challenge that convolutional neural network (CNN)-based approaches for semantic segmentation heavily rely on the pixel-level annotated data, which is labor-intensive.
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging.
This work focuses on object goal visual navigation, aiming at finding the location of an object from a given class, where in each step the agent is provided with an egocentric RGB image of the scene.
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, Dahua Lin
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
Convolutional Dynamic Alignment Networks for Interpretable Classifications
Moritz Bohle, Mario Fritz, Bernt Schiele
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA-Nets), which are performant classifiers with a high degree of inherent interpretability.
Recent work on audio-visual navigation assumes a constantly-sounding target and restricts the role of audio to signaling the target's position.
Keep Your Eyes on the Lane: Real-Time Attention-Guided Lane Detection
Lucas Tabelini, Rodrigo Berriel, Thiago M. Paixao, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos
Modern lane detection methods have achieved remarkable performances in complex real-world scenarios, but many have issues maintaining real-time efficiency, which is important for autonomous vehicles.
PCLs: Geometry-Aware Neural Reconstruction of 3D Pose With Perspective Crop Layers
Frank Yu, Mathieu Salzmann, Pascal Fua, Helge Rhodin
Local processing is an essential feature of CNNs and other neural network architectures -- it is one of the reasons why they work so well on images where relevant information is, to a large extent, local.
Deep Implicit Templates for 3D Shape Representation
Zerong Zheng, Tao Yu, Qionghai Dai, Yebin Liu
Deep implicit functions (DIFs), as a kind of 3D shape representation, are becoming more and more popular in the 3D vision community due to their compactness and strong representation power.
We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation.
Divergence Optimization for Noisy Universal Domain Adaptation
Qing Yu, Atsushi Hashimoto, Yoshitaka Ushiku
Universal domain adaptation (UniDA) has been proposed to transfer knowledge learned from a label-rich source domain to a label-scarce target domain without any constraints on the label sets.
Learning the Best Pooling Strategy for Visual Semantic Embedding
Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, Changhu Wang
Visual Semantic Embedding (VSE) is a dominant approach for vision-language retrieval, which aims at learning a deep embedding space such that visual data are embedded close to their semantic text labels or descriptions.
Context-Aware Layout to Image Generation With Enhanced Object Appearance
Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang
A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout.
Neural Response Interpretation Through the Lens of Critical Pathways
Ashkan Khakzar, Soroosh Baselizadeh, Saurabh Khanduja, Christian Rupprecht, Seong Tae Kim, Nassir Navab
Is critical input information encoded in specific sparse pathways within the neural network? In this work, we discuss the problem of identifying these critical pathways and subsequently leverage them for interpreting the network's response to an input.
Learning Semantic-Aware Dynamics for Video Prediction
Xinzhu Bei, Yanchao Yang, Stefano Soatto
We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions and capturing the evolution of semantically consistent regions in the video.
We show that the influence of a subset of the training samples can be removed -- or "forgotten" -- from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of remaining information after forgetting.
Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging
Tao Huang, Weisheng Dong, Xin Yuan, Jinjian Wu, Guangming Shi
In coded aperture snapshot spectral imaging (CASSI) system, the real-world hyperspectral image (HSI) can be reconstructed from the captured compressive image in a snapshot.
To promote the developments of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured large-scale dataset, named DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios.
Recent development of Under-Display Camera (UDC) systems provides a true bezel-less and notch-free viewing experience on smartphones (and TVs, laptops, tablets), while allowing images to be captured from the selfie camera embedded underneath.
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
Shuang Li, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Feng Qiao, Xinjing Cheng
Real-world training data usually exhibits long-tailed distribution, where several majority classes have a significantly larger number of samples than the remaining minority classes.
Recent studies propose membership inference (MI) attacks on deep models, where the goal is to infer if a sample has been used in the training process.
Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-Constrained Optimization
Fakai Wang, Kang Zheng, Le Lu, Jing Xiao, Min Wu, Shun Miao
Accurate vertebra localization and identification are required in many clinical applications of spine disorder diagnosis and surgery planning.
Most differentiable neural architecture search methods construct a super-net for search and derive a target-net as its sub-graph for evaluation.
Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang
Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to supervision requirements.
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, Hao Li
Supervised learning based object detection frameworks demand plenty of laborious manual annotations, which may not be practical in real applications.
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.
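The distinguishing ingredient is that the teacher's BatchNorm statistics are updated by the same exponential moving average as its weights, rather than recomputed from batches. A minimal sketch of that update (the momentum value and toy models are illustrative):

```python
# EMAN-style teacher update: EMA over parameters *and* BN buffers.
import torch

@torch.no_grad()
def eman_update(student, teacher, momentum=0.999):
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)
    for bs, bt in zip(student.buffers(), teacher.buffers()):
        if bt.dtype.is_floating_point:   # running_mean / running_var
            bt.mul_(momentum).add_(bs, alpha=1 - momentum)
        else:                            # e.g. num_batches_tracked
            bt.copy_(bs)

student = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8))
teacher = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8))
teacher.load_state_dict(student.state_dict())
teacher.eval()                           # teacher normalizes with EMA statistics
eman_update(student, teacher)
```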
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang
Temporal action proposal generation aims to estimate temporal intervals of actions in untrimmed videos, which is a challenging yet important task in the video understanding field.
Look Before You Speak: Visually Contextualized Utterances
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
While most conversational AI systems focus on textual dialogue only, conditioning utterances on visual context (when it's available) can lead to more realistic conversations.
Deep Optimized Priors for 3D Shape Modeling and Reconstruction
Mingyue Yang, Yuxin Wen, Weikai Chen, Yongwei Chen, Kui Jia
Many learning-based approaches have difficulty scaling to unseen data, as the generality of their learned priors is limited to the scale and variations of the training samples.
Meta Batch-Instance Normalization for Generalizable Person Re-Identification
Seokeon Choi, Taekyung Kim, Minki Jeong, Hyoungseob Park, Changick Kim
Although supervised person re-identification (Re-ID) methods have shown impressive performance, they suffer from a poor generalization capability on unseen domains.
Despite substantial progress in applying neural networks (NN) to a wide variety of areas, they still largely suffer from a lack of transparency and interpretability.
StyleMix: Separating Content and Style for Enhanced Data Augmentation
Minui Hong, Jinwoo Choi, Gunhee Kim
In spite of the great success of deep neural networks for many challenging classification tasks, the learned networks are vulnerable to overfitting and adversarial attacks.
General Multi-Label Image Classification With Transformers
Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi
Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image.
UAV-Human: A Large Benchmark for Human Behavior Understanding With Unmanned Aerial Vehicles
Tianjiao Li, Jun Liu, Wei Zhang, Yun Ni, Wenqian Wang, Zhiheng Li
Human behavior understanding with unmanned aerial vehicles (UAVs) is of great significance for a wide range of applications, which simultaneously brings an urgent demand for large, challenging, and comprehensive benchmarks for the development and evaluation of UAV-based models.
Neural Prototype Trees for Interpretable Fine-Grained Image Recognition
Meike Nauta, Ron van Bree, Christin Seifert
Prototype-based methods use interpretable representations to address the black-box nature of deep learning models, in contrast to post-hoc explanation methods that only approximate such models.
DeFlow: Learning Complex Image Degradations From Unpaired Data With Conditional Flows
Valentin Wolf, Andreas Lugmayr, Martin Danelljan, Luc Van Gool, Radu Timofte
The difficulty of obtaining paired data remains a major bottleneck for learning image restoration and enhancement models for real-world applications.
In this paper, we present a decomposition model for stereo matching to solve the problem of excessive growth in computational cost (time and memory cost) as the resolution increases.
Transitional Adaptation of Pretrained Models for Visual Storytelling
Youngjae Yu, Jiwan Chung, Heeseung Yun, Jongseok Kim, Gunhee Kim
Previous models for vision-to-language generation tasks usually pretrain a visual encoder and a language generator in the respective domains and jointly finetune them with the target task.
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
Domain Adaptation With Auxiliary Target Domain-Oriented Classifier
Jian Liang, Dapeng Hu, Jiashi Feng
Domain adaptation (DA) aims to transfer knowledge from a label-rich but heterogeneous domain to a label-scarce domain, which alleviates labeling effort and has attracted considerable attention.
Multi-Person Implicit Reconstruction From a Single Image
Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Offboard 3D Object Detection From Point Cloud Sequences
Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, Dragomir Anguelov
While current 3D object recognition research mostly focuses on the real-time, onboard scenario, there are many offboard use cases of perception that are largely under-explored, such as using machines to automatically generate high-quality 3D labels.
Backdoor Attacks Against Deep Learning Systems in the Physical World
Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji, Yuanshun Yao, Haitao Zheng, Ben Y. Zhao
Backdoor attacks embed hidden malicious behaviors into deep learning models, which only activate and cause misclassifications on model inputs containing a specific "trigger." Existing works on backdoor attacks and defenses, however, mostly focus on digital attacks that apply digitally generated patterns as triggers.
Neural Splines: Fitting 3D Surfaces With Infinitely-Wide Neural Networks
Francis Williams, Matthew Trager, Joan Bruna, Denis Zorin
We present Neural Splines, a technique for 3D surface reconstruction that is based on random feature kernels arising from infinitely-wide shallow ReLU networks.
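The random-feature view of such kernels is easy to sketch: freeze a random ReLU layer, use its activations as features, and solve only the linear output layer in closed form. The toy sphere-fitting data and all sizes below are illustrative, not the paper's pipeline:

```python
# Random ReLU features + closed-form ridge solve for an implicit function.
import numpy as np

rng = np.random.default_rng(0)
n_feat, lam = 2048, 1e-6

# Toy supervision: points on the unit sphere get value 0; slightly offset
# points along the radius get +/- eps (a common implicit-fitting trick).
pts = rng.normal(size=(500, 3)); pts /= np.linalg.norm(pts, axis=1, keepdims=True)
eps = 0.05
X = np.concatenate([pts, (1 + eps) * pts, (1 - eps) * pts])
y = np.concatenate([np.zeros(500), np.full(500, eps), np.full(500, -eps)])

W = rng.normal(size=(3, n_feat))        # frozen random first layer
b = rng.normal(size=n_feat)
Phi = np.maximum(X @ W + b, 0.0) / np.sqrt(n_feat)

A = Phi.T @ Phi + lam * np.eye(n_feat)  # ridge regression, output layer only
w = np.linalg.solve(A, Phi.T @ y)

f = lambda q: np.maximum(q @ W + b, 0.0) / np.sqrt(n_feat) @ w
# Values near the trained radii should be close to +eps (outside) / -eps (inside).
print(f(np.array([[0.0, 0.0, 1.05]])), f(np.array([[0.0, 0.0, 0.95]])))
```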
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc Van Gool
To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.
Yuhao Zhu, Qi Li, Jian Wang, Cheng-Zhong Xu, Zhenan Sun
Face swapping has both positive applications such as entertainment, human-computer interaction, etc., and negative applications such as DeepFake threats to politics, economics, etc.
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Sungha Choi, Sanghun Jung, Huiwon Yun, Joanne T. Kim, Seungryong Kim, Jaegul Choo
Enhancing the generalization capability of deep neural networks to unseen domains is crucial for safety-critical applications in the real world such as autonomous driving.
Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
Batch normalization (BN) is an important technique commonly incorporated into deep learning models to perform standardization within mini-batches.
Conditional generative adversarial networks (cGANs) aim to synthesize diverse images given the input conditions and latent codes, but unfortunately they usually suffer from the issue of mode collapse.
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song
Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between photo and sketch modalities is preserved.
SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud Based Place Recognition
Yan Xia, Yusheng Xu, Shuang Li, Rui Wang, Juan Du, Daniel Cremers, Uwe Stilla
We tackle the problem of place recognition from point cloud data and introduce a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors.
Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation
Binghao Liu, Yao Ding, Jianbin Jiao, Xiangyang Ji, Qixiang Ye
Encouraging progress in few-shot semantic segmentation has been made by leveraging features learned upon base classes with sufficient training data to represent novel classes with few-shot examples.
Searching for Fast Model Families on Datacenter Accelerators
Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc V. Le, Norman P. Jouppi
Neural Architecture Search (NAS), together with model scaling, has shown remarkable progress in designing high accuracy and fast convolutional architecture families.
Visual salient object detection (SOD) aims at finding the salient object(s) that attract human attention, while camouflaged object detection (COD), on the contrary, intends to discover the camouflaged object(s) hidden in their surroundings.
Diffusion Probabilistic Models for 3D Point Cloud Generation
Shitong Luo, Wei Hu
We present a probabilistic model for point cloud generation, which is fundamental for various 3D vision tasks such as shape completion, upsampling, synthesis and data augmentation.
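A generic denoising-diffusion training step for point clouds, sketched under the usual DDPM parameterization: noise the clean cloud in closed form, then regress the added noise. The placeholder denoiser and schedule values are assumptions, not the paper's architecture:

```python
# One diffusion training step on point clouds (epsilon-matching objective).
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# Trivial per-point denoiser conditioned on a normalized timestep feature.
denoiser = nn.Sequential(nn.Linear(3 + 1, 128), nn.ReLU(), nn.Linear(128, 3))

x0 = torch.randn(16, 2048, 3)                    # a batch of point clouds
t = torch.randint(0, T, (16,))
a = alphas_bar[t].view(-1, 1, 1)
noise = torch.randn_like(x0)
x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise     # closed-form forward process

t_feat = (t.float() / T).view(-1, 1, 1).expand(-1, x0.shape[1], 1)
pred = denoiser(torch.cat([x_t, t_feat], dim=-1))  # per-point noise prediction
loss = ((pred - noise) ** 2).mean()
loss.backward()
```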
MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing
Zhengjue Wang, Hao Zhang, Ziheng Cheng, Bo Chen, Xin Yuan
To capture high-speed videos using a two-dimensional detector, video snapshot compressive imaging (SCI) is a promising system, where the video frames are coded by different masks and then compressed to a snapshot measurement.
Prototype-Supervised Adversarial Network for Targeted Attack of Deep Hashing
Xunguang Wang, Zheng Zhang, Baoyuan Wu, Fumin Shen, Guangming Lu
Due to its powerful capability of representation learning and high-efficiency computation, deep hashing has made significant progress in large-scale image retrieval.
Understanding the Robustness of Skeleton-Based Action Recognition Under Adversarial Attack
He Wang, Feixiang He, Zhexi Peng, Tianjia Shao, Yong-Liang Yang, Kun Zhou, David Hogg
Action recognition has been heavily employed in many applications such as autonomous vehicles, surveillance, etc., where its robustness is a primary concern.
Robust Instance Segmentation Through Reasoning About Multi-Object Occlusion
Xiaoding Yuan, Adam Kortylewski, Yihong Sun, Alan Yuille
Analyzing complex scenes with Deep Neural Networks is a challenging task, particularly when images contain multiple objects that partially occlude each other.
Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise.
Due to the intensive cost of labor and expertise in annotating 3D medical images at a voxel level, most benchmark datasets are equipped with the annotations of only one type of organs and/or tumors, resulting in the so-called partially labeling issue.
Improving Sign Language Translation With Monolingual Data by Sign Back-Translation
Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li
Despite existing pioneering works on sign language translation (SLT), there is a non-trivial obstacle, i.e., the limited quantity of parallel sign-text data.
Spatially-Varying Outdoor Lighting Estimation From Intrinsics
Yongjie Zhu, Yinda Zhang, Si Li, Boxin Shi
We present SOLID-Net, a neural network for spatially-varying outdoor lighting estimation from a single outdoor image for any 2D pixel location.
Reciprocal Landmark Detection and Tracking With Extremely Few Annotations
Jianzhe Lin, Ghazal Sahebzamani, Christina Luong, Fatemeh Taheri Dezaki, Mohammad Jafari, Purang Abolmaesumi, Teresa Tsang
Localization of anatomical landmarks to perform two-dimensional measurements in echocardiography is part of routine clinical workflow in cardiac disease diagnosis.
Model quantization is a promising approach to compress deep neural networks and accelerate inference, making it possible to deploy them on mobile and edge devices.
HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences
Feitong Tan, Danhang Tang, Mingsong Dou, Kaiwen Guo, Rohit Pandey, Cem Keskin, Ruofei Du, Deqing Sun, Sofien Bouaziz, Sean Fanello, Ping Tan, Yinda Zhang
In this paper, we address the problem of building pixel-wise dense correspondences between human images under arbitrary camera viewpoints and body poses.
Depth-Conditioned Dynamic Message Propagation for Monocular 3D Object Detection
Li Wang, Liang Du, Xiaoqing Ye, Yanwei Fu, Guodong Guo, Xiangyang Xue, Jianfeng Feng, Li Zhang
The objective of this paper is to learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We introduce NExT-QA, a rigorously designed video question answering (VideoQA) benchmark to advance video understanding from describing to explaining the temporal actions.
Zhenqiang Ying, Maniratnam Mandal, Deepti Ghadiyaram, Alan Bovik
No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem for social and streaming media applications.
Are Labels Always Necessary for Classifier Accuracy Evaluation?
Weijian Deng, Liang Zheng
To calculate the model accuracy on a computer vision task, e.g., object recognition, we usually require a test set composed of test samples and their ground truth labels.
One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation
Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu
Point cloud semantic segmentation often requires large-scale annotated training data, but clearly, point-wise labels are too tedious to prepare.
Dive Into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition
Jiahui She, Yibo Hu, Hailin Shi, Jun Wang, Qiu Shen, Tao Mei
Due to the subjective annotation and the inherent inter-class similarity of facial expressions, one of the key challenges in Facial Expression Recognition (FER) is the annotation ambiguity.
Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation
Tao Tu, Qing Ping, Govindarajan Thattai, Gokhan Tur, Prem Natarajan
GuessWhat?! is a visual dialog guessing game which incorporates a Questioner agent that generates a sequence of questions, while an Oracle agent answers the respective questions about a target object in an image.
Unsupervised Discovery of the Long-Tail in Instance Segmentation Using Hierarchical Self-Supervision
Zhenzhen Weng, Mehmet Giray Ogut, Shai Limonchik, Serena Yeung
Instance segmentation is an active topic in computer vision that is usually solved by using supervised learning approaches over very large datasets composed of object-level masks.
TPCN: Temporal Point Cloud Networks for Motion Forecasting
Maosheng Ye, Tongyi Cao, Qifeng Chen
We propose the Temporal Point Cloud Networks (TPCN), a novel and flexible framework with joint spatial and temporal learning for trajectory prediction.
This paper proposes a framework for the interactive video object segmentation (VOS) in the wild where users can choose some frames for annotations iteratively.
Few-Shot Incremental Learning With Continually Evolved Classifiers
Chi Zhang, Nan Song, Guosheng Lin, Yun Zheng, Pan Pan, Yinghui Xu
Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points, without forgetting knowledge of old classes.
How to efficiently represent camera pose is an essential problem in 3D computer vision, especially in tasks like camera pose regression and novel view synthesis.
Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization.
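That predictability is easiest to see in a model with a quadratic loss, where the optimum is available in closed form and the exact effect of removing a training sample is directly computable. A toy illustration with ridge regression (the link to forgetting is our gloss, not a quote of the paper):

```python
# Closed-form ridge regression: the effect of data changes is exact.
import numpy as np

rng = np.random.default_rng(1)
X, y, lam = rng.normal(size=(100, 5)), rng.normal(size=100), 1e-2

def ridge(X, y):
    # w* = (X^T X + lam I)^{-1} X^T y, the unique convex-loss optimum.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_full = ridge(X, y)
w_wo0 = ridge(X[1:], y[1:])            # exact model after removing sample 0
print(np.linalg.norm(w_full - w_wo0))  # the sample's exact effect on the weights
```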
More Photos Are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval
Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song
A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is the data scarcity -- model performances are largely bottlenecked by the lack of sketch-photo pairs.
InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
Shubhankar Borse, Ying Wang, Yizhe Zhang, Fatih Porikli
We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network, which efficiently learns the degree of parametric transformations between estimated and target boundaries.
Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition
Chun-Fu Richard Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan
In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets.
The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former.
Generalizable Person Re-Identification With Relevance-Aware Mixture of Experts
Yongxing Dai, Xiaotong Li, Jun Liu, Zekun Tong, Ling-Yu Duan
Domain generalizable (DG) person re-identification (ReID) is a challenging problem because we cannot access any unseen target domain data during training.
Deformed Implicit Field: Modeling 3D Shapes With Learned Dense Correspondence
Yu Deng, Jiaolong Yang, Xin Tong
We propose a novel Deformed Implicit Field (DIF) representation for modeling 3D shapes of a category and generating dense correspondences among shapes.
AlphaMatch: Improving Consistency for Semi-Supervised Learning With Alpha-Divergence
Chengyue Gong, Dilin Wang, Qiang Liu
Semi-supervised learning (SSL) is a key approach toward more data-efficient machine learning by jointly leveraging both labeled and unlabeled data.
Reinforced Attention for Few-Shot Learning and Beyond
Jie Hong, Pengfei Fang, Weihao Li, Tong Zhang, Christian Simon, Mehrtash Harandi, Lars Petersson
Few-shot learning aims to correctly recognize query samples from unseen classes given a limited number of support samples, often by relying on global embeddings of images.
A Multiplexed Network for End-to-End, Multilingual OCR
Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner
Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results.
Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time
Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang
Estimating 3D hand and object pose from a single image is an extremely challenging problem: hands and objects are often self-occluded during interactions, and the 3D annotations are scarce as even humans cannot directly label the ground-truths from a single image perfectly.
Unsupervised Part Segmentation Through Disentangling Appearance and Shape
Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu
We study the problem of unsupervised discovery and segmentation of object parts, which, as an intermediate local representation, are capable of finding intrinsic object structure and providing more explainable recognition results.
QAIR: Practical Query-Efficient Black-Box Attacks for Image Retrieval
Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue
We study the query-based attack against image retrieval to evaluate its robustness against adversarial examples under the black-box setting, where the adversary only has query access to the top-k ranked unlabeled images from the database.
Removing the Background by Adding the Background: Towards Background Robust Self-Supervised Video Representation Learning
Jinpeng Wang, Yuting Gao, Ke Li, Yiqi Lin, Andy J. Ma, Hao Cheng, Pai Peng, Feiyue Huang, Rongrong Ji, Xing Sun
Self-supervised learning has shown great potential in improving the video representation ability of deep neural networks by getting supervision from the data itself.
Unsupervised Multi-Source Domain Adaptation Without Access to Source Data
Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled dataset by transferring knowledge from a labeled source data, which has been trained on similar tasks.
Hamad Ahmed, Ronnie B. Wilbur, Hari M. Bharadwaj, Jeffrey Mark Siskind
New results suggest strong limits to the feasibility of object classification from human brain activity evoked by image stimuli, as measured through EEG.
Polka Lines: Learning Structured Illumination and Reconstruction for Active Stereo
Seung-Hwan Baek, Felix Heide
Active stereo cameras that recover depth from structured light captures have become a cornerstone sensor modality for 3D scene reconstruction and understanding tasks across application domains.
Architectural Adversarial Robustness: The Case for Deep Pursuit
George Cazenavette, Calvin Murdock, Simon Lucey
Despite their unmatched performance, deep neural networks remain susceptible to targeted attacks by nearly imperceptible levels of adversarial noise.
Semantic-Aware Knowledge Distillation for Few-Shot Class-Incremental Learning
Ali Cheraghian, Shafin Rahman, Pengfei Fang, Soumava Kumar Roy, Lars Petersson, Mehrtash Harandi
Few-shot class incremental learning (FSCIL) portrays the problem of learning new concepts gradually, where only a few examples per concept are available to the learner.
Lips Don't Lie: A Generalisable and Robust Approach To Face Forgery Detection
Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
Although current deep learning-based face forgery detectors achieve impressive performance in constrained scenarios, they are vulnerable to samples created by unseen manipulation methods.
A source model trained on source data and a target model learned through unsupervised domain adaptation (UDA) usually encode different knowledge.
Recently, unsupervised domain adaptation for the semantic segmentation task has become more and more popular due to the high cost of pixel-level annotation on real-world images.
UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation
Siddhesh Khandelwal, Raghav Goyal, Leonid Sigal
Methods for object detection and segmentation rely on large scale instance-level annotations for training, which are difficult and time-consuming to collect.
Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection
Bohao Li, Boyu Yang, Chang Liu, Feng Liu, Rongrong Ji, Qixiang Ye
Few-shot object detection has made encouraging progress by reconstructing novel class objects using the feature representation learned upon a set of base classes.
D2IM-Net: Learning Detail Disentangled Implicit Fields From Single Images
Manyi Li, Hao Zhang
We present the first single-view 3D reconstruction network aimed at recovering geometric details from an input image which encompass both topological shape structures and surface features.
PixMatch: Unsupervised Domain Adaptation via Pixelwise Consistency Training
Luke Melas-Kyriazi, Arjun K. Manrai
Unsupervised domain adaptation is a promising technique for semantic segmentation and other computer vision tasks for which large-scale data annotation is costly and time-consuming.
Physical adversarial examples for camera-based computer vision have so far been achieved through visible artifacts -- a sticker on a Stop sign, colorful borders around eyeglasses or a 3D printed object with a colorful texture.
Open Domain Generalization with Domain-Augmented Meta-Learning
Yang Shu, Zhangjie Cao, Chenyu Wang, Jianmin Wang, Mingsheng Long
Leveraging datasets available to learn a model with high generalization ability to unseen domains is important for computer vision, especially when the unseen domain's annotated data are unavailable.
Current deep learning architectures suffer from catastrophic forgetting, a failure to retain knowledge of previously learned classes when incrementally trained on new classes.
Learnable Companding Quantization for Accurate Low-Bit Neural Networks
Kohei Yamamoto
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices.
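Companding quantization compresses values with a non-linear map, quantizes uniformly, then expands back, so levels are spent where values concentrate. The sketch below uses fixed mu-law companding as a stand-in for the paper's learnable companding functions:

```python
# Mu-law companding fake-quantizer (illustrative stand-in for learnable companding).
import torch

def mu_law_fake_quant(x, bits=4, mu=255.0):
    # Compress: y = sign(x) * log(1 + mu|x|) / log(1 + mu), for |x| <= 1.
    y = torch.sign(x) * torch.log1p(mu * x.abs()) / torch.log1p(torch.tensor(mu))
    n = 2 ** (bits - 1) - 1                     # symmetric signed levels
    y_q = torch.round(y.clamp(-1, 1) * n) / n   # uniform quantization in y-space
    # Expand: x = sign(y) * ((1 + mu)^|y| - 1) / mu.
    return torch.sign(y_q) * ((1 + mu) ** y_q.abs() - 1) / mu

w = torch.randn(1000) * 0.1
w_q = mu_law_fake_quant(w)
print((w - w_q).abs().mean())  # small error near zero, by design
```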
Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection
Qize Yang, Xihan Wei, Biao Wang, Xian-Sheng Hua, Lei Zhang
The goal of semi-supervised object detection is to learn a detection model using only a few labeled data and large amounts of unlabeled data, thereby reducing the cost of data labeling.
Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation
Yuntong Ye, Yi Chang, Hanyu Zhou, Luxin Yan
Existing deep learning-based image deraining methods have achieved promising performance for synthetic rainy images, but typically rely on pairs of sharp images and their simulated rainy counterparts.
Unsupervised Domain Adaptive (UDA) person re-identification (ReID) aims at adapting the model trained on a labeled source-domain dataset to a target-domain dataset without any further annotations.
Mohammadreza Armandpour, Ali Sadeghian, Chunyuan Li, Mingyuan Zhou
Despite the success of Generative Adversarial Networks (GANs), their training suffers from several well-known problems, including mode collapse and difficulties learning a disconnected set of manifolds.
ReAgent: Point Cloud Registration Using Imitation and Reinforcement Learning
Dominik Bauer, Timothy Patten, Markus Vincze
Point cloud registration is a common step in many 3D computer vision tasks such as object pose estimation, where a 3D model is aligned to an observation.
FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise
Jaeseok Byun, Sungmin Cha, Taesup Moon
We consider the challenging blind denoising problem for Poisson-Gaussian noise, in which no additional information about clean images or noise level parameters is available.
Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles
Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu
Controllable Image Captioning (CIC) -- generating image descriptions following designated control signals -- has received unprecedented attention over the last few years.
Unbiased Mean Teacher for Cross-Domain Object Detection
Jinhong Deng, Wen Li, Yuhua Chen, Lixin Duan
Cross-domain object detection is challenging, because object detection models are often vulnerable to data variance, especially to the considerable domain shift between two distinctive domains.
Adaptive Methods for Real-World Domain Generalization
Abhimanyu Dubey, Vignesh Ramanathan, Alex Pentland, Dhruv Mahajan
Invariant approaches have been remarkably successful in tackling the problem of domain generalization, where the objective is to perform inference on data distributions different from those used in training.
The task of motion transfer between a source dancer and a target person is a special case of the pose transfer problem, in which the target person changes their pose in accordance with the motions of the dancer.
Interpreting Super-Resolution Networks With Local Attribution Maps
Jinjin Gu, Chao Dong
Image super-resolution (SR) techniques have been developing rapidly, benefiting from the invention of deep networks and their successive breakthroughs.
Shihao Jiang, Yao Lu, Hongdong Li, Richard Hartley
State-of-the-art neural network models for optical flow estimation require a dense correlation volume at high resolutions for representing per-pixel displacement.
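The dense correlation volume in question dots every feature vector of one frame with every feature vector of the other, which is why its memory grows as O((HW)^2). A minimal sketch with illustrative feature sizes:

```python
# Dense 4D correlation volume between two feature maps.
import torch

f1 = torch.randn(1, 64, 46, 62)  # (B, C, H, W) features of frame 1
f2 = torch.randn(1, 64, 46, 62)  # features of frame 2
corr = torch.einsum("bchw,bcij->bhwij", f1, f2) / 64 ** 0.5
print(corr.shape)                # torch.Size([1, 46, 62, 46, 62])
```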
Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation
Jichang Li, Guanbin Li, Yemin Shi, Yizhou Yu
In semi-supervised domain adaptation, a few labeled samples per class in the target domain guide features of the remaining target samples to aggregate around them.
The success of deep learning has led to intense growth and interest in computer vision, along with concerns about its potential impact on society.
A Fourier-Based Framework for Domain Generalization
Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian
Modern deep neural networks suffer from performance degradation when evaluated on testing data under different distributions from training data. [Expand]
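The core intuition behind Fourier-based domain generalization is that the amplitude spectrum tends to carry style/domain cues while the phase carries semantics. A minimal amplitude-interpolation sketch of that idea (a generic illustration, not the paper's exact augmentation):

```python
import numpy as np

def amplitude_mix(img_a, img_b, lam=0.5):
    """Interpolate the Fourier amplitude of img_a towards img_b while
    keeping img_a's phase. Works on HxW or HxWxC arrays."""
    fa = np.fft.fft2(img_a, axes=(0, 1))
    fb = np.fft.fft2(img_b, axes=(0, 1))
    amp = (1 - lam) * np.abs(fa) + lam * np.abs(fb)  # mixed amplitude
    mixed = amp * np.exp(1j * np.angle(fa))          # original phase
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))
```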
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots. [Expand]
Deeply Shape-Guided Cascade for Instance Segmentation
Hao Ding, Siyuan Qiao, Alan Yuille, Wei Shen
The key to a successful cascade architecture for precise instance segmentation is to fully leverage the relationship between bounding box detection and mask segmentation across multiple stages. [Expand]
Distilling Object Detectors via Decoupled Features
Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, Chang Xu
Knowledge distillation is a widely used paradigm for transferring information from a complicated teacher network to a compact student network while maintaining strong performance. [Expand]
We present a self-supervised approach for learning video representations using temporal video alignment as a pretext task, while exploiting both frame-level and video-level information. [Expand]
Invertible Denoising Network: A Light Solution for Real Noise Removal
Yang Liu, Zhenyue Qin, Saeed Anwar, Pan Ji, Dongwoo Kim, Sabrina Caldwell, Tom Gedeon
Invertible networks have various benefits for image denoising since they are lightweight, information-lossless, and memory-saving during back-propagation. [Expand]
Coarse-To-Fine Domain Adaptive Semantic Segmentation With Photometric Alignment and Category-Center Regularization
Haoyu Ma, Xiangru Lin, Zifeng Wu, Yizhou Yu
Unsupervised domain adaptation (UDA) in semantic segmentation is a fundamental yet promising task relieving the need for laborious annotation work. [Expand]
Despite advances in feature representation, leveraging geometric relations is crucial for establishing reliable visual correspondences under large variations of images. [Expand]
Weakly supervised object localization (WSOL) remains an open problem due to the deficiency of finding object extent information using a classification network. [Expand]
Learning Dynamic Network Using a Reuse Gate Function in Semi-Supervised Video Object Segmentation
Hyojin Park, Jayeon Yoo, Seohyeong Jeong, Ganesh Venkatesh, Nojun Kwak
Current state-of-the-art approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame. [Expand]
HoHoNet: 360 Indoor Holistic Understanding With Latent Horizontal Features
Cheng Sun, Min Sun, Hwann-Tzong Chen
We present HoHoNet, a versatile and efficient framework for holistic understanding of an indoor 360-degree panorama using a Latent Horizontal Feature (LHFeat). [Expand]
Consensus Maximisation Using Influences of Monotone Boolean Functions
Ruwan Tennakoon, David Suter, Erchuan Zhang, Tat-Jun Chin, Alireza Bab-Hadiashar
Consensus maximisation (MaxCon), widely used for robust fitting in computer vision, aims to find the largest subset of data that fits the model within some tolerance level. [Expand]
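For context, the de-facto baseline for consensus maximisation is random hypothesis sampling with inlier counting. A minimal RANSAC-style sketch for 2D line fitting (a generic baseline, not the paper's Boolean-influence solver):

```python
import numpy as np

def max_consensus_line(points, tol=0.1, iters=1000, seed=0):
    """Find the 2D line agreeing with the largest subset of `points`
    within `tol`, via random 2-point hypotheses and inlier counting."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        d = points[j] - points[i]
        n = np.array([-d[1], d[0]], dtype=float)       # line normal
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                                   # degenerate sample
        n = n / norm
        inliers = np.abs((points - points[i]) @ n) <= tol  # point-line dist
        if inliers.sum() > best.sum():
            best = inliers
    return best
```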
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
Aisha Urooj, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
The problem of grounding VQA tasks has seen increased attention in the research community recently, with most attempts usually focusing on solving this task by using pretrained object detectors. [Expand]
Efficient Feature Transformations for Discriminative and Generative Continual Learning
Vinay Kumar Verma, Kevin J Liang, Nikhil Mehta, Piyush Rai, Lawrence Carin
As neural networks are increasingly being applied to real-world applications, mechanisms to address distributional shift and sequential task learning without forgetting are critical. [Expand]
Recent work has made early progress on automatically learning the association between voice and face, bringing a new wave of studies to the computer vision community. [Expand]
Yuang Zhang, Huanyu He, Jianguo Li, Yuxi Li, John See, Weiyao Lin
Pedestrian detection in a crowd is a challenging task due to a high number of mutually-occluding human instances, which brings ambiguity and optimization difficulties to the current IoU-based ground truth assignment procedure in classical object detection methods. [Expand]
Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias
Yunhan Zhao, Shu Kong, Charless Fowlkes
Monocular depth predictors are typically trained on large-scale training sets which are naturally biased w.r.t. the distribution of camera poses. [Expand]
Riggable 3D Face Reconstruction via In-Network Optimization
Ziqian Bai, Zhaopeng Cui, Xiaoming Liu, Ping Tan
This paper presents a method for riggable 3D face reconstruction from monocular images, which jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations. [Expand]
Limitations of Post-Hoc Feature Alignment for Robustness
Collin Burns, Jacob Steinhardt
Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and test distribution. [Expand]
Semi-Supervised Domain Adaptation Based on Dual-Level Domain Mixing for Semantic Segmentation
Shuaijun Chen, Xu Jia, Jianzhong He, Yongjie Shi, Jianzhuang Liu
Data-driven approaches, in spite of great success in many tasks, generalize poorly when applied to unseen image domains, and require expensive annotation, especially for dense pixel prediction tasks such as semantic segmentation. [Expand]
Cloud2Curve: Generation and Vectorization of Parametric Sketches
Ayan Das, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
Analysis of human sketches in deep learning has advanced immensely through the use of waypoint-sequences rather than raster-graphic representations. [Expand]
Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink
Ranjie Duan, Xiaofeng Mao, A. K. Qin, Yuefeng Chen, Shaokai Ye, Yuan He, Yun Yang
Though it is well known that the performance of deep neural networks (DNNs) degrades under certain light conditions, there has been no study of the threat posed by light beams emitted from a physical source acting as an adversarial attacker on DNNs in a real-world scenario. [Expand]
WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong
Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications. [Expand]
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-Localization in Large Scenes From Body-Mounted Sensors
Vladimir Guzov, Aymen Mir, Torsten Sattler, Gerard Pons-Moll
We introduce the Human POSEitioning System (HPS), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment using wearable sensors. [Expand]
Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation
Ryuhei Hamaguchi, Yasutaka Furukawa, Masaki Onishi, Ken Sakurada
This paper proposes a novel heterogeneous grid convolution that builds a graph-based image representation by exploiting heterogeneity in the image content, enabling adaptive, efficient, and controllable computations in a convolutional architecture. [Expand]
ChallenCap: Monocular 3D Capture of Challenging Human Performances Using Multi-Modal References
Yannan He, Anqi Pang, Xin Chen, Han Liang, Minye Wu, Yuexin Ma, Lan Xu
Capturing challenging human motions is critical for numerous applications, but it suffers from complex motion patterns and severe self-occlusion under the monocular setting. [Expand]
Interpretable Social Anchors for Human Trajectory Forecasting in Crowds
Parth Kothari, Brian Sifringer, Alexandre Alahi
Human trajectory forecasting in crowds, at its core, is a sequence prediction problem with specific challenges of capturing inter-sequence dependencies (social interactions) and consequently predicting socially-compliant multimodal distributions. [Expand]
Weakly supervised segmentation methods using bounding box annotations focus on obtaining a pixel-level mask from each box containing an object. [Expand]
Progressive Domain Expansion Network for Single Domain Generalization
Lei Li, Ke Gao, Juan Cao, Ziyao Huang, Yepeng Weng, Xiaoyue Mi, Zhengze Yu, Xiaoya Li, Boyang Xia
Single domain generalization is a challenging case of model generalization, where the models are trained on a single domain and tested on other unseen domains. [Expand]
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
Xiangtai Li, Hao He, Xia Li, Duo Li, Guangliang Cheng, Jianping Shi, Lubin Weng, Yunhai Tong, Zhouchen Lin
Aerial Image Segmentation is a particular semantic segmentation problem and has several challenging characteristics that general semantic segmentation does not have. [Expand]
MUST-GAN: Multi-Level Statistics Transfer for Self-Driven Person Image Generation
Tianxiang Ma, Bo Peng, Wei Wang, Jing Dong
Pose-guided person image generation usually involves using paired source-target images to supervise the training, which significantly increases the data preparation effort and limits the application of the models. [Expand]
Focus on Local: Detecting Lane Marker From Bottom Up via Key Point
Zhan Qu, Huan Jin, Yang Zhou, Zhen Yang, Wei Zhang
Mainstream lane marker detection methods are implemented by predicting the overall structure and deriving parametric curves through post-processing. [Expand]
Neural network pruning is an essential approach for reducing the computational complexity of deep models so that they can be well deployed on resource-limited devices. [Expand]
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
Wenjing Wang, Wenhan Yang, Jiaying Liu
Face detection in low light scenarios is challenging but vital to many practical applications, e.g., surveillance video, autonomous driving at night. [Expand]
Scene text retrieval aims to localize and search all text instances from an image gallery that are the same as or similar to a given query text. [Expand]
Tracking by natural language specification is an emerging research topic that aims at locating the target object in the video sequence based on its language description. [Expand]
Troubleshooting Blind Image Quality Models in the Wild
Zhihua Wang, Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma
Recently, the group maximum differentiation competition (gMAD) has been used to improve blind image quality assessment (BIQA) models, with the help of full-reference metrics. [Expand]
Differential Neural Architecture Search (NAS) requires all layer choices to be held in memory simultaneously; this limits the size of both search space and final architecture. [Expand]
Multi-Label Activity Recognition Using Activity-Specific Features and Activity Correlations
Yanyi Zhang, Xinyu Li, Ivan Marsic
Multi-label activity recognition is designed for recognizing multiple activities that are performed simultaneously or sequentially in each video. [Expand]
Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han
Significant performance improvement has been achieved for fully-supervised video salient object detection with pixel-wise labeled training datasets, which are time-consuming and expensive to obtain. [Expand]
Simpler Certified Radius Maximization by Propagating Covariances
Xingjian Zhen, Rudrasis Chakraborty, Vikas Singh
One strategy for adversarially training a robust model is to maximize its certified radius -- the neighborhood around a given training sample for which the model's prediction remains unchanged. [Expand]
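The certified radius referenced above comes from randomized smoothing: if the smoothed classifier's top class has probability at least p_a and the runner-up at most p_b under Gaussian input noise, the prediction is provably constant within an l2 ball. A sketch of that standard bound (Cohen et al., 2019), on which certified-radius training builds:

```python
from scipy.stats import norm

def certified_radius(p_a, p_b, sigma):
    """l2 radius certified by randomized smoothing:
    R = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)), where p_a
    lower-bounds the top-class probability and p_b upper-bounds the
    runner-up under Gaussian input noise of standard deviation sigma."""
    return 0.5 * sigma * (norm.ppf(p_a) - norm.ppf(p_b))

# e.g. certified_radius(0.9, 0.1, sigma=0.25) ~= 0.32
```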
What's in the Image? Explorable Decoding of Compressed Images
Yuval Bahat, Tomer Michaeli
The ever-growing amounts of visual contents captured on a daily basis necessitate the use of lossy compression methods in order to save storage space and transmission bandwidth. [Expand]
The focal loss has demonstrated its effectiveness in many real-world applications such as object detection and image classification, but its theoretical understanding has been limited so far. [Expand]
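For reference, the focal loss down-weights well-classified examples via a modulating factor (1 - p_t)^gamma. A minimal binary-classification sketch of the standard formulation (Lin et al., 2017):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t);
    `targets` are floats in {0, 1}."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability of true class
    return ((1 - p_t) ** gamma * ce).mean()
```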
Wide-Baseline Relative Camera Pose Estimation With Directional Learning
Kefan Chen, Noah Snavely, Ameesh Makadia
Modern deep learning techniques that regress the relative camera pose between two images have difficulty dealing with challenging scenarios, such as large camera motions resulting in occlusions and significant changes in perspective that leave little overlap between images. [Expand]
Square Root Bundle Adjustment for Large-Scale Reconstruction
Nikolaus Demmel, Christiane Sommer, Daniel Cremers, Vladyslav Usenko
We propose a new formulation for the bundle adjustment problem which relies on nullspace marginalization of landmark variables by QR decomposition. [Expand]
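The key operation is eliminating landmark variables without forming the squared (Schur-complement) system: multiply the linearized problem by a basis of the landmark Jacobian's left nullspace, obtained via QR. A minimal dense sketch, assuming the landmark block has full column rank:

```python
import numpy as np

def nullspace_marginalize(J_l, J_x, r):
    """Eliminate the landmark block from [J_l J_x] [dl; dx] = r:
    with J_l = Q R and Q2 the columns of Q spanning J_l's left
    nullspace, the reduced system is (Q2^T J_x) dx = Q2^T r."""
    m, k = J_l.shape
    Q, _ = np.linalg.qr(J_l, mode="complete")  # Q: m x m orthogonal
    Q2 = Q[:, k:]                              # basis of left nullspace
    return Q2.T @ J_x, Q2.T @ r
```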
Unsupervised Pre-Training for Person Re-Identification
Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen
In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation. [Expand]
Privacy-Preserving Collaborative Learning With Automatic Transformation Search
Wei Gao, Shangwei Guo, Tianwei Zhang, Han Qiu, Yonggang Wen, Yang Liu
Collaborative learning has gained great popularity due to its benefit of data privacy protection: participants can jointly train a Deep Learning model without sharing their training sets. [Expand]
Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
Rui Gong, Yuhua Chen, Danda Pani Paudel, Yawei Li, Ajad Chhatkuli, Wen Li, Dengxin Dai, Luc Van Gool
Open compound domain adaptation (OCDA) is a domain adaptation setting, where target domain is modeled as a compound of multiple unknown homogeneous domains, which brings the advantage of improved generalization to unseen domains. [Expand]
Detecting Human-Object Interaction via Fabricated Compositional Learning
Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, Dacheng Tao
Human-Object Interaction (HOI) detection, inferring the relationships between human and objects from images/videos, is a fundamental task for high-level scene understanding. [Expand]
DI-Fusion: Online Implicit 3D Reconstruction With Deep Priors
Jiahui Huang, Shi-Sheng Huang, Haoxuan Song, Shi-Min Hu
Previous online 3D dense reconstruction methods struggle to achieve the balance between memory storage and surface quality, largely due to the usage of stagnant underlying geometry representation, such as TSDF (truncated signed distance functions) or surfels, without any knowledge of the scene priors. [Expand]
EffiScene: Efficient Per-Pixel Rigidity Inference for Unsupervised Joint Learning of Optical Flow, Depth, Camera Pose and Motion Segmentation
Yang Jiao, Trac D. Tran, Guangming Shi
This paper addresses the challenging unsupervised scene flow estimation problem by jointly learning four low-level vision sub-tasks: optical flow F, stereo-depth D, camera pose P and motion segmentation S. [Expand]
Improving Accuracy of Binary Neural Networks Using Unbalanced Activation Distribution
Hyungjun Kim, Jihoon Park, Changhun Lee, Jae-Joon Kim
Binarization of neural network models is considered as one of the promising methods to deploy deep neural network models on resource-constrained environments such as mobile devices. [Expand]
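Binarized networks typically quantize with sign() in the forward pass and rely on a straight-through estimator (STE) in the backward pass, since sign() has zero gradient almost everywhere. A minimal sketch of that standard building block, whose activation distribution the paper proposes to unbalance:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() forward with a straight-through estimator backward:
    gradients pass through unchanged where |x| <= 1, else are zeroed."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

binarize = BinarizeSTE.apply  # usage: y = binarize(weights_or_activations)
```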
Single-View Robot Pose and Joint Angle Estimation via Render & Compare
Yann Labbe, Justin Carpentier, Mathieu Aubry, Josef Sivic
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image. [Expand]
Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation
Jungbeom Lee, Eunji Kim, Sungroh Yoon
Weakly supervised semantic segmentation produces a pixel-level localization from class labels; but a classifier trained on such labels is likely to restrict its focus to a small discriminative region of the target object. [Expand]
Data augmentation is an effective regularization strategy for alleviating overfitting, an inherent drawback of deep neural networks. [Expand]
Inception Convolution With Efficient Dilation Search
Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu
As a variant of standard convolution, a dilated convolution can control effective receptive fields and handle large scale variance of objects without introducing additional computational costs. [Expand]
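As background, a 3x3 kernel with dilation d covers a (2d + 1) x (2d + 1) receptive field at the parameter cost of a plain 3x3 convolution, which is what makes dilation a cheap knob for receptive-field search. A plain PyTorch illustration of the primitive (the paper searches dilation patterns on top of it):

```python
import torch.nn as nn

# Same parameter count, growing receptive field.
conv_d1 = nn.Conv2d(64, 64, 3, padding=1, dilation=1)  # 3x3 field
conv_d2 = nn.Conv2d(64, 64, 3, padding=2, dilation=2)  # 5x5 field
conv_d4 = nn.Conv2d(64, 64, 3, padding=4, dilation=4)  # 9x9 field
```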
We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics. [Expand]
POSEFusion: Pose-Guided Selective Fusion for Single-View Human Volumetric Capture
Zhe Li, Tao Yu, Zerong Zheng, Kaiwen Guo, Yebin Liu
We propose POse-guided SElective Fusion (POSEFusion), a single-view human volumetric capture method that leverages tracking-based methods and tracking-free inference to achieve high-fidelity and dynamic 3D reconstruction. [Expand]
Bridging the Visual Gap: Wide-Range Image Blending
Chia-Ni Lu, Ya-Chu Chang, Wei-Chen Chiu
In this paper we propose a new problem scenario in image processing, wide-range image blending, which aims to smoothly merge two different input photos into a panorama by generating novel image content for the intermediate region between them. [Expand]
We present a novel mirror segmentation method that leverages depth estimates from ToF-based cameras as an additional cue to disambiguate challenging cases where the contrast or relation in RGB colors between the mirror reflection and the surrounding scene is subtle. [Expand]
Cheol-Hui Min, Jinseok Bae, Junho Lee, Young Min Kim
We present GATSBI, a generative model that can transform a sequence of raw observations into a structured latent representation that fully captures the spatio-temporal context of the agent's actions. [Expand]
StablePose: Learning 6D Object Poses From Geometrically Stable Patches
Yifei Shi, Junwen Huang, Xin Xu, Yifan Zhang, Kai Xu
We introduce the concept of geometric stability to the problem of 6D object pose estimation and propose to learn pose inference based on geometrically stable patches extracted from observed 3D point clouds. [Expand]
Searching for a more compact network width recently serves as an effective way of channel pruning for the deployment of convolutional neural networks (CNNs) under hardware constraints. [Expand]
Prioritized Architecture Sampling With Monte-Carlo Tree Search
Xiu Su, Tao Huang, Yanxi Li, Shan You, Fei Wang, Chen Qian, Changshui Zhang, Chang Xu
One-shot neural architecture search (NAS) methods significantly reduce the search cost by considering the whole search space as one network, which only needs to be trained once. [Expand]
EnD: Entangling and Disentangling Deep Representations for Bias Correction
Enzo Tartaglione, Carlo Alberto Barbano, Marco Grangetto
Artificial neural networks achieve state-of-the-art performance in an ever-growing number of tasks, and nowadays they are used to solve an incredibly large variety of problems. [Expand]
Combinatorial Learning of Graph Edit Distance via Dynamic Embedding
Runzhong Wang, Tianqi Zhang, Tianshu Yu, Junchi Yan, Xiaokang Yang
Graph Edit Distance (GED) is a popular similarity measurement for pairwise graphs and it also refers to the recovery of the edit path from the source graph to the target graph. [Expand]
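Exact graph edit distance is NP-hard in general, which is what motivates learned approximations like the one above. For small graphs it can still be computed exactly, e.g. with networkx:

```python
import networkx as nx

# Exact GED between two small graphs; the underlying search is
# exponential, hence the need for learning-based approximations.
g1 = nx.cycle_graph(4)   # 4 nodes, 4 edges
g2 = nx.path_graph(4)    # 4 nodes, 3 edges
print(nx.graph_edit_distance(g1, g2))  # minimal edits (delete one edge)
```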
Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
Zhenyu Wang, Yali Li, Ye Guo, Lu Fang, Shengjin Wang
In this paper, we delve into semi-supervised object detection where unlabeled images are leveraged to break through the upper bound of fully-supervised object detection models. [Expand]
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
Hui-Po Wang, Ning Yu, Mario Fritz
While Generative Adversarial Networks (GANs) show increasing performance and the level of realism is becoming indistinguishable from natural images, this also comes with high demands on data and computation. [Expand]
Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
Yu Wu, Yi Yang
We investigate the weakly-supervised audio-visual video parsing task, which aims to parse a video into temporal event segments and predict the audible or visible event categories. [Expand]
MotionRNN: A Flexible Model for Video Prediction With Spacetime-Varying Motions
Haixu Wu, Zhiyu Yao, Jianmin Wang, Mingsheng Long
This paper tackles video prediction from a new dimension of predicting spacetime-varying motions that are incessantly changing across both space and time. [Expand]
Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
Shiyu Xuan, Shiliang Zhang
Most unsupervised person re-identification (Re-ID) works produce pseudo-labels by measuring feature similarity without considering the distribution discrepancy among cameras, leading to degraded accuracy in label computation across cameras. [Expand]
DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-Scale Consistency
Zongxin Yang, Xin Yu, Yi Yang
Compared to 2D object bounding-box labeling, it is very difficult for humans to annotate 3D object poses, especially when depth images of scenes are unavailable. [Expand]
Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification
Fengxiang Yang, Zhun Zhong, Zhiming Luo, Yuanzheng Cai, Yaojin Lin, Shaozi Li, Nicu Sebe
This paper considers the problem of unsupervised person re-identification (re-ID), which aims to learn discriminative models with unlabeled data. [Expand]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds. [Expand]
Motivated by the intuition that one can transform two aligned point clouds into each other more easily and meaningfully than a misaligned pair, we propose CorrNet3D, the first unsupervised, end-to-end deep learning-based framework, to drive the learning of dense correspondence between 3D shapes by means of deformation-like reconstruction, overcoming the need for annotated data. [Expand]
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Chi Zhang, Baoxiong Jia, Song-Chun Zhu, Yixin Zhu
Spatial-temporal reasoning is a challenging task in Artificial Intelligence (AI) due to its demanding but unique nature: a theoretic requirement on representing and reasoning based on spatial-temporal knowledge in mind, and an applied requirement on a high-level cognitive system capable of navigating and acting in space and time. [Expand]
Chi Zhang, Baoxiong Jia, Mark Edmonds, Song-Chun Zhu, Yixin Zhu
Causal induction, i.e., identifying unobservable mechanisms that lead to the observable relations among variables, has played a pivotal role in modern scientific discovery, especially in scenarios with only sparse and limited data. [Expand]
Existing state-of-the-art disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolution neural network (CNN) for disparity regression, which is inefficient due to the high memory consumption and slow inference speed. [Expand]
Cross-MPI: Cross-Scale Stereo for Image Super-Resolution Using Multiplane Images
Yuemei Zhou, Gaochang Wu, Ying Fu, Kun Li, Yebin Liu
Various combinations of cameras enrich computational photography, among which reference-based superresolution (RefSR) plays a critical role in multiscale imaging systems. [Expand]
Panoptic-PolarNet: Proposal-Free LiDAR Point Cloud Panoptic Segmentation
Zixiang Zhou, Yang Zhang, Hassan Foroosh
Panoptic segmentation presents a new challenge in exploiting the merits of both detection and segmentation, with the aim of unifying instance segmentation and semantic segmentation in a single framework. [Expand]
One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances. [Expand]
Denoise and Contrast for Category Agnostic Shape Completion
Antonio Alliegro, Diego Valsesia, Giulia Fracastoro, Enrico Magli, Tatiana Tommasi
In this paper, we present a deep learning model that exploits the power of self-supervision to perform 3D point cloud completion, estimating the missing part and a context region around it. [Expand]
Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah
As airborne vehicles are becoming more autonomous and ubiquitous, it has become vital to develop the capability to detect the objects in their surroundings. [Expand]
Siamese Natural Language Tracker: Tracking by Natural Language Descriptions With Siamese Trackers
Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff
We propose a novel Siamese Natural Language Tracker (SNLT), which brings the advancements in visual tracking to the tracking by natural language (NL) specification task. [Expand]
OTA: Optimal Transport Assignment for Object Detection
Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, Jian Sun
Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object. [Expand]
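OTA casts label assignment as an optimal transport problem between ground-truth "suppliers" and anchor "demanders". Entropy-regularized OT of this kind is typically solved with Sinkhorn iterations; a generic solver sketch, with the cost matrix and marginals left as assumptions for illustration:

```python
import torch

def sinkhorn(cost, supply, demand, eps=0.1, iters=50):
    """Entropy-regularized optimal transport via Sinkhorn iterations.
    `cost` is (n_suppliers, n_demanders); `supply`/`demand` are the
    marginals. The returned plan soft-assigns demanders (anchors) to
    suppliers (ground truths); cost design is the paper's, not shown."""
    K = torch.exp(-cost / eps)
    u = torch.ones_like(supply)
    for _ in range(iters):
        v = demand / (K.t() @ u)
        u = supply / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan
```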
Bidirectional Projection Network for Cross Dimension Scene Understanding
Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong
2D image representations are in regular grids and can be processed efficiently, whereas 3D point clouds are unordered and scattered in 3D space. [Expand]
Few-Shot Open-Set Recognition by Transformation Consistency
Minki Jeong, Seokeon Choi, Changick Kim
In this paper, we attack a few-shot open-set recognition (FSOSR) problem, which is a combination of few-shot learning (FSL) and open-set recognition (OSR). [Expand]
Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification?
Ruoxi Jia, Fan Wu, Xuehui Sun, Jiacen Xu, David Dao, Bhavya Kailkhura, Ce Zhang, Bo Li, Dawn Song
Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning, and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaptation. [Expand]
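A common instantiation of such importance scores is the data Shapley value, usually estimated by averaging marginal contributions over random permutations. A minimal Monte Carlo sketch, where utility(indices) is a hypothetical helper scoring a subset of training points (e.g., by validation accuracy):

```python
import numpy as np

def mc_shapley(utility, n, permutations=200, seed=0):
    """Monte Carlo data Shapley: average each training point's marginal
    contribution over random permutations of the n points."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(permutations):
        perm = rng.permutation(n)
        prev = utility([])
        for k, i in enumerate(perm):
            cur = utility(list(perm[: k + 1]))
            phi[i] += cur - prev       # marginal contribution of point i
            prev = cur
    return phi / permutations
```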
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Yongfei Liu, Bo Wan, Lin Ma, Xuming He
Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding. [Expand]
Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation
Astuti Sharma, Tarun Kalluri, Manmohan Chandraker
Domain adaptation deals with training models using large scale labeled data from a specific source domain and then adapting the knowledge to certain target domains that have few or no labels. [Expand]
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
Mingjie Sun, Jimin Xiao, Eng Gee Lim
In this paper, we are tackling the proposal-free referring expression grounding task, aiming at localizing the target object according to a query sentence, without relying on off-the-shelf object proposals. [Expand]
For the single image rain removal (SIRR) task, the performance of deep learning (DL)-based methods is mainly affected by the designed deraining models and training datasets. [Expand]
In this paper, we present a novel unpaired point cloud completion network, named Cycle4Completion, to infer the complete geometries from a partial 3D object. [Expand]
Bilateral Grid Learning for Stereo Matching Networks
Bin Xu, Yuhua Xu, Xiaoli Yang, Wei Jia, Yulan Guo
Real-time performance of stereo matching networks is important for many applications, such as automatic driving, robot navigation and augmented reality (AR). [Expand]
Alexey Bokhovkin, Vladislav Ishimtsev, Emil Bogomolov, Denis Zorin, Alexey Artemov, Evgeny Burnaev, Angela Dai
Recent advances in 3D semantic scene understanding have shown impressive progress in 3D instance segmentation, enabling object-level reasoning about 3D scenes; however, a finer-grained understanding is required to enable interactions with objects and their functional understanding. [Expand]
Fine-Grained Angular Contrastive Learning With Coarse Labels
Guy Bukchin, Eli Schwartz, Kate Saenko, Ori Shahar, Rogerio Feris, Raja Giryes, Leonid Karlinsky
Few-shot learning methods offer pre-training techniques optimized for easier later adaptation of the model to new classes (unseen during training) using one or a few examples. [Expand]
Semantic Scene Completion aims at reconstructing a complete 3D scene with precise voxel-wise semantics from a single-view depth or RGBD image. [Expand]
Globally Optimal Relative Pose Estimation With Gravity Prior
Yaqing Ding, Daniel Barath, Jian Yang, Hui Kong, Zuzana Kukelova
Smartphones, tablets and camera systems used, e.g., in cars and UAVs, are typically equipped with IMUs (inertial measurement units) that can measure the gravity vector accurately. [Expand]
Restore From Restored: Video Restoration With Pseudo Clean Video
Seunghwan Lee, Donghyeon Cho, Jiwon Kim, Tae Hyun Kim
In this study, we propose a self-supervised video denoising method called "restore-from-restored." This method fine-tunes a pre-trained network by using a pseudo clean video during the test phase. [Expand]
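The test-time adaptation idea can be sketched as: restore once, keep the output as a pseudo clean target, synthesize new noisy inputs from it, and fine-tune. A hypothetical minimal loop assuming additive Gaussian noise (the paper's full method also exploits recurring patches across frames):

```python
import torch
import torch.nn.functional as F

def restore_from_restored(model, noisy, optimizer, noise_std=0.1, steps=5):
    """Fine-tune at test time on synthetic pairs built from the
    network's own restoration of the input."""
    with torch.no_grad():
        pseudo_clean = model(noisy)                    # initial restoration
    for _ in range(steps):
        renoised = pseudo_clean + noise_std * torch.randn_like(pseudo_clean)
        optimizer.zero_grad()
        F.l1_loss(model(renoised), pseudo_clean).backward()
        optimizer.step()
    with torch.no_grad():
        return model(noisy)                            # adapted output
```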
Existing studies in weakly-supervised semantic segmentation (WSSS) using image-level weak supervision have several limitations: sparse object coverage, inaccurate object boundaries, and co-occurring pixels from non-target objects. [Expand]
Anchor-Constrained Viterbi for Set-Supervised Action Segmentation
Jun Li, Sinisa Todorovic
This paper is about action segmentation under weak supervision in training, where the ground truth provides only a set of actions present, but neither their temporal ordering nor when they occur in a training video. [Expand]
Continuous Face Aging via Self-Estimated Residual Age Embedding
Zeqi Li, Ruowei Jiang, Parham Aarabi
Face synthesis, and face aging in particular, has been one of the major topics that witnessed a substantial improvement in image fidelity by using generative adversarial networks (GANs). [Expand]
Context Modeling in 3D Human Pose Estimation: A Unified Perspective
Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Hai Ci, Yizhou Wang
Estimating 3D human pose from a single image suffers from severe ambiguity since multiple 3D joint configurations may have the same 2D projection. [Expand]
Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition
Delian Ruan, Yan Yan, Shenqi Lai, Zhenhua Chai, Chunhua Shen, Hanzi Wang
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. [Expand]
Existing color-guided depth super-resolution (DSR) approaches require paired RGB-D data as training examples where the RGB image is used as structural guidance to recover the degraded depth map due to their geometrical similarity. [Expand]
Image Inpainting With External-Internal Learning and Monochromic Bottleneck
Tengfei Wang, Hao Ouyang, Qifeng Chen
Although recent inpainting approaches have demonstrated significant improvement with deep neural networks, they still suffer from artifacts such as blunt structures and abrupt colors when filling in the missing regions. [Expand]
Multiple Object Tracking With Correlation Learning
Qiang Wang, Yun Zheng, Pan Pan, Yinghui Xu
Recent works have shown that convolutional networks have substantially improved the performance of multiple object tracking by simultaneously learning detection and appearance features. [Expand]
In this paper, we convert traditional video captioning task into a new paradigm, i.e., Open-book Video Captioning, which generates natural language under the prompts of video-content-relevant sentences, not limited to the video itself. [Expand]
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan
In monocular video 3D multi-person pose estimation, inter-person occlusion and close interactions can cause human detection to be erroneous and human-joints grouping to be unreliable. [Expand]
One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
Minghao Chen, Jianlong Fu, Haibin Ling
Despite remarkable progress achieved, most neural architecture search (NAS) methods focus on searching for one single accurate and robust architecture. [Expand]
Quankai Gao, Fudong Wang, Nan Xue, Jin-Gang Yu, Gui-Song Xia
Recently, deep learning based methods have demonstrated promising results on the graph matching problem, by relying on the descriptive capability of deep features extracted on graph nodes. [Expand]
Depth maps obtained by commercial depth sensors are usually low-resolution, making them difficult to use in various computer vision tasks. [Expand]
Affordance Transfer Learning for Human-Object Interaction Detection
Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, Dacheng Tao
Reasoning about human-object interactions (HOI) is essential for deeper scene understanding, while object affordances (or functionalities) are of great importance for humans to discover unseen HOIs with novel objects. [Expand]
DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
Joy Hsu, Wah Chiu, Serena Yeung
In the biomedical domain, there is an abundance of dense, complex data where objects of interest may be challenging to detect or constrained by limits of human knowledge. [Expand]
LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents
ByeoungDo Kim, Seong Hyeon Park, Seokhwan Lee, Elbek Khoshimjonov, Dongsuk Kum, Junsoo Kim, Jeong Soo Kim, Jun Won Choi
In this paper, we address the problem of predicting the future motion of a dynamic agent (called a target agent) given its current and past states as well as the information on its environment. [Expand]
SIPSA-Net: Shift-Invariant Pan Sharpening With Moving Object Alignment for Satellite Imagery
Jaehyup Lee, Soomin Seo, Munchurl Kim
Pan-sharpening is a process of merging a high-resolution (HR) panchromatic (PAN) image and its corresponding low-resolution (LR) multi-spectral (MS) image to create an HR-MS and pan-sharpened image. [Expand]
Recently, neural architecture search (NAS) has been exploited to design feature pyramid networks (FPNs) and achieved promising results for visual object detection. [Expand]
Causal Hidden Markov Model for Time Series Disease Forecasting
Jing Li, Botong Wu, Xinwei Sun, Yizhou Wang
We propose a causal hidden Markov model to achieve robust prediction of irreversible disease at an early stage, which is safety-critical and vital for medical treatment in early stages. [Expand]
Generalizing Face Forgery Detection With High-Frequency Features
Yuchen Luo, Yong Zhang, Junchi Yan, Wei Liu
Current face forgery detection methods achieve high accuracy under the within-database scenario where training and testing forgeries are synthesized by the same algorithm. [Expand]
Self-Supervised Pillar Motion Learning for Autonomous Driving
Chenxu Luo, Xiaodong Yang, Alan Yuille
Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments. [Expand]
Hyeonseob Nam, HyunJae Lee, Jongchan Park, Wonjun Yoon, Donggeun Yoo
Convolutional Neural Networks (CNNs) often fail to maintain their performance when they confront new test domains, which is known as the problem of domain shift. [Expand]
Bridge To Answer: Structure-Aware Graph Interaction Network for Video Question Answering
Jungin Park, Jiyoung Lee, Kwanghoon Sohn
This paper presents a novel method, termed Bridge to Answer, to infer correct answers for questions about a given video by leveraging adequate graph interactions of heterogeneous crossmodal graphs. [Expand]
Self-Supervised Collision Handling via Generative 3D Garment Models for Virtual Try-On
Igor Santesteban, Nils Thuerey, Miguel A. Otaduy, Dan Casas
We propose a new generative model for 3D garment deformations that enables us to learn, for the first time, a data-driven method for virtual try-on that effectively addresses garment-body collisions. [Expand]
Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment
Yuhan Shen, Lu Wang, Ehsan Elhamifar
We address the problem of unsupervised localization of key-steps and feature learning in instructional videos using both visual and language instructions. [Expand]
SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction
Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua
Pedestrian trajectory prediction is a key technology in autopilot, which remains to be very challenging due to complex interactions between pedestrians. [Expand]
Improving the Efficiency and Robustness of Deepfakes Detection Through Precise Geometric Features
Zekun Sun, Yujie Han, Zeyu Hua, Na Ruan, Weijia Jia
Deepfakes are a class of malicious techniques that transplant a target face onto the original one in videos, resulting in serious problems such as infringement of copyright, confusion of information, or even public panic. [Expand]
Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization
Aysim Toker, Qunjie Zhou, Maxim Maximov, Laura Leal-Taixe
The goal of cross-view image based geo-localization is to determine the location of a given street view image by matching it against a collection of geo-tagged satellite images. [Expand]
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
Noranart Vesdapunt, Baoyuan Wang
Face detection is a fundamental problem for many downstream face applications, and there is rising demand for face detectors that are faster, more accurate, and able to handle higher resolutions. [Expand]
Learning Fine-Grained Segmentation of 3D Shapes Without Part Labels
Xiaogang Wang, Xun Sun, Xinyu Cao, Kai Xu, Bin Zhou
Existing learning-based approaches to 3D shape segmentation usually formulate it as a semantic labeling problem, assuming that all parts of training shapes are annotated with a given set of labels. [Expand]
PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization
Guangming Wang, Xinrui Wu, Zhe Liu, Hesheng Wang
This paper proposes PWCLO-Net, a novel 3D point cloud learning model for deep LiDAR odometry that uses hierarchical embedding mask optimization. [Expand]
Weakly-Supervised Instance Segmentation via Class-Agnostic Learning With Salient Images
Xinggang Wang, Jiapei Feng, Bin Hu, Qi Ding, Longjin Ran, Xiaoxin Chen, Wenyu Liu
Humans have a strong class-agnostic object segmentation ability and can outline boundaries of unknown objects precisely, which motivates us to propose a box-supervised class-agnostic object segmentation (BoxCaseg) based solution for weakly-supervised instance segmentation. [Expand]
Most existing CNN-based super-resolution (SR) methods are developed based on an assumption that the degradation is fixed and known (e.g., bicubic downsampling). [Expand]
Forecasting parapapillary atrophy (PPA), a symptom related to most irreversible eye diseases, provides an alarm for implementing an intervention to slow down disease progression at an early stage. [Expand]
A Dual Iterative Refinement Method for Non-Rigid Shape Matching
Rui Xiang, Rongjie Lai, Hongkai Zhao
In this work, a robust and efficient dual iterative refinement (DIR) method is proposed for dense correspondence between two nearly isometric shapes. [Expand]
Deep Denoising of Flash and No-Flash Pairs for Photography in Low-Light Environments
Zhihao Xia, Michael Gharbi, Federico Perazzi, Kalyan Sunkavalli, Ayan Chakrabarti
We introduce a neural network-based method to denoise pairs of images taken in quick succession in low-light environments, with and without a flash. [Expand]
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
Yangchen Xie, Xinyuan Chen, Li Sun, Yue Lu
Font generation is a challenging problem, especially for writing systems that consist of a large number of characters, and it has attracted much attention in recent years. [Expand]
Graph Stacked Hourglass Networks for 3D Human Pose Estimation
Tianhan Xu, Wataru Takano
In this paper, we propose a novel graph convolutional network architecture, Graph Stacked Hourglass Networks, for 2D-to-3D human pose estimation tasks. [Expand]
Few-shot learning (FSL), which aims to recognise new classes by adapting the learned knowledge with extremely limited few-shot (support) examples, remains an important open problem in computer vision. [Expand]
Linear Semantics in Generative Adversarial Networks
Jianjin Xu, Changxi Zheng
Generative Adversarial Networks (GANs) are able to generate high-quality images, but it remains difficult to explicitly specify the semantics of synthesized images. [Expand]
KSM: Fast Multiple Task Adaption via Kernel-Wise Soft Mask Learning
Li Yang, Zhezhi He, Junshan Zhang, Deliang Fan
Deep Neural Networks (DNN) could forget the knowledge about earlier tasks when learning new tasks, and this is known as catastrophic forgetting. [Expand]
NetAdaptV2: Efficient Neural Architecture Search With Fast Super-Network Training and Architecture Optimization
Tien-Ju Yang, Yi-Lun Liao, Vivienne Sze
Neural architecture search (NAS) typically consists of three main steps: training a super-network, training and evaluating sampled deep neural networks (DNNs), and training the discovered DNN. [Expand]
ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
Mingyu Yin, Li Sun, Qingli Li
View synthesis is usually done by an autoencoder, in which the encoder maps a source view image into a latent content code, and the decoder transforms it into a target view image according to the condition. [Expand]
A practical long-term tracker typically contains three key properties, i.e., an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism. [Expand]
Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels
Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa
The observation that computer vision methods overfit to dataset specifics has inspired diverse attempts to make object recognition models robust to domain shifts. [Expand]
Event-Based Synthetic Aperture Imaging With a Hybrid Network
Xiang Zhang, Wei Liao, Lei Yu, Wen Yang, Gui-Song Xia
Synthetic aperture imaging (SAI) is able to achieve the see-through effect by blurring out the off-focus foreground occlusions and reconstructing the in-focus occluded targets from multi-view images. [Expand]
Cross-view image geo-localization aims to determine the locations of street-view query images by matching with GPS-tagged reference images from aerial view. [Expand]
Leveraging the Availability of Two Cameras for Illuminant Estimation
Abdelrahman Abdelhamed, Abhijith Punnappurath, Michael S. Brown
Most modern smartphones are now equipped with two rear-facing cameras -- a main camera for standard imaging and an additional camera to provide wide-angle or telephoto zoom capabilities. [Expand]
Understanding and Simplifying Perceptual Distances
Dan Amir, Yair Weiss
Perceptual metrics based on features of deep Convolutional Neural Networks (CNNs) have shown remarkable success when used as loss functions in a range of computer vision problems and significantly outperform classical losses such as L1 or L2 in pixel space. [Expand]
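Such perceptual metrics typically average distances between unit-normalized deep features across layers and spatial positions, as in LPIPS. A schematic sketch, with the learned per-channel weights omitted; feats_x/feats_y are assumed to be lists of activations from the same CNN:

```python
import torch

def deep_perceptual_distance(feats_x, feats_y):
    """LPIPS-style distance: L2 between unit-normalized activations,
    averaged over spatial positions and layers. Inputs are lists of
    (B, C, H, W) feature maps."""
    total = 0.0
    for fx, fy in zip(feats_x, feats_y):
        fx = fx / (fx.norm(dim=1, keepdim=True) + 1e-10)  # unit channels
        fy = fy / (fy.norm(dim=1, keepdim=True) + 1e-10)
        total = total + ((fx - fy) ** 2).sum(dim=1).mean(dim=(1, 2))
    return total / len(feats_x)
```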
Learning Deep Latent Variable Models by Short-Run MCMC Inference With Optimal Transport Correction
Dongsheng An, Jianwen Xie, Ping Li
Learning latent variable models with deep top-down architectures typically requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. [Expand]
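Short-run MCMC inference here usually means a few steps of Langevin dynamics on the latent posterior, z <- z + (s^2 / 2) * grad log p(z | x) + s * eps with eps ~ N(0, I). A generic sketch of that inference step (the paper's optimal transport correction is not shown):

```python
import torch

def short_run_langevin(log_prob, z0, steps=20, s=0.1):
    """Run a few Langevin steps on the latents; `log_prob` maps z to
    the unnormalized posterior log-density log p(z | x)."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(log_prob(z).sum(), z)[0]
        with torch.no_grad():
            z = z + 0.5 * s**2 * grad + s * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```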
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, Errui Ding
Unsupervised domain adaptation (UDA) methods for person re-identification (re-ID) aim at transferring re-ID knowledge from labeled source data to unlabeled target data. [Expand]
Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers
Apratim Bhattacharyya, Daniel Olmeda Reino, Mario Fritz, Bernt Schiele
Accurate prediction of pedestrian and bicyclist paths is integral to the development of reliable autonomous vehicles in dense urban environments. [Expand]
Learning to model and predict how humans interact with objects while performing an action is challenging, and most of the existing video prediction models are ineffective in modeling complicated human-object interactions. [Expand]
Understanding Object Dynamics for Interactive Image-to-Video Synthesis
Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Bjorn Ommer
What would be the effect of locally poking a static scene? We present an approach that learns naturally-looking global articulations caused by a local manipulation at a pixel level. [Expand]
Hardness Sampling for Self-Training Based Transductive Zero-Shot Learning
Liu Bo, Qiulei Dong, Zhanyi Hu
Transductive zero-shot learning (T-ZSL) which could alleviate the domain shift problem in existing ZSL works, has received much attention recently. [Expand]
GAIA: A Transfer Learning System of Object Detection That Fits Your Needs
Xingyuan Bu, Junran Peng, Junjie Yan, Tieniu Tan, Zhaoxiang Zhang
Transfer learning with pre-training on large-scale datasets has played an increasingly significant role in computer vision and natural language processing recently. [Expand]
Debiased Subjective Assessment of Real-World Image Enhancement
Peibei Cao, Zhangyang Wang, Kede Ma
In real-world image enhancement, it is often challenging (if not impossible) to acquire ground-truth data, preventing the adoption of distance metrics for objective quality assessment. [Expand]
Adaptive Convolutions for Structure-Aware Style Transfer
Prashanth Chandran, Gaspard Zoss, Paulo Gotardo, Markus Gross, Derek Bradley
Style transfer between images is an artistic application of CNNs, where the 'style' of one image is transferred onto another image while preserving the latter's content. [Expand]
Towards Robust Classification Model by Counterfactual and Invariant Data Generation
Chun-Hao Chang, George Alexandru Adam, Anna Goldenberg
Despite the success of machine learning applications in science, industry, and society in general, many approaches are known to be non-robust, often relying on spurious correlations to make predictions. [Expand]
Despite the great success of Siamese-based trackers, their performance under complicated scenarios is still not satisfying, especially when there are distractors. [Expand]
Adaptive Image Transformer for One-Shot Object Detection
Ding-Jie Chen, He-Yen Hsieh, Tyng-Luh Liu
One-shot object detection tackles a challenging task that aims at identifying within a target image all object instances of the same class, implied by a query image patch. [Expand]
Class-Aware Robust Adversarial Training for Object Detection
Pin-Chun Chen, Bo-Han Kung, Jun-Cheng Chen
Object detection is an important computer vision task with plenty of real-world applications; therefore, how to enhance its robustness against adversarial attacks has emerged as a crucial issue. [Expand]
Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation
Haoxin Chen, Hanjie Wu, Nanxuan Zhao, Sucheng Ren, Shengfeng He
This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects in the query videos with certain class specified in a few labeled support images. [Expand]
Learning a Non-Blind Deblurring Network for Night Blurry Images
Liang Chen, Jiawei Zhang, Jinshan Pan, Songnan Lin, Faming Fang, Jimmy S. Ren
Deblurring night blurry images is difficult, because the commonly used blur model based on the linear convolution operation does not hold in this situation due to the influence of saturated pixels. [Expand]
Recent self-supervised contrastive learning provides an effective approach for unsupervised person re-identification (ReID) by learning invariance from different views (transformed versions) of an input. [Expand]
Data-free learning for student networks is a new paradigm for addressing users' privacy concerns about the use of original training data. [Expand]
Neural Feature Search for RGB-Infrared Person Re-Identification
Yehansen Chen, Lin Wan, Zhihang Li, Qianyan Jing, Zongyuan Sun
RGB-Infrared person re-identification (RGB-IR ReID) is a challenging cross-modality retrieval problem, which aims at matching the person-of-interest over visible and infrared camera views. [Expand]
Perceptual Indistinguishability-Net (PI-Net): Facial Image Obfuscation With Manipulable Semantics
Jia-Wei Chen, Li-Ju Chen, Chia-Mu Yu, Chun-Shien Lu
With the growing use of camera devices, the industry has many image datasets that provide more opportunities for collaboration between the machine learning community and industry. [Expand]
Pareto Self-Supervised Training for Few-Shot Learning
Zhengyu Chen, Jixie Ge, Heshen Zhan, Siteng Huang, Donglin Wang
While few-shot learning (FSL) aims for rapid generalization to new concepts with little supervision, self-supervised learning (SSL) constructs supervisory signals directly computed from unlabeled data. [Expand]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes. [Expand]
Scene Text Telescope: Text-Focused Scene Image Super-Resolution
Jingye Chen, Bin Li, Xiangyang Xue
Image super-resolution, which is often regarded as a preprocessing procedure of scene text recognition, aims to recover the realistic features from a low-resolution text image. [Expand]
Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning
Shaoxiang Chen, Yu-Gang Jiang
Dense Event Captioning (DEC) aims to jointly localize and describe multiple events of interest in untrimmed videos, which is an advancement of the conventional video captioning task (generating a single sentence description for a trimmed video). [Expand]
Feature-Level Collaboration: Joint Unsupervised Learning of Optical Flow, Stereo Depth and Camera Motion
Cheng Chi, Qingjie Wang, Tianyu Hao, Peng Guo, Xin Yang
Precise estimation of optical flow, stereo depth and camera motion are important for the real-world 3D scene understanding and visual perception. [Expand]
Bayesian Nested Neural Networks for Uncertainty Calibration and Adaptive Compression
Yufei Cui, Ziquan Liu, Qiao Li, Antoni B. Chan, Chun Jason Xue
Nested networks or slimmable networks are neural networks whose architectures can be adjusted instantly during testing time, e.g., based on computational constraints. [Expand]
Towards Accurate 3D Human Motion Prediction From Incomplete Observations
Qiongjie Cui, Huaijiang Sun
Predicting accurate and realistic future human poses from historically observed sequences is a fundamental task in the intersection of computer vision, graphics, and artificial intelligence. [Expand]
Progressive Contour Regression for Arbitrary-Shape Scene Text Detection
Pengwen Dai, Sanyi Zhang, Hua Zhang, Xiaochun Cao
State-of-the-art scene text detection methods usually model the text instance with local pixels or components from the bottom-up perspective and, therefore, are sensitive to noises and dependent on the complicated heuristic post-processing especially for arbitrary-shape texts. [Expand]
Deep face recognition has achieved remarkable improvements due to the introduction of margin-based softmax loss, in which the prototype stored in the last linear layer represents the center of each class. [Expand]
In this work, we introduce the new scene understanding task of Part-aware Panoptic Segmentation (PPS), which aims to understand a scene at multiple levels of abstraction, and unifies the tasks of scene parsing and part parsing. [Expand]
Learning Spatially-Variant MAP Models for Non-Blind Image Deblurring
Jiangxin Dong, Stefan Roth, Bernt Schiele
The classical maximum a-posteriori (MAP) framework for non-blind image deblurring requires defining suitable data and regularization terms, whose interplay yields the desired clear image through optimization. [Expand]
EventZoom: Learning To Denoise and Super Resolve Neuromorphic Events
Peiqi Duan, Zihao W. Wang, Xinyu Zhou, Yi Ma, Boxin Shi
We address the problem of jointly denoising and super resolving neuromorphic events, a novel visual signal that represents thresholded temporal gradients in a space-time window. [Expand]
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search
Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li
Recent breakthroughs of Neural Architecture Search (NAS) extend the field's research scope towards a broader range of vision tasks and more diversified search spaces. [Expand]
We present a novel group collaborative learning framework (GCNet) capable of detecting co-salient objects in real time (16ms), by simultaneously mining consensus representations at group level based on the two necessary criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module; 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module conditioning the inconsistent consensus. [Expand]
Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
Hehe Fan, Yi Yang, Mohan Kankanhalli
Point cloud videos exhibit irregularities and lack of order along the spatial dimension where points emerge inconsistently across different frames. [Expand]
Face recognition models trained under the assumption of identical training and test distributions often suffer from poor generalization when faced with unknown variations, such as a novel ethnicity or unpredictable individual make-ups during test time. [Expand]
Most existing video text detection methods track texts with appearance features, which are easily influenced by the change of perspective and illumination. [Expand]
Anticipating Human Actions by Correlating Past With the Future With Jaccard Similarity Measures
Basura Fernando, Samitha Herath
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and Jaccard Frobenius inner product over covariances. [Expand]
Double Low-Rank Representation With Projection Distance Penalty for Clustering
Zhiqiang Fu, Yao Zhao, Dongxia Chang, Xingxing Zhang, Yiming Wang
This paper presents a novel, simple yet robust self-representation method, i.e., Double Low-Rank Representation with Projection Distance penalty (DLRRPD) for clustering. [Expand]
Auto-Exposure Fusion for Single-Image Shadow Removal
Lan Fu, Changqing Zhou, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Wei Feng, Yang Liu, Song Wang
Shadow removal is still a challenging task due to its inherent background-dependent and spatial-variant properties, leading to unknown and diverse shadow patterns. [Expand]
Partial Feature Selection and Alignment for Multi-Source Domain Adaptation
Yangye Fu, Ming Zhang, Xing Xu, Zuo Cao, Chao Ma, Yanli Ji, Kai Zuo, Huimin Lu
Multi-Source Domain Adaptation (MSDA), which aims to transfer the knowledge learned from multiple source domains to an unlabeled target domain, has drawn increasing attention in the research community. [Expand]
STMTrack: Template-Free Visual Tracking With Space-Time Memory Networks
Zhihong Fu, Qingjie Liu, Zehua Fu, Yunhong Wang
Boosting the performance of offline-trained Siamese trackers is getting harder nowadays, since the fixed information in the template cropped from the first frame has been almost thoroughly mined, yet such trackers remain poorly equipped to resist target appearance changes. [Expand]
Maolin Gao, Zorah Lahner, Johan Thunberg, Daniel Cremers, Florian Bernard
Finding correspondences between shapes is a fundamental problem in computer vision and graphics, which is relevant for many applications, including 3D reconstruction, object tracking, and style transfer. [Expand]
The Remote Embodied Referring Expression (REVERIE) is a recently introduced task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction. [Expand]
Privacy Preserving Localization and Mapping From Uncalibrated Cameras
Marcel Geppert, Viktor Larsson, Pablo Speciale, Johannes L. Schonberger, Marc Pollefeys
Recent works on localization and mapping from privacy preserving line features have made significant progress towards addressing the privacy concerns arising from cloud-based solutions in mixed reality and robotics. [Expand]
Polygonal Building Extraction by Frame Field Learning
Nicolas Girard, Dmitriy Smirnov, Justin Solomon, Yuliya Tarabalka
While state-of-the-art image segmentation models typically output segmentations in raster format, applications in geographic information systems often require vector polygons. [Expand]
MaxUp: Lightweight Adversarial Training With Data Augmentation Improves Neural Network Training
Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu
We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. [Expand]
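The mechanism MaxUp describes is simple enough to sketch: draw m augmented copies of each example and minimize the maximum loss over the copies, which acts as a lightweight form of adversarial training. A minimal PyTorch sketch under that reading follows; `augment` is a stand-in for whatever augmentation policy is paired with it (the paper pairs MaxUp with policies such as Gaussian perturbation or CutMix).

```python
# Minimal MaxUp sketch: minimize the worst-case loss over m augmented copies.
import torch
import torch.nn.functional as F

def maxup_loss(model, x, y, augment, m: int = 4):
    losses = []
    for _ in range(m):
        logits = model(augment(x))
        losses.append(F.cross_entropy(logits, y, reduction="none"))  # per-sample
    # worst-case loss over the m copies, then average over the batch
    return torch.stack(losses, dim=0).max(dim=0).values.mean()

# usage inside a training step (additive noise as a placeholder augmentation):
# loss = maxup_loss(model, images, labels, lambda t: t + 0.1 * torch.randn_like(t))
# loss.backward()
```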
Hidden features in neural networks usually fail to learn informative representations for 3D segmentation, as supervision is given only on the output prediction; this can be addressed by applying omni-scale supervision to intermediate layers. [Expand]
Inverse Simulation: Reconstructing Dynamic Geometry of Clothed Humans via Optimal Control
Jingfan Guo, Jie Li, Rahul Narain, Hyun Soo Park
This paper studies the problem of inverse cloth simulation---estimating the shape and time-varying poses of the underlying body that generate physically plausible cloth motion matching the point cloud measurements of clothed humans. [Expand]
Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection
Zonghao Guo, Chang Liu, Xiaosong Zhang, Jianbin Jiao, Xiangyang Ji, Qixiang Ye
Detecting oriented and densely packed objects remains challenging due to spatial feature aliasing caused by the intersection of receptive fields between objects. [Expand]
Compositing an image usually suffers from the inharmony problem, which is mainly caused by the incompatibility of a foreground and background drawn from two different images with distinct surfaces and lighting; these correspond to material-dependent and light-dependent characteristics, namely the reflectance and illumination intrinsic images, respectively. [Expand]
Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
Hao Guo, Song Wang
Long-tailed data distribution is common in many multi-label visual recognition tasks and the direct use of these data for training usually leads to relatively low performance on tail classes. [Expand]
Multispectral photometric stereo (MPS) aims at recovering the surface normal of a scene from a single-shot multispectral image, which is known as an ill-posed problem. [Expand]
Strengthen Learning Tolerance for Weakly Supervised Object Localization
Guangyu Guo, Junwei Han, Fang Wan, Dingwen Zhang
Weakly supervised object localization (WSOL) aims at learning to localize objects of interest by only using the image-level labels as the supervision. [Expand]
Contrastive Embedding for Generalized Zero-Shot Learning
Zongyan Han, Zhenyong Fu, Shuo Chen, Jian Yang
Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and unseen classes, when only the labeled examples from seen classes are provided. [Expand]
Crossing Cuts Polygonal Puzzles: Models and Solvers
Peleg Harel, Ohad Ben-Shahar
Jigsaw puzzle solving, the problem of constructing a coherent whole from a set of non-overlapping unordered fragments, is fundamental to numerous applications, and yet most of the literature has focused thus far on less realistic puzzles whose pieces are identical squares. [Expand]
NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning
Hyunho Ha, Joo Ho Lee, Andreas Meuleman, Min H. Kim
Multiview shape-from-shading (SfS) has achieved high-detail geometry, but its computation is expensive because it must solve multiview registration together with an ill-posed inverse rendering problem. [Expand]
Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps
Yuk Heo, Yeong Jun Koh, Chang-Su Kim
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time. [Expand]
DyCo3D: Robust Instance Segmentation of 3D Point Clouds Through Dynamic Convolution
Tong He, Chunhua Shen, Anton van den Hengel
Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions. [Expand]
MOST: A Multi-Oriented Scene Text Detector With Localization Refinement
Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai
Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios. [Expand]
Disentangling Label Distribution for Long-Tailed Visual Recognition
Youngkyu Hong, Seungju Han, Kwanghee Choi, Seokjun Seo, Beomsu Kim, Buru Chang
The current evaluation protocol of long-tailed visual recognition trains the classification model on the long-tailed source label distribution and evaluates its performance on the uniform target label distribution. [Expand]
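This is not the paper's estimator, but the source/target label-prior mismatch it describes has a standard post-hoc baseline worth keeping in mind: shift the logits by the log-ratio of the target and source priors. A minimal sketch, assuming both priors are known:

```python
# Post-hoc logit adjustment for a train/test label-prior mismatch: a Bayes-rule
# correction, not the method proposed in this paper. Priors are assumed known.
import numpy as np

def adjust_logits(logits, source_prior, target_prior, eps=1e-12):
    """logits: (N, C); priors: (C,) arrays summing to 1."""
    correction = np.log(np.asarray(target_prior) + eps) - np.log(np.asarray(source_prior) + eps)
    return logits + correction  # broadcast over the batch dimension

# e.g., a uniform target over C classes: target_prior = np.full(C, 1.0 / C)
```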
Panoptic segmentation is a challenging task aiming to simultaneously segment objects (things) at instance level and background contents (stuff) at semantic level. [Expand]
Yuchen Hong, Qian Zheng, Lingran Zhao, Xudong Jiang, Alex C. Kot, Boxin Shi
This paper studies the problem of panoramic image reflection removal, aiming at relieving the content ambiguity between reflection and transmission scenes. [Expand]
Brain Image Synthesis With Unsupervised Multivariate Canonical CSCℓ4Net
Yawen Huang, Feng Zheng, Danyang Wang, Weilin Huang, Matthew R. Scott, Ling Shao
Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. [Expand]
This paper addresses the video rescaling task, which arises from the needs of adapting the video spatial resolution to suit individual viewing devices. [Expand]
Learning the Non-Differentiable Optimization for Blind Super-Resolution
Zheng Hui, Jie Li, Xiumei Wang, Xinbo Gao
Previous convolutional neural network (CNN) based blind super-resolution (SR) methods usually adopt an iterative optimization way to approximate the ground-truth (GT) step-by-step. [Expand]
Semi-supervised learning is a useful tool for image segmentation, mainly due to its ability to extract knowledge from unlabeled data to assist learning from labeled data. [Expand]
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang
Language-queried video actor segmentation aims to predict the pixel-level mask of the actor which performs the actions described by a natural language query in the target frames. [Expand]
Dense Relation Distillation With Context-Aware Aggregation for Few-Shot Object Detection
Hanzhe Hu, Shuai Bai, Aoxue Li, Jinshi Cui, Liwei Wang
Conventional deep learning based methods for object detection require a large number of bounding box annotations for training, and such high-quality annotated data are expensive to obtain. [Expand]
The extraction of auto-correlation in images has shown great potential in deep learning networks, such as the self-attention mechanism in the channel domain and the self-similarity mechanism in the spatial domain. [Expand]
Hezhen Hu, Weilun Wang, Wengang Zhou, Weichao Zhao, Houqiang Li
Hand gesture-to-gesture translation is a significant and interesting problem, which plays a key role in many applications, such as sign language production. [Expand]
Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph G. Lambourne, Karl D.D. Willis, Thomas Davies, Hooman Shayani, Nigel Morris
We introduce UV-Net, a novel neural network architecture and representation designed to operate directly on Boundary representation (B-rep) data from 3D CAD models. [Expand]
In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user specified saliency map. [Expand]
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Shuai Jia, Yibing Song, Chao Ma, Xiaokang Yang
Adversarial attack arises due to the vulnerability of deep neural networks to perceive input samples injected with imperceptible perturbations. [Expand]
Turning Frequency to Resolution: Video Super-Resolution via Event Cameras
Yongcheng Jing, Yiding Yang, Xinchao Wang, Mingli Song, Dacheng Tao
State-of-the-art video super-resolution (VSR) methods focus on exploiting inter- and intra-frame correlations to estimate high-resolution (HR) video frames from low-resolution (LR) ones. [Expand]
Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling
Wei Ji, Shuang Yu, Junde Wu, Kai Ma, Cheng Bian, Qi Bi, Jingjing Li, Hanruo Liu, Li Cheng, Yefeng Zheng
In medical image analysis, it is typical to collect multiple annotations, each from a different clinical expert or rater, in the expectation that possible diagnostic errors could be mitigated. [Expand]
Wei Ji, Jingjing Li, Shuang Yu, Miao Zhang, Yongri Piao, Shunyu Yao, Qi Bi, Kai Ma, Yefeng Zheng, Huchuan Lu, Li Cheng
Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD). [Expand]
Practical Single-Image Super-Resolution Using Look-Up Table
Younghyun Jo, Seon Joo Kim
A number of super-resolution (SR) algorithms from interpolation to deep neural networks (DNN) have emerged to restore or create missing details of the input low-resolution image. [Expand]
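The look-up-table idea this entry builds on can be sketched at query time: a tiny-receptive-field SR model is precomputed into a table indexed by quantized input pixels, so inference reduces to memory lookups. The toy sketch below uses a random table and illustrative sizes (2x2 patches, 4-bit inputs, x2 scale) purely to show the indexing; the actual pipeline (including how the table is filled from a trained network, and refinements such as rotation ensembling) is in the full text.

```python
# Toy query-side sketch of super-resolution by table lookup; the LUT contents
# are random here only to make the snippet self-contained.
import numpy as np

SCALE, BITS = 2, 4                      # x2 upscaling, 4-bit quantized inputs
LEVELS = 2 ** BITS
# LUT maps a quantized 2x2 input patch -> a (SCALE*SCALE)-pixel output patch
lut = np.random.rand(LEVELS, LEVELS, LEVELS, LEVELS, SCALE * SCALE).astype(np.float32)

def sr_lut_upscale(img: np.ndarray) -> np.ndarray:
    h, w = img.shape
    q = (img.astype(np.float32) / 256.0 * LEVELS).astype(int).clip(0, LEVELS - 1)
    padded = np.pad(q, ((0, 1), (0, 1)), mode="edge")
    out = np.zeros((h * SCALE, w * SCALE), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            a, b = padded[y, x], padded[y, x + 1]
            c, d = padded[y + 1, x], padded[y + 1, x + 1]
            patch = lut[a, b, c, d].reshape(SCALE, SCALE)
            out[y * SCALE:(y + 1) * SCALE, x * SCALE:(x + 1) * SCALE] = patch
    return out

print(sr_lut_upscale(np.random.randint(0, 256, (8, 8))).shape)  # (16, 16)
```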
Tackling the Ill-Posedness of Super-Resolution Through Adaptive Target Generation
Younghyun Jo, Seoung Wug Oh, Peter Vajda, Seon Joo Kim
By the one-to-many nature of the super-resolution (SR) problem, a single low-resolution (LR) image can be mapped to many high-resolution (HR) images. [Expand]
Relative Order Analysis and Optimization for Unsupervised Deep Metric Learning
Shichao Kan, Yigang Cen, Yang Li, Vladimir Mladenovic, Zhihai He
In unsupervised learning of image features without labels, especially on datasets with fine-grained object classes, it is often very difficult to tell if a given image belongs to one specific object class or another, even for human eyes. [Expand]
Differentiable Diffusion for Dense Depth Estimation From Multi-View Images
Numair Khan, Min H. Kim, James Tompkin
We present a method to estimate dense depth by optimizing a sparse set of points such that their diffusion into a depth map minimizes a multi-view reprojection error from RGB supervision. [Expand]
Quality-Agnostic Image Recognition via Invertible Decoder
Insoo Kim, Seungju Han, Ji-won Baek, Seong-Jin Park, Jae-Joon Han, Jinwoo Shin
Despite the remarkable performance of deep models on image recognition tasks, they are known to be susceptible to common corruptions such as blur, noise, and low-resolution. [Expand]
QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks
Vladimir Kryzhanovskiy, Gleb Balitskiy, Nikolay Kozyrskiy, Aleksandr Zuruev
Modern deep neural networks (DNNs) cannot be effectively used in mobile and embedded devices due to strict requirements for computational complexity, memory, and power consumption. [Expand]
T-vMF Similarity for Regularizing Intra-Class Feature Distribution
Takumi Kobayashi
Deep convolutional neural networks (CNNs) leverage large-scale training dataset to produce remarkable performance on various image classification tasks. [Expand]
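The t-vMF similarity this entry introduces warps cosine similarity with a heavy-tailed function controlled by a concentration parameter kappa, so that only well-aligned features score highly. A PyTorch sketch under our reading of the formulation follows; note that kappa = 0 recovers plain cosine similarity, and the exact form should be verified against the paper.

```python
# Sketch of a t-vMF-style similarity for compacting intra-class features;
# treat the exact formula as a reading of the paper, to be verified.
import torch
import torch.nn.functional as F

def t_vmf_similarity(z: torch.Tensor, w: torch.Tensor, kappa: float = 16.0):
    """z: (N, D) features; w: (C, D) class weights; returns (N, C)."""
    cos = F.normalize(z, dim=1) @ F.normalize(w, dim=1).T
    # in [-1, 1], peaked at cos = 1; kappa = 0 gives back plain cosine
    return (1.0 + cos) / (1.0 + kappa * (1.0 - cos)) - 1.0

sims = t_vmf_similarity(torch.randn(8, 64), torch.randn(10, 64))
```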
Under-display camera (UDC) technology is essential for full-screen displays in smartphones; the full-screen design is achieved by eliminating holes drilled in the display. [Expand]
Leander Lacroix, Benjamin Charlier, Alain Trouve, Barbara Gris
A natural way to model the evolution of an object (growth of a leaf for instance) is to estimate a plausible deforming path between two observations. [Expand]
For several vision and robotics applications, 3D geometry of man-made environments such as indoor scenes can be represented with a small number of dominant planes. [Expand]
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback
Seungmin Lee, Dongwan Kim, Bohyung Han
We tackle the task of image retrieval with text feedback, where a reference image and modifier text are combined to identify the desired target image. [Expand]
DRANet: Disentangling Representation and Adaptation Networks for Unsupervised Cross-Domain Adaptation
Seunghun Lee, Sunghyun Cho, Sunghoon Im
In this paper, we present DRANet, a network architecture that disentangles image representations and transfers the visual attributes in a latent space for unsupervised cross-domain adaptation. [Expand]
Network Quantization With Element-Wise Gradient Scaling
Junghyup Lee, Dohyung Kim, Bumsub Ham
Network quantization aims at reducing bit-widths of weights and/or activations, particularly important for implementing deep neural networks with limited hardware resources. [Expand]
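Element-wise gradient scaling replaces the straight-through estimator (STE) in the backward pass of the quantizer. Below is a hedged PyTorch sketch under our reading of the rule: each gradient element is scaled by a factor built from the sign of the incoming gradient and the discretization error x - q(x), with delta a fixed hyperparameter here (the paper adapts it); verify the exact rule against the full text.

```python
# Sketch of element-wise gradient scaling around a toy uniform quantizer.
import torch

class EWGSQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, delta: float = 0.1):
        q = torch.round(x)                 # toy rounding quantizer
        ctx.save_for_backward(x - q)       # discretization error
        ctx.delta = delta
        return q

    @staticmethod
    def backward(ctx, grad_out):
        (err,) = ctx.saved_tensors
        # STE would return grad_out unchanged; here each element is rescaled
        scale = 1.0 + ctx.delta * torch.sign(grad_out) * err
        return grad_out * scale, None

x = torch.randn(4, requires_grad=True)
EWGSQuantize.apply(x).sum().backward()
print(x.grad)
```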
Relevance-CAM: Your Model Already Knows Where To Look
Jeong Ryong Lee, Sewon Kim, Inyong Park, Taejoon Eo, Dosik Hwang
As neural networks develop and their fields of application multiply, the ability to explain deep learning models becomes increasingly important. [Expand]
4D Hyperspectral Photoacoustic Data Restoration With Reliability Analysis
Weihang Liao, Art Subpa-asa, Yinqiang Zheng, Imari Sato
Hyperspectral photoacoustic (HSPA) spectroscopy is an emerging bi-modal imaging technology that is able to show the wavelength-dependent absorption distribution of the interior of a 3D volume. [Expand]
COMPLETER: Incomplete Multi-View Clustering via Contrastive Prediction
Yijie Lin, Yuanbiao Gou, Zitao Liu, Boyun Li, Jiancheng Lv, Xi Peng
In this paper, we study two challenging problems in incomplete multi-view clustering analysis, namely, i) how to learn an informative and consistent representation among different views without the help of labels and ii) how to recover the missing views from data. [Expand]
Multi-View Multi-Person 3D Pose Estimation With Plane Sweep Stereo
Jiahao Lin, Gim Hee Lee
Existing approaches for multi-view multi-person 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views and solve for the 3D pose estimation for each person. [Expand]
Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval
Yang Liu, Qingchao Chen, Samuel Albanie
In this paper, we study the task of visual-text retrieval in the highly practical setting in which labelled visual data with paired text descriptions are available in one domain (the "source"), but only unlabelled visual data (without text descriptions) are available in the domain of interest (the "target"). [Expand]
This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. [Expand]
The perceptual loss has been widely used as an effective loss term in image synthesis tasks, including image super-resolution [16] and style transfer [14]. [Expand]
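For reference, the perceptual loss in question compares deep features of a fixed pretrained network between output and target instead of raw pixels. A minimal torchvision-based sketch, with VGG16 cut at an arbitrary layer and inputs assumed already ImageNet-normalized:

```python
# Standard perceptual loss: MSE between frozen VGG16 features of two images.
import torch
import torchvision.models as models

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer: int = 16):  # cut point is an illustrative choice
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.features = vgg[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, pred, target):
        # inputs assumed normalized with ImageNet mean/std
        return torch.nn.functional.mse_loss(self.features(pred), self.features(target))

loss = PerceptualLoss()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```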
We introduce a new dataset for emotional artificial intelligence research: iMiGUE, an identity-free video dataset for micro-gesture understanding and emotion analysis. [Expand]
Mask-Embedded Discriminator With Region-Based Semantic Regularization for Semi-Supervised Class-Conditional Image Synthesis
Yi Liu, Xiaoyang Huo, Tianyi Chen, Xiangping Zeng, Si Wu, Zhiwen Yu, Hau-San Wong
Semi-supervised generative learning (SSGL) makes use of unlabeled data to achieve a trade-off between the data collection/annotation effort and generation performance, when adequate labeled data are not available. [Expand]
Neighborhood Normalization for Robust Geometric Feature Learning
Xingtong Liu, Benjamin D. Killeen, Ayushi Sinha, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath
Extracting geometric features from 3D models is a common first step in applications such as 3D registration, tracking, and scene flow estimation. [Expand]
PluckerNet: Learn To Register 3D Line Reconstructions
Liu Liu, Hongdong Li, Haodong Yao, Ruyi Zha
Aligning two partially-overlapped 3D line reconstructions in Euclidean space is challenging, as we need to simultaneously solve line correspondences and relative pose between reconstructions. [Expand]
In this work, we propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action representation (CrosSCLR), by leveraging multi-view complementary supervision signal. [Expand]
Combined Depth Space Based Architecture Search for Person Re-Identification
Hanjun Li, Gaojie Wu, Wei-Shi Zheng
Most works on person re-identification (ReID) take advantage of large backbone networks such as ResNet, which are designed for image classification instead of ReID, for feature extraction. [Expand]
Occluded person re-identification (Re-ID) is a challenging task as persons are frequently occluded by various obstacles or other persons, especially in the crowd scenario. [Expand]
Domain Consensus Clustering for Universal Domain Adaptation
Guangrui Li, Guoliang Kang, Yi Zhu, Yunchao Wei, Yi Yang
In this paper, we investigate Universal Domain Adaptation (UniDA) problem, which aims to transfer the knowledge from source to target under unaligned label space. [Expand]
Dynamic Class Queue for Large Scale Face Recognition in the Wild
Bi Li, Teng Xi, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Wenyu Liu
Learning discriminative representation using large-scale face datasets in the wild is crucial for real-world applications, yet it remains challenging. [Expand]
Shuang Li, JinMing Zhang, Wenxuan Ma, Chi Harold Liu, Wei Li
Domain adaptation (DA) enables knowledge transfer from a labeled source domain to an unlabeled target domain by reducing the cross-domain distribution discrepancy. [Expand]
FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains
Jia Li, Zhaoyang Li, Jie Cao, Xingguang Song, Ran He
In this work, we propose a novel two-stage framework named FaceInpainter to implement controllable Identity-Guided Face Inpainting (IGFI) under heterogeneous domains. [Expand]
Lighting, Reflectance and Geometry Estimation From 360° Panoramic Stereo
Junxuan Li, Hongdong Li, Yasuyuki Matsushita
We propose a method for estimating high-definition spatially-varying lighting, reflectance, and geometry of a scene from 360° stereo images. [Expand]
Generalizing to the Open World: Deep Visual Odometry With Online Adaptation
Shunkai Li, Xin Wu, Yingdian Cao, Hongbin Zha
Although learning-based visual odometry (VO) has shown impressive results in recent years, pretrained networks may easily collapse in unseen environments. [Expand]
Probabilistic Model Distillation for Semantic Correspondence
Xin Li, Deng-Ping Fan, Fan Yang, Ao Luo, Hong Cheng, Zicheng Liu
Semantic correspondence is a fundamental problem in computer vision, which aims at establishing dense correspondences across images depicting different instances under the same category. [Expand]
Self-Supervised Video Hashing via Bidirectional Transformers
Shuyan Li, Xiu Li, Jiwen Lu, Jie Zhou
Most existing unsupervised video hashing methods are built on unidirectional models with less reliable training objectives, which underuse the correlations among frames and the similarity structure between videos. [Expand]
An emerging line of research has found that spherical spaces better match the underlying geometry of facial images, as evidenced by the state-of-the-art facial recognition methods which benefit empirically from spherical representations. [Expand]
Transferable Semantic Augmentation for Domain Adaptation
Shuang Li, Mixue Xie, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Wei Li
Domain adaptation has been widely explored by transferring the knowledge from a label-rich source domain to a related but unlabeled target domain. [Expand]
Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks
Xiaoxiao Long, Lingjie Liu, Wei Li, Christian Theobalt, Wenping Wang
We present a novel method for multi-view depth estimation from a single video, which is a critical task in various applications, such as perception, reconstruction and robot navigation. [Expand]
As a vital problem in classification-oriented transfer, unsupervised domain adaptation (UDA) has attracted widespread attention in recent years. [Expand]
Action Unit Memory Network for Weakly Supervised Temporal Action Localization
Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang
Weakly supervised temporal action localization aims to detect and localize actions in untrimmed videos with only video-level labels during training. [Expand]
Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals
Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomas Palacios, Antonio Torralba, Wojciech Matusik
Daily human activities, e.g., locomotion, exercises, and resting, are heavily guided by the tactile interactions between the human and the ground. [Expand]
In applications such as optical see-through and projector augmented reality, producing images amounts to solving non-negative image generation, where one can only add light to an existing image. [Expand]
Large-Capacity Image Steganography Based on Invertible Neural Networks
Shao-Ping Lu, Rong Wang, Tao Zhong, Paul L. Rosin
Many attempts have been made to hide information in images, where the main challenge is how to increase the payload capacity without the container image being detected as containing a message. [Expand]
CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
Tao Lu, Limin Wang, Gangshan Wu
Previous point cloud semantic segmentation networks use the same process to aggregate features from neighbors of the same category and different categories. [Expand]
Reference-based image super-resolution (RefSR) has shown promising success in recovering high-frequency details by utilizing an external reference image (Ref). [Expand]
Progressive Modality Reinforcement for Human Multimodal Emotion Recognition From Unaligned Multimodal Sequences
Fengmao Lv, Xiang Chen, Yanyong Huang, Lixin Duan, Guosheng Lin
Human multimodal emotion recognition involves time-series data of different modalities, such as natural language, visual motions, and acoustic behaviors. [Expand]
Recognition and reconstruction of residential floor plan drawings are important and challenging in design, decoration, and architectural remodeling fields. [Expand]
In recent years, denoising methods based on deep learning have achieved unparalleled performance at the cost of large computational complexity. [Expand]
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, Joao Carreira
How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. [Expand]
CapsuleRRT: Relationships-Aware Regression Tracking via Capsules
Ding Ma, Xiangqian Wu
Regression tracking has gained more and more attention thanks to its easy-to-implement characteristics, while existing regression trackers rarely consider the relationships between the object parts and the complete object. [Expand]
Seeing Behind Objects for 3D Multi-Object Tracking in RGB-D Sequences
Norman Muller, Yu-Shiang Wong, Niloy J. Mitra, Angela Dai, Matthias Niessner
Multi-object tracking from RGB-D video sequences is a challenging problem due to the combination of changing viewpoints, motion, and occlusions over time. [Expand]
Pedestrian and Ego-Vehicle Trajectory Prediction From Monocular Camera
Lukas Neumann, Andrea Vedaldi
Predicting future pedestrian trajectory is a crucial component of autonomous driving systems, as recognizing critical situations based only on current pedestrian position may come too late for any meaningful corrective action (e.g. [Expand]
Protecting Intellectual Property of Generative Adversarial Networks From Ambiguity Attacks
Ding Sheng Ong, Chee Seng Chan, Kam Woh Ng, Lixin Fan, Qiang Yang
Ever since Machine Learning as a Service emerged as a viable business that utilizes deep learning models to generate lucrative revenue, Intellectual Property Rights (IPR) have become a major concern, because these deep learning models can easily be replicated, shared, and re-distributed by any unauthorized third party. [Expand]
Bilinear Parameterization for Non-Separable Singular Value Penalties
Marcus Valtonen Ornhag, Jose Pedro Iglesias, Carl Olsson
Low rank inducing penalties have been proven to successfully uncover fundamental structures considered in computer vision and machine learning; however, such methods generally lead to non-convex optimization problems. [Expand]
In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of the face recognition system to guarantee the stability and reliability of recognition performance in an unconstrained scenario. [Expand]
Fast Sinkhorn Filters: Using Matrix Scaling for Non-Rigid Shape Correspondence With Functional Maps
Gautam Pai, Jing Ren, Simone Melzi, Peter Wonka, Maks Ovsjanikov
In this paper, we provide a theoretical foundation for pointwise map recovery from functional maps and highlight its relation to a range of shape correspondence methods based on spectral alignment. [Expand]
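The matrix-scaling step at the heart of Sinkhorn filtering is compact: alternately normalize the rows and columns of a similarity-derived kernel so that it approaches a doubly-stochastic soft correspondence matrix. A minimal NumPy sketch, with the temperature and iteration count as illustrative choices:

```python
# Sinkhorn matrix scaling: turn a similarity matrix into a (near) doubly-
# stochastic soft correspondence by alternating row/column normalization.
import numpy as np

def sinkhorn(sim: np.ndarray, tau: float = 0.05, iters: int = 50) -> np.ndarray:
    """sim: (N, M) similarity matrix; returns a soft correspondence matrix."""
    K = np.exp(sim / tau)
    for _ in range(iters):
        K /= K.sum(axis=1, keepdims=True)  # row normalization
        K /= K.sum(axis=0, keepdims=True)  # column normalization
    return K

P = sinkhorn(np.random.rand(5, 5))
print(P.sum(axis=0), P.sum(axis=1))  # both close to 1 after scaling
```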
Synthesize-It-Classifier: Learning a Generative Classifier Through Recurrent Self-Analysis
Arghya Pal, Raphael C.-W. Phan, KokSheik Wong
In this work, we show the generative capability of an image classifier network by synthesizing high-resolution, photo-realistic, and diverse images at scale. [Expand]
Generalization on Unseen Domains via Inference-Time Label-Preserving Target Projections
Prashant Pandey, Mrigank Raman, Sumanth Varambally, Prathosh AP
Generalizing machine learning models trained on a set of source domains to unseen target domains with different statistics is a challenging problem. [Expand]
Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders
Jiwoong Park, Junho Cho, Hyung Jin Chang, Jin Young Choi
Most of the existing literature on hyperbolic embedding concentrates on supervised learning, whereas the use of unsupervised hyperbolic embedding is less well explored. [Expand]
Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
John Phillips, Julieta Martinez, Ioan Andrei Barsan, Sergio Casas, Abbas Sadat, Raquel Urtasun
Over the last few years, we have witnessed tremendous progress on many subtasks of autonomous driving including perception, motion forecasting, and motion planning. [Expand]
BABEL: Bodies, Action and Behavior With English Labels
Abhinanda R. Punnakkal, Arjun Chandrasekaran, Nikos Athanasiou, Alejandra Quiros-Ramirez, Michael J. Black
Understanding the semantics of human movement -- the what, how and why of the movement -- is an important problem that requires datasets of human actions with semantic labels. [Expand]
Effective Snapshot Compressive-Spectral Imaging via Deep Denoising and Total Variation Priors
Haiquan Qiu, Yao Wang, Deyu Meng
Snapshot compressive imaging (SCI) is a new type of compressive imaging system that compresses multiple frames of images into a single snapshot measurement, which enjoys low cost, low bandwidth, and high-speed sensing rate. [Expand]
Existing rain-removal algorithms often tackle either rain streak removal or raindrop removal, and thus may fail to handle real-world rainy scenes. [Expand]
DyGLIP: A Dynamic Graph Model With Link Prediction for Accurate Multi-Camera Multiple Object Tracking
Kha Gia Quach, Pha Nguyen, Huu Le, Thanh-Dat Truong, Chi Nhan Duong, Minh-Triet Tran, Khoa Luu
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. [Expand]
Exploiting & Refining Depth Distributions With Triangulation Light Curtains
Yaadhav Raaj, Siddharth Ancha, Robert Tamburo, David Held, Srinivasa G. Narasimhan
Active sensing through the use of Adaptive Depth Sensors is a nascent field, with potential in areas such as Advanced driver-assistance systems (ADAS). [Expand]
Dongsheng Ruan, Daiyin Wang, Yuan Zheng, Nenggan Zheng, Min Zheng
Recently, a large number of channel attention blocks have been proposed to boost the representational power of deep convolutional neural networks (CNNs). [Expand]
A general approach to the hyperspectral image (HSI) denoising problem is to impose weights on different HSI pixels to suppress the negative influence of noisy elements. [Expand]
Multi-Perspective LSTM for Joint Visual Representation Learning
Alireza Sepas-Moghaddam, Fernando Pereira, Paulo Lobato Correia, Ali Etemad
We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives. [Expand]
clDice - A Novel Topology-Preserving Loss Function for Tubular Structure Segmentation
Suprosanna Shit, Johannes C. Paetzold, Anjany Sekuboyina, Ivan Ezhov, Alexander Unger, Andrey Zhylka, Josien P. W. Pluim, Ulrich Bauer, Bjoern H. Menze
Accurate segmentation of tubular, network-like structures, such as vessels, neurons, or roads, is relevant to many fields of research. [Expand]
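The clDice construction is compact enough to sketch from its published formulation: soft skeletons of the prediction and ground truth are computed by iterated min/max pooling, and a "topology precision" and "topology sensitivity" are combined into an F1-style score. A PyTorch sketch, with pooling depth and smoothing constants as illustrative choices:

```python
# clDice-style topology-preserving loss via soft skeletonization.
import torch
import torch.nn.functional as F

def soft_erode(x):  return -F.max_pool2d(-x, 3, 1, 1)
def soft_dilate(x): return F.max_pool2d(x, 3, 1, 1)

def soft_skeleton(x, iters: int = 5):
    skel = F.relu(x - soft_dilate(soft_erode(x)))       # x minus its "opening"
    for _ in range(iters):
        x = soft_erode(x)
        delta = F.relu(x - soft_dilate(soft_erode(x)))
        skel = skel + F.relu(delta - skel * delta)
    return skel

def cl_dice_loss(pred, target, eps: float = 1e-6):
    sp, st = soft_skeleton(pred), soft_skeleton(target)
    tprec = (sp * target).sum() / (sp.sum() + eps)   # pred skeleton inside target
    tsens = (st * pred).sum() / (st.sum() + eps)     # target skeleton inside pred
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens + eps)

loss = cl_dice_loss(torch.rand(1, 1, 32, 32), (torch.rand(1, 1, 32, 32) > 0.5).float())
```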
Learning Spatial-Semantic Relationship for Facial Attribute Recognition With Limited Labeled Data
Ying Shu, Yan Yan, Si Chen, Jing-Hao Xue, Chunhua Shen, Hanzi Wang
Recent advances in deep learning have demonstrated excellent results for Facial Attribute Recognition (FAR), typically trained with large-scale labeled data. [Expand]
In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expression and scene dynamics. [Expand]
Mesh Saliency: An Independent Perceptual Measure or a Derivative of Image Saliency?
Ran Song, Wei Zhang, Yitian Zhao, Yonghuai Liu, Paul L. Rosin
While mesh saliency aims to predict regional importance of 3D surfaces in agreement with human visual perception and is well researched in computer vision and graphics, latest work with eye-tracking experiments shows that state-of-the-art mesh saliency methods remain poor at predicting human fixations. [Expand]
Jie Song, Haofei Zhang, Xinchao Wang, Mengqi Xue, Ying Chen, Li Sun, Dacheng Tao, Mingli Song
Knowledge distillation pursues a diminutive yet well-behaved student network by harnessing the knowledge learned by a cumbersome teacher model. [Expand]
Dynamic Probabilistic Graph Convolution for Facial Action Unit Intensity Estimation
Tengfei Song, Zijun Cui, Yuru Wang, Wenming Zheng, Qiang Ji
Deep learning methods have been widely applied to automatic facial action unit (AU) intensity estimation and achieved state-of-the-art performance. [Expand]
Deep RGB-D Saliency Detection With Depth-Sensitive Attention and Automatic Multi-Modal Fusion
Peng Sun, Wenhu Zhang, Huanyu Wang, Songyuan Li, Xi Li
RGB-D salient object detection (SOD) is usually formulated as a problem of classification or regression over two modalities, i.e., RGB and depth. [Expand]
Deep Video Matting via Spatio-Temporal Alignment and Aggregation
Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai
Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, due to the inherent technical challenges of reasoning over the temporal domain and the lack of large-scale video matting datasets. [Expand]
Natural image matting separates the foreground from background in fractional occupancy which can be caused by highly transparent objects, complex foreground (e.g., net or tree), and/or objects containing very fine details (e.g., hairs). [Expand]
Tuning IR-Cut Filter for Illumination-Aware Spectral Reconstruction From RGB
Bo Sun, Junchi Yan, Xiao Zhou, Yinqiang Zheng
Reconstructing spectral signals from multi-channel observations, in particular trichromatic RGB images, has recently emerged as a promising alternative to traditional scanning-based spectral imaging. [Expand]
Uncertainty Reduction for Model Adaptation in Semantic Segmentation
Prabhu Teja S, Francois Fleuret
Traditional methods for Unsupervised Domain Adaptation (UDA) targeting semantic segmentation exploit information common to the source and target domains, using both labeled source data and unlabeled target data. [Expand]
Self-Supervised Wasserstein Pseudo-Labeling for Semi-Supervised Image Classification
Fariborz Taherkhani, Ali Dabouei, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi
The goal is to use the Wasserstein metric to provide pseudo-labels for unlabeled images so as to train a Convolutional Neural Network (CNN) in a Semi-Supervised Learning (SSL) manner for the classification task. [Expand]
The Information Bottleneck (IB) provides an information-theoretic principle for representation learning: retain all information relevant for predicting the label while minimizing redundancy. [Expand]
Probabilistic Selective Encryption of Convolutional Neural Networks for Hierarchical Services
Jinyu Tian, Jiantao Zhou, Jia Duan
Model protection is vital when deploying Convolutional Neural Networks (CNNs) for commercial services, due to the massive costs of training them. [Expand]
Automatic Correction of Internal Units in Generative Neural Networks
Ali Tousi, Haedong Jeong, Jiyeon Han, Hwanil Choi, Jaesik Choi
Generative Adversarial Networks (GANs) have shown satisfactory performance in synthetic image generation by devising complex network structure and adversarial training scheme. [Expand]
Instance segmentation, the task of identifying and separating each individual object of interest in the image, is one of the actively studied research topics in computer vision. [Expand]
Stepan Tulyakov, Daniel Gehrig, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza
State-of-the-art frame interpolation methods generate intermediate frames by inferring object motions in the image from consecutive key-frames. [Expand]
Uncertainty-Aware Camera Pose Estimation From Points and Lines
Alexander Vakhitov, Luis Ferraz, Antonio Agudo, Francesc Moreno-Noguer
Perspective-n-Point-and-Line (PnPL) algorithms aim at fast, accurate, and robust camera localization with respect to a 3D model from 2D-3D feature correspondences, being a major part of modern robotic and AR/VR systems. [Expand]
A Self-Boosting Framework for Automated Radiographic Report Generation
Zhanyu Wang, Luping Zhou, Lei Wang, Xiu Li
Automated radiographic report generation is a challenging task, since it requires generating paragraphs that describe fine-grained visual differences between cases, especially between diseased and healthy ones. [Expand]
Contrastive Learning Based Hybrid Networks for Long-Tailed Image Classification
Peng Wang, Kai Han, Xiu-Shen Wei, Lei Zhang, Lei Wang
Learning discriminative image representations plays a vital role in long-tailed image classification because it can ease the classifier learning in imbalanced cases. [Expand]
Domain adaptation methods face performance degradation in object detection, as the complexity of the task places greater demands on the transferability of the model. [Expand]
EvDistill: Asynchronous Events To End-Task Learning via Bidirectional Reconstruction-Guided Cross-Modal Knowledge Distillation
Lin Wang, Yujeong Chae, Sung-Hoon Yoon, Tae-Kyun Kim, Kuk-Jin Yoon
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur, showing advantages over the conventional cameras. [Expand]
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
Haiyan Wang, Jiahao Pang, Muhammad A. Lodhi, Yingli Tian, Dong Tian
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc. [Expand]
From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach
Guotao Wang, Chenglizhao Chen, Deng-Ping Fan, Aimin Hao, Hong Qin
Thanks to the rapid advances in the deep learning techniques and the wide availability of large-scale training sets, the performances of video saliency detection models have been improving steadily and significantly. [Expand]
Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship
Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo
OCR-based image captioning aims to automatically describe images based on all the visual entities (both visual objects and scene text) in images. [Expand]
Glancing at the Patch: Anomaly Localization With Global and Local Feature Comparison
Shenzhi Wang, Liwei Wu, Lei Cui, Yujun Shen
Anomaly localization, with the purpose to segment the anomalous regions within images, is challenging due to the large variety of anomaly types. [Expand]
LED2-Net: Monocular 360deg Layout Estimation via Differentiable Depth Rendering
Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai
Although significant progress has been made in room layout estimation, most methods aim to reduce the loss in the 2D pixel coordinate rather than exploiting the room structure in the 3D space. [Expand]
Multi-Decoding Deraining Network and Quasi-Sparsity Based Training
Yinglong Wang, Chao Ma, Bing Zeng
Existing deep deraining models are mainly learned via directly minimizing the statistical differences between rainy images and rain-free ground truths. [Expand]
PAUL: Procrustean Autoencoder for Unsupervised Lifting
Chaoyang Wang, Simon Lucey
Recent success in casting Non-rigid Structure from Motion (NRSfM) as an unsupervised deep learning problem has raised fundamental questions about what novelty deep learning can offer to the NRSfM prior. [Expand]
Representative Forgery Mining for Fake Face Detection
Chengrui Wang, Weihong Deng
Although vanilla Convolutional Neural Network (CNN) based detectors can achieve satisfactory performance on fake face detection, we observe that they tend to seek forgeries in a limited region of the face, which reveals that the detectors lack an understanding of forgery. [Expand]
RSG: A Simple but Effective Module for Learning Imbalanced Datasets
Jianfeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu
Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes. [Expand]
For unsupervised domain adaptation (UDA), to alleviate the effect of domain shift, many approaches align the source and target domains in the feature space by adversarial learning or by explicitly aligning their statistics. [Expand]
Autoregressive Stylized Motion Synthesis With Generative Flow
Yu-Hui Wen, Zhipeng Yang, Hongbo Fu, Lin Gao, Yanan Sun, Yong-Jin Liu
Motion style transfer is an important problem in many computer graphics and computer vision applications, including human animation, games, and robotics. [Expand]
Learning Progressive Point Embeddings for 3D Point Cloud Generation
Cheng Wen, Baosheng Yu, Dacheng Tao
Generative models for 3D point clouds are extremely important for scene/object reconstruction applications in autonomous driving and robotics. [Expand]
Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation
Tong Wu, Junshi Huang, Guangyu Gao, Xiaoming Wei, Xiaolin Wei, Xuan Luo, Chi Harold Liu
Weakly Supervised Semantic Segmentation (WSSS) with image-level annotation uses class activation maps from the classifier as pseudo-labels for semantic segmentation. [Expand]
Improving the Transferability of Adversarial Samples With Adversarial Transformations
Weibin Wu, Yuxin Su, Michael R. Lyu, Irwin King
Although deep neural networks (DNNs) have achieved tremendous performance in diverse vision challenges, they are surprisingly susceptible to adversarial examples, which are born of intentionally perturbing benign samples in a human-imperceptible fashion. [Expand]
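The paper's contribution is to learn the transformations; as a simpler illustration of why transforming inputs helps transferability, below is the fixed random resize-and-pad "input diversity" baseline from earlier transfer-attack work, wrapped around a single FGSM step. The model, image sizes, and epsilon are placeholders.

```python
# Input-diversity baseline for transferable attacks: perturb gradients are
# computed on a randomly resized-and-padded copy of the input.
import torch
import torch.nn.functional as F

def random_resize_pad(x, low: int = 28, high: int = 32):
    s = int(torch.randint(low, high + 1, (1,)))
    xr = F.interpolate(x, size=(s, s), mode="nearest")
    pad = high - s
    left, top = int(torch.randint(0, pad + 1, (1,))), int(torch.randint(0, pad + 1, (1,)))
    return F.pad(xr, (left, pad - left, top, pad - top))  # back to high x high

def transfer_fgsm(model, x, y, eps: float = 8 / 255):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(random_resize_pad(x)), y)  # attack transformed input
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x_adv = transfer_fgsm(model, torch.rand(2, 3, 32, 32), torch.tensor([1, 7]))
```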
Progressive Unsupervised Learning for Visual Object Tracking
Qiangqiang Wu, Jia Wan, Antoni B. Chan
In this paper, we propose a progressive unsupervised learning (PUL) framework, which entirely removes the need for annotated training videos in visual tracking. [Expand]
Dynamic Weighted Learning for Unsupervised Domain Adaptation
Ni Xiao, Lei Zhang
Unsupervised domain adaptation (UDA) aims to improve the classification performance on an unlabeled target domain by leveraging information from a fully labeled source domain. [Expand]
Space-Time Distillation for Video Super-Resolution
Zeyu Xiao, Xueyang Fu, Jie Huang, Zhen Cheng, Zhiwei Xiong
Compact video super-resolution (VSR) networks can be easily deployed on resource-limited devices, e.g., smart-phones and wearable devices, but have considerable performance gaps compared with complicated VSR networks that require a large amount of computing resources. [Expand]
Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation
Guo-Sen Xie, Jie Liu, Huan Xiong, Ling Shao
Few-shot semantic segmentation (FSS) aims to segment unseen class objects given very few densely-annotated support images from the same class. [Expand]
Recently, with the emergence of retrieval requirements for specific individuals within the same superclass, e.g., birds, persons, and cars, the fine-grained recognition task has attracted a significant amount of attention from academia and industry. [Expand]
1-bit detectors show great promise for resource-constrained embedded devices but often suffer from a significant performance gap compared with their real-valued counterparts. [Expand]
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning Over Traffic Events
Li Xu, He Huang, Jun Liu
Traffic event cognition and reasoning in videos is an important task that has a wide range of applications in intelligent transportation, assisted driving, and autonomous vehicles. [Expand]
Text-based image captioning (TextCap), which aims to read and reason about images containing text, is crucial for a machine to understand detailed and complex scene environments, considering that text is omnipresent in daily life. [Expand]
Referring image segmentation aims to segment the referent, i.e., the corresponding object or stuff referred to by a natural language expression, in an image. [Expand]
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories
Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry S. Davis, Heng Wang
The standard way of training video models entails sampling at each iteration a single clip from a video and optimizing the clip prediction with respect to the video-level label. [Expand]
Defending Multimodal Fusion Models Against Single-Source Adversaries
Karren Yang, Wan-Yi Lin, Manash Barman, Filipe Condessa, Zico Kolter
Beyond achieving high performance across many vision tasks, multimodal models are expected to be robust to single-source faults due to the availability of redundant information between modalities. [Expand]
DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
Yanchao Yang, Brian Lai, Stefano Soatto
We describe an unsupervised method to detect and segment portions of images of live scenes that, at some point in time, are seen moving as a coherent whole, which we refer to as objects. [Expand]
Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection
Huiyuan Yang, Lijun Yin, Yi Zhou, Jiuxiang Gu
Recent study on detecting facial action units (AU) has utilized auxiliary information (i.e., facial landmarks, relationship among AUs and expressions, web facial images, etc.), in order to improve the AU detection performance. [Expand]
A deep facial attribute editing model strives to meet two requirements: (1) attribute correctness -- the target attribute should correctly appear on the edited face image; (2) irrelevance preservation -- any irrelevant information (e.g., identity) should not be changed after editing. [Expand]
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Cheng-Fu Yang, Wan-Cyuan Fan, Fu-En Yang, Yu-Chiang Frank Wang
When translating text inputs into layouts or images, existing works typically require explicit descriptions of each object in a scene, including their spatial information or the associated relationships. [Expand]
Mol2Image: Improved Conditional Flow Models for Molecule to Image Synthesis
Karren Yang, Samuel Goldman, Wengong Jin, Alex X. Lu, Regina Barzilay, Tommi Jaakkola, Caroline Uhler
In this paper, we aim to synthesize cell microscopy images under different molecular interventions, motivated by practical applications to drug development. [Expand]
In real-world applications, it is common that only a portion of data is aligned across views due to spatial, temporal, or spatiotemporal asynchronism, thus leading to the so-called Partially View-aligned Problem (PVP). [Expand]
SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network
Xu Yang, Cheng Deng, Zhiyuan Dang, Kun Wei, Junchi Yan
Graph convolution networks (GCNs) are a powerful deep learning approach and have been successfully applied to representation learning on graphs in a variety of real-world applications. [Expand]
Weakly supervised temporal action detection aims to localize temporal boundaries of actions and identify their categories simultaneously with only video-level category labels during training. [Expand]
Yichao Yan, Jinpeng Li, Jie Qin, Song Bai, Shengcai Liao, Li Liu, Fan Zhu, Ling Shao
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id). [Expand]
Self-Aligned Video Deraining With Transmission-Depth Consistency
Wending Yan, Robby T. Tan, Wenhan Yang, Dengxin Dai
In this paper, we address the problems of rain streaks and rain accumulation removal in video, by developing a self-aligned network with transmission-depth consistency. [Expand]
Learning feature embedding directly from images without any human supervision is a very challenging and essential task in the field of computer vision and machine learning. [Expand]
We propose Joint-DetNAS, a unified NAS framework for object detection, which integrates 3 key components: Neural Architecture Search, pruning, and Knowledge Distillation. [Expand]
Though machine learning algorithms are able to achieve pattern recognition from the correlation between data and labels, the presence of spurious features in the data decreases the robustness of these learned relationships with respect to varied testing environments. [Expand]
Linguistic Structures As Weak Supervision for Visual Scene Graph Generation
Keren Ye, Adriana Kovashka
Prior work in scene graph generation requires categorical supervision at the level of triplets---subjects and objects, and predicates that relate them, either with or without bounding box information. [Expand]
Towards Efficient Tensor Decomposition-Based DNN Model Compression With Optimization Framework
Miao Yin, Yang Sui, Siyu Liao, Bo Yuan
Advanced tensor decomposition, such as Tensor train (TT) and Tensor ring (TR), has been widely studied for deep neural network (DNN) model compression, especially for recurrent neural networks (RNNs). [Expand]
Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims to rank pre-segmented video moment candidates so as to retrieve the moment that best matches the query sentence. [Expand]
The goal of out-of-distribution (OOD) detection is to handle the situations where the test samples are drawn from a different distribution than the training data. [Expand]
Hyper-LifelongGAN: Scalable Lifelong Learning for Image Conditioned Generation
Mengyao Zhai, Lei Chen, Greg Mori
Deep neural networks are susceptible to catastrophic forgetting: when encountering a new task, they can only remember the new task and fail to preserve their ability to accomplish previously learned tasks. [Expand]
Semantic segmentation models gain robustness against poor lighting conditions by virtue of complementary information from visible (RGB) and thermal images. [Expand]
Person re-identification (Re-ID) is to retrieve a particular person captured by different cameras, which is of great significance for security surveillance and pedestrian behavior analysis. [Expand]
Zhongwen Zhang, Dmitrii Marin, Maria Drangova, Yuri Boykov
We are interested in unsupervised reconstruction of complex near-capillary vasculature with thousands of bifurcations where supervision and learning are infeasible. [Expand]
Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field-of-view of a single camera, capturing more people in the scene and improving counting performance for occluded people or those at low resolution. [Expand]
Convolutional network compression methods require training data to achieve acceptable results, but training data is routinely unavailable due to privacy and transmission limitations. [Expand]
Cross-View Gait Recognition With Deep Universal Linear Embeddings
Shaoxiong Zhang, Yunhong Wang, Annan Li
Gait is considered an attractive biometric identifier for its non-invasive and non-cooperative features compared with other biometric identifiers such as fingerprint and iris. [Expand]
Explicit Knowledge Incorporation for Visual Reasoning
Yifeng Zhang, Ming Jiang, Qi Zhao
Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. [Expand]
Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
Zhimeng Zhang, Lincheng Li, Yu Ding, Changjie Fan
One-shot talking face generation should synthesize high-visual-quality facial videos with reasonable animation of expression and head pose, using only arbitrary driving audio and a single arbitrary face image as the source. [Expand]
When in a new situation or geographical location, human drivers have an extraordinary ability to watch others and learn maneuvers that they themselves may have never performed. [Expand]
Learning To Restore Hazy Video: A New Real-World Dataset and a New Method
Xinyi Zhang, Hang Dong, Jinshan Pan, Chao Zhu, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Fei Wang
Most of the existing deep learning-based dehazing methods are trained and evaluated on the image dehazing datasets, where the dehazed images are generated by only exploiting the information from the corresponding hazy ones. [Expand]
Learning a Self-Expressive Network for Subspace Clustering
Shangzhi Zhang, Chong You, Rene Vidal, Chun-Guang Li
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of other data points. [Expand]
MR Image Super-Resolution With Squeeze and Excitation Reasoning Attention Network
Yulun Zhang, Kai Li, Kunpeng Li, Yun Fu
High-quality high-resolution (HR) magnetic resonance (MR) images afford more detailed information for reliable diagnosis and quantitative image analyses. [Expand]
Person Re-Identification Using Heterogeneous Local Graph Attention Networks
Zhong Zhang, Haijia Zhang, Shuang Liu
Recently, some methods have focused on learning local relation among parts of pedestrian images for person re-identification (Re-ID), as it offers powerful representation capabilities. [Expand]
Self-Guided and Cross-Guided Learning for Few-Shot Segmentation
Bingfeng Zhang, Jimin Xiao, Terry Qin
Few-shot segmentation has been attracting a lot of attention due to its effectiveness in segmenting unseen object classes with only a few annotated samples. [Expand]
Sparse Multi-Path Corrections in Fringe Projection Profilometry
Yu Zhang, Daniel Lau, David Wipf
Three-dimensional scanning by means of structured light illumination is an active imaging technique in which a series of striped patterns is projected and captured, and the observed warping of the stripes is used to reconstruct the target object's surface by triangulating each camera pixel to a unique projector coordinate corresponding to a particular feature in the projected patterns. [Expand]
Despite the great success of GANs in image translation with different conditioning inputs, such as semantic segmentation and edge maps, generating high-fidelity images with reference styles from exemplars remains a grand challenge in conditional image-to-image translation. [Expand]
Deep Lucas-Kanade Homography for Multimodal Image Alignment
Yiming Zhao, Xinming Huang, Ziming Zhang
Estimating homography to align image pairs captured by different sensors or image pairs with large appearance changes is an important and general challenge for many computer vision applications. [Expand]
In this paper, we explore the compression of deep neural networks by quantizing the weights and activations into multi-bit binary networks (MBNs). [Expand]
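One standard way to form such a multi-bit binary representation, useful as a mental model for this entry, is greedy residual binarization: each extra bit binarizes what the previous bits failed to capture. A minimal PyTorch sketch follows (the paper's training scheme is more involved; this only illustrates the representation):

```python
# Greedy residual binarization: W is approximated as sum_i alpha_i * B_i with
# B_i in {-1, +1}; alpha = mean(|residual|) is the least-squares scale for a
# sign basis. Approximation error shrinks as bits grow.
import torch

def multi_bit_binarize(w: torch.Tensor, bits: int = 2):
    residual, alphas, bases = w.clone(), [], []
    for _ in range(bits):
        b = torch.sign(residual)
        a = residual.abs().mean()
        alphas.append(a); bases.append(b)
        residual = residual - a * b
    approx = sum(a * b for a, b in zip(alphas, bases))
    return approx, alphas, bases

w = torch.randn(64)
approx, _, _ = multi_bit_binarize(w, bits=3)
print(torch.norm(w - approx) / torch.norm(w))  # relative error
```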
PhD Learning: Learning With Pompeiu-Hausdorff Distances for Video-Based Vehicle Re-Identification
Jianan Zhao, Fengliang Qi, Guangyu Ren, Lin Xu
Vehicle re-identification (re-ID) is of great significance to urban operation, management, security and has gained more attention in recent years. [Expand]
We study a very challenging task, human image completion, which tries to recover missing human body parts with a reasonable human shape from a corrupted region. [Expand]
Self-Generated Defocus Blur Detection via Dual Adversarial Discriminators
Wenda Zhao, Cai Shang, Huchuan Lu
Although existing fully-supervised defocus blur detection (DBD) models significantly improve performance, training such deep models requires abundant pixel-level manual annotation, which is highly time-consuming and error-prone. [Expand]
In this paper, we propose a deep compositional metric learning (DCML) framework for effective and generalizable similarity measurement between images. [Expand]
Deep Convolutional Dictionary Learning for Image Denoising
Hongyi Zheng, Hongwei Yong, Lei Zhang
Inspired by the great success of deep neural networks (DNNs), many unfolding methods have been proposed to integrate traditional image modeling techniques, such as dictionary learning (DicL) and sparse coding, into DNNs for image restoration. [Expand]
Improving Multiple Object Tracking With Single Object Tracking
Linyu Zheng, Ming Tang, Yingying Chen, Guibo Zhu, Jinqiao Wang, Hanqing Lu
Despite considerable similarities between multiple object tracking (MOT) and single object tracking (SOT) tasks, modern MOT methods have not benefited from the development of SOT ones to achieve satisfactory performance. [Expand]
Patchwise Generative ConvNet: Training Energy-Based Models From a Single Natural Image for Internal Learning
Zilong Zheng, Jianwen Xie, Ping Li
Exploiting internal statistics of a single natural image has long been recognized as a significant research paradigm where the goal is to learn the distribution of patches within the image without relying on external training data. [Expand]
This paper presents a detection-aware pre-training (DAP) approach, which leverages only weakly-labeled classification-style datasets (e.g., ImageNet) for pre-training, but is specifically tailored to benefit object detection tasks. [Expand]
Neighborhood Contrastive Learning for Novel Class Discovery
Zhun Zhong, Enrico Fini, Subhankar Roy, Zhiming Luo, Elisa Ricci, Nicu Sebe
In this paper, we address Novel Class Discovery (NCD), the task of unveiling new classes in a set of unlabeled samples given a labeled dataset with known classes. [Expand]
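The neighborhood-contrastive idea can be sketched as an InfoNCE-style loss in which, besides the augmented view, each sample's nearest neighbors in a feature bank are treated as additional positives. A PyTorch sketch under that reading, with k, the temperature, and the bank all placeholders (the NCD-specific heads and bank updates follow the full paper):

```python
# Contrastive loss whose positive set is the augmented view plus k nearest
# neighbors retrieved from a feature bank; embeddings assumed L2-normalized.
import torch
import torch.nn.functional as F

def neighborhood_contrastive_loss(z, z_aug, bank, k: int = 5, tau: float = 0.1):
    """z, z_aug: (N, D) embeddings of two views; bank: (M, D)."""
    logits = z @ torch.cat([z_aug, bank]).T / tau          # (N, N + M) candidates
    pos = torch.zeros_like(logits)
    pos[torch.arange(len(z)), torch.arange(len(z))] = 1.0  # the augmented view
    nn_idx = (z @ bank.T).topk(k, dim=1).indices + len(z)  # k bank neighbors
    pos.scatter_(1, nn_idx, 1.0)
    log_prob = logits.log_softmax(dim=1)
    return -(pos * log_prob).sum(dim=1).div(pos.sum(dim=1)).mean()

z = F.normalize(torch.randn(8, 16), dim=1)
loss = neighborhood_contrastive_loss(z, F.normalize(torch.randn(8, 16), dim=1),
                                     F.normalize(torch.randn(32, 16), dim=1))
```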
Embracing Uncertainty: Decoupling and De-Bias for Robust Temporal Grounding
Hao Zhou, Chongyang Zhang, Yan Luo, Yanjun Chen, Chuanping Hu
Temporal grounding aims to localize temporal boundaries within untrimmed videos by language queries, but it faces the challenge of two types of inevitable human uncertainties: query uncertainty and label uncertainty. [Expand]
Human De-Occlusion: Invisible Perception and Recovery for Humans
Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, Xinggang Wang
In this paper, we tackle the problem of human de-occlusion which reasons about occluded segmentation masks and invisible appearance content of humans. [Expand]
Man Zhou, Jie Xiao, Yifan Chang, Xueyang Fu, Aiping Liu, Jinshan Pan, Zheng-Jun Zha
While deep convolutional neural networks (CNNs) have achieved great success on image de-raining task, most existing methods can only learn fixed mapping rules between paired rainy/clean images on a single dataset. [Expand]
Long-term actions involve many important visual concepts, e.g., objects, motions, and sub-actions, and there are various relations among these concepts, which we call basic relations. [Expand]
Prototype Augmentation and Self-Supervision for Incremental Learning
Fei Zhu, Xu-Yao Zhang, Chuang Wang, Fei Yin, Cheng-Lin Liu
Despite the impressive performance in many individual tasks, deep neural networks suffer from catastrophic forgetting when learning new tasks incrementally. [Expand]