Showing 1–50 of 115 results for author: Davis, S

  1. Sketching AI Concepts with Capabilities and Examples: AI Innovation in the Intensive Care Unit

    Authors: Nur Yildirim, Susanna Zlotnikov, Deniz Sayar, Jeremy M. Kahn, Leigh A. Bukowski, Sher Shah Amin, Kathryn A. Riman, Billie S. Davis, John S. Minturn, Andrew J. King, Dan Ricketts, Lu Tang, Venkatesh Sivaraman, Adam Perer, Sarah M. Preum, James McCann, John Zimmerman

    Abstract: Advances in artificial intelligence (AI) have enabled unprecedented capabilities, yet innovation teams struggle when envisioning AI concepts. Data science teams think of innovations users do not want, while domain experts think of innovations that cannot be built. A lack of effective ideation seems to be a breakdown point. How might multidisciplinary teams identify buildable and desirable use case…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: to appear at CHI 2024

  2. arXiv:2310.05010  [pdf, other]

    cs.CV

    Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

    Authors: Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang

    Abstract: Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCL…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.00624

  3. arXiv:2303.14368  [pdf, other]

    cs.CV cs.AI cs.LG

    FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views

    Authors: Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, Larry S. Davis

    Abstract: We present FlexNeRF, a method for photorealistic freeviewpoint rendering of humans in motion from monocular videos. Our approach works well with sparse views, which is a challenging scenario when the subject is exhibiting fast/complex motions. We propose a novel approach which jointly optimizes a canonical time and pose configuration, with a pose-dependent motion field and pose-independent tempora…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  4. arXiv:2212.05667  [pdf, other]

    cs.CV

    Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection

    Authors: Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang

    Abstract: Online media data, in the forms of images and videos, are becoming mainstream communication channels. However, recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost, which not only poses a serious threat to the trustworthiness of digital information but also has severe societal implications. This…

    Submitted 11 December, 2022; originally announced December 2022.

  5. arXiv:2211.07867  [pdf, other]

    cs.LG eess.SP q-bio.NC

    Machine Learning Methods Applied to Cortico-Cortical Evoked Potentials Aid in Localizing Seizure Onset Zones

    Authors: Ian G. Malone, Kaleb E. Smith, Morgan E. Urdaneta, Tyler S. Davis, Daria Nesterovich Anderson, Brian J. Phillip, John D. Rolston, Christopher R. Butson

    Abstract: Epilepsy affects millions of people, reducing quality of life and increasing risk of premature death. One-third of epilepsy cases are drug-resistant and require surgery for treatment, which necessitates localizing the seizure onset zone (SOZ) in the brain. Attempts have been made to use cortico-cortical evoked potentials (CCEPs) to improve SOZ localization but none have been successful enough for…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages

  6. arXiv:2208.01813  [pdf, other]

    cs.CV

    TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

    Authors: Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis

    Abstract: Text-VQA aims at answering questions that require understanding the textual cues in an image. Despite the great progress of existing Text-VQA methods, their performance suffers from insufficient human-labeled question-answer (QA) pairs. However, we observe that, in general, the scene text is not fully exploited in the existing datasets -- only a small portion of the text in each image participates…

    Submitted 7 October, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: BMVC 2022

  7. arXiv:2112.04598  [pdf, other]

    cs.CV cs.LG stat.ML

    InvGAN: Invertible GANs

    Authors: Partha Ghosh, Dominik Zietlow, Michael J. Black, Larry S. Davis, Xiaochen Hu

    Abstract: Generation of photo-realistic images, semantic editing and representation learning are a few of many potential applications of high resolution generative models. Recent progress in GANs have established them as an excellent choice for such tasks. However, since they do not provide an inference model, image editing or downstream tasks such as classification can not be done on real images using the…

    Submitted 10 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

  8. arXiv:2106.00168  [pdf, other]

    cs.CV

    Rethinking Pseudo Labels for Semi-Supervised Object Detection

    Authors: Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

    Abstract: Recent advances in semi-supervised object detection (SSOD) are largely driven by consistency-based pseudo-labeling methods for image classification tasks, producing pseudo labels as supervisory signals. However, when using pseudo labels, there is a lack of consideration in localization precision and amplified class imbalance, both of which are critical for detection tasks. In this paper, we introd…

    Submitted 29 December, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: AAAI 2022

  9. arXiv:2105.06464  [pdf, other]

    cs.CV cs.LG

    DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

    Authors: Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar

    Abstract: We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pai…

    Submitted 5 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: Tech Report

  10. arXiv:2104.14557  [pdf, other]

    cs.CV

    Learned Spatial Representations for Few-shot Talking-Head Synthesis

    Authors: Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava

    Abstract: We propose a novel approach for few-shot talking-head synthesis. While recent works in neural talking heads have produced promising results, they can still produce images that do not preserve the identity of the subject in source images. We posit this is a result of the entangled representation of each subject in a single latent code that models 3D shape information, identity cues, colors, lightin…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: http://www.cs.umd.edu/~mmeshry/projects/lsr/

  11. arXiv:2104.07098  [pdf, other]

    cs.CV

    StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis

    Authors: Moustafa Meshry, Yixuan Ren, Larry S Davis, Abhinav Shrivastava

    Abstract: We propose a novel approach for multi-modal Image-to-image (I2I) translation. To tackle the one-to-many relationship between input and output domains, previous works use complex training objectives to learn a latent embedding, jointly with the generator, that models the variability of the output domain. In contrast, we directly model the style variability of images, independent of the image synthe…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  12. arXiv:2103.13612  [pdf, other]

    cs.CV cs.LG

    THAT: Two Head Adversarial Training for Improving Robustness at Scale

    Authors: Zuxuan Wu, Tom Goldstein, Larry S. Davis, Ser-Nam Lim

    Abstract: Many variants of adversarial training have been proposed, with most research focusing on problems with relatively few classes. In this paper, we propose Two Head Adversarial Training (THAT), a two-stream adversarial learning network that is designed to handle the large-scale many-class ImageNet dataset. The proposed method trains a network with two heads and two loss functions; one to minimize fea…

    Submitted 25 March, 2021; originally announced March 2021.

  13. arXiv:2102.05646  [pdf, other]

    cs.CV cs.AI

    Scale Normalized Image Pyramids with AutoFocus for Object Detection

    Authors: Bharat Singh, Mahyar Najibi, Abhishek Sharma, Larry S. Davis

    Abstract: We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters, and therefore, results in better accuracy. However, the use of an image pyra…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Accepted in T-PAMI 2021

  14. arXiv:2101.11080  [pdf, other]

    cs.CV

    Deep Video Inpainting Detection

    Authors: Peng Zhou, Ning Yu, Zuxuan Wu, Larry S. Davis, Abhinav Shrivastava, Ser-Nam Lim

    Abstract: This paper studies video inpainting detection, which localizes an inpainted region in a video both spatially and temporally. In particular, we introduce VIDNet, Video Inpainting Detection Network, which contains a two-stream encoder-decoder architecture with attention module. To reveal artifacts encoded in compression, VIDNet additionally takes in Error Level Analysis frames to augment RGB frames,…

    Submitted 26 January, 2021; originally announced January 2021.

  15. arXiv:2012.14950  [pdf, other]

    cs.CV

    2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

    Authors: Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

    Abstract: 3D convolutional networks are prevalent for video recognition. While achieving excellent recognition performance on standard benchmarks, they operate on a sequence of frames with 3D convolutions and thus are computationally demanding. Exploiting large variations among different videos, we introduce Ada3D, a conditional computation framework that learns instance-specific 3D usage policies to determ…

    Submitted 28 April, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: CVPR 2021

  16. arXiv:2011.10469  [pdf, other]

    cs.LG cs.SD eess.AS

    Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder

    Authors: Sam Davis, Giuseppe Coccia, Sam Gooch, Julian Mack

    Abstract: WaveNet is a state-of-the-art text-to-speech vocoder that remains challenging to deploy due to its autoregressive loop. In this work we focus on ways to accelerate the original WaveNet architecture directly, as opposed to modifying the architecture, such that the model can be deployed as part of a scalable text-to-speech system. We survey a wide variety of model compression techniques that are ame…

    Submitted 20 November, 2020; originally announced November 2020.

  17. arXiv:2011.10269  [pdf, other]

    cs.CV

    SLADE: A Self-Training Framework For Distance Metric Learning

    Authors: Jiali Duan, Yen-Liang Lin, Son Tran, Larry S. Davis, C. -C. Jay Kuo

    Abstract: Most existing distance metric learning approaches use fully labeled data to learn the sample similarities in an embedding space. We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data. We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data. We then train a student model on both la…

    Submitted 29 March, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Accepted by CVPR 2021

  18. arXiv:2008.12432  [pdf, other]

    cs.CV

    All About Knowledge Graphs for Actions

    Authors: Pallabi Ghosh, Nirat Saini, Larry S. Davis, Abhinav Shrivastava

    Abstract: Current action recognition systems require large amounts of training data for recognizing an action. Recent works have explored the paradigm of zero-shot and few-shot learning to learn classifiers for unseen categories or categories with few labels. Following similar paradigms in object recognition, these approaches utilize external sources of knowledge (eg. knowledge graphs from language domains)…

    Submitted 27 August, 2020; originally announced August 2020.

  19. arXiv:2007.08556  [pdf, other]

    cs.CV

    InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

    Authors: Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis

    Abstract: Real-time 3D object detection is crucial for autonomous cars. Achieving promising performance with high efficiency, voxel-based approaches have received considerable attention. However, previous methods model the input space with features extracted from equally divided sub-regions without considering that point cloud is generally non-uniformly distributed over the space. To address this issue, we…

    Submitted 16 July, 2020; originally announced July 2020.

  20. arXiv:2004.01170  [pdf, other]

    cs.CV

    DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

    Authors: Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Thomas Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

    Abstract: We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. Previous methods often make domain-specific design decisions, for example projecting points into a bird-eye view image in autonomous driving scenarios. In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes. The core novelty of our method is a fast, single-pass architecture that b…

    Submitted 6 April, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: To appear in CVPR 2020

  21. arXiv:2003.12125  [pdf, other]

    cs.CV

    SaccadeNet: A Fast and Accurate Object Detector

    Authors: Shiyi Lan, Zhou Ren, Yi Wu, Larry S. Davis, Gang Hua

    Abstract: Object detection is an essential step towards holistic scene understanding. Most existing object detection algorithms attend to certain object areas once and then predict the object locations. However, neuroscientists have revealed that humans do not look at the scene in fixed steadiness. Instead, human eyes move around, locating informative parts to understand the object location. This active per…

    Submitted 26 March, 2020; originally announced March 2020.

  22. arXiv:2003.11670  [pdf, other]

    cs.CV

    DeepStrip: High Resolution Boundary Refinement

    Authors: Peng Zhou, Brian Price, Scott Cohen, Gregg Wilensky, Larry S. Davis

    Abstract: In this paper, we target refining the boundaries in high resolution images given low resolution masks. For memory and computation efficiency, we propose to convert the regions of interest into strip images and compute a boundary prediction in the strip domain. To detect the target boundary, we present a framework with two prediction layers. First, all potential boundaries are predicted as an initi…

    Submitted 25 March, 2020; originally announced March 2020.

    Journal ref: CVPR 2020

  23. arXiv:2003.03626  [pdf]

    q-bio.NC cs.RO

    Discrimination Among Multiple Cutaneous and Proprioceptive Hand Percepts Evoked by Nerve Stimulation with Utah Slanted Electrode Arrays in Human Amputees

    Authors: David M. Page, Suzanne M. Wendelken, Tyler S. Davis, David T. Kluger, Douglas T. Hutchinson, Jacob A. George, Gregory A. Clark

    Abstract: Objective: This paper aims to demonstrate functional discriminability among restored hand sensations with different locations, qualities, and intensities that are evoked by microelectrode stimulation of residual afferent fibers in human amputees. Methods: We implanted a Utah Slanted Electrode Array (USEA) in the median and ulnar residual arm nerves of three transradial amputees and delivered stimu…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 19 pages

  24. arXiv:2001.07791  [pdf, other]

    cs.CV

    Depth Completion Using a View-constrained Deep Prior

    Authors: Pallabi Ghosh, Vibhav Vineet, Larry S. Davis, Abhinav Shrivastava, Sudipta Sinha, Neel Joshi

    Abstract: Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images. This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting. We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we optimize a random…

    Submitted 1 December, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

  25. arXiv:1912.13000  [pdf, other]

    cs.CV cs.LG eess.IV

    Recognizing Instagram Filtered Images with Feature De-stylization

    Authors: Zhe Wu, Zuxuan Wu, Bharat Singh, Larry S. Davis

    Abstract: Deep neural networks have been shown to suffer from poor generalization when small perturbations are added (like Gaussian noise), yet little work has been done to evaluate their robustness to more natural image transformations like photo filters. This paper presents a study on how popular pretrained models are affected by commonly used Instagram filters. To this end, we introduce ImageNet-Instagra…

    Submitted 30 December, 2019; originally announced December 2019.

    Comments: Accepted in AAAI 2020 as an oral presentation paper

  26. arXiv:1912.08967  [pdf, other]

    cs.CV

    Fashion Outfit Complementary Item Retrieval

    Authors: Yen-Liang Lin, Son Tran, Larry S. Davis

    Abstract: Complementary fashion item recommendation is critical for fashion outfit completion. Existing methods mainly focus on outfit compatibility prediction but not in a retrieval setting. We propose a new framework for outfit complementary item retrieval. Specifically, a category-based subspace attention network is presented, which is a scalable approach for learning the subspace attentions. In addition…

    Submitted 4 March, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted by CVPR 2020

  27. arXiv:1912.05086  [pdf, other]

    cs.CV

    Learning from Noisy Anchors for One-stage Object Detection

    Authors: Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis

    Abstract: State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects. Such a harsh split conditioned on IoU results in binary labels that are potentially noisy and challenging for training. In this paper, we propose to mitig…

    Submitted 28 May, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020 camera ready

  28. arXiv:1912.01601  [pdf, other]

    cs.CV

    LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

    Authors: Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis

    Abstract: This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios. Exploiting decent yet computationally efficient features derived at a coarse scale with a lightweight CNN model, LiteEval dynamically decides on-the-fly whether to compute more powerful features for incoming video frames at a finer…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: NeurIPS 2019

  29. arXiv:1911.02549  [pdf, other]

    cs.LG cs.PF stat.ML

    MLPerf Inference Benchmark

    Authors: Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, et al. (22 additional authors not shown)

    Abstract: Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devic…

    Submitted 9 May, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: ISCA 2020

  30. arXiv:1910.07153  [pdf, other]

    cs.LG cs.CV

    Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost

    Authors: Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister

    Abstract: Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high value data that can best improve model performance. In pool-based active learning, accessible unlabeled data are not used for model training in most conventional methods. Here, we propose to unify unlabeled sample selection and model training towards minimizing labelin…

    Submitted 18 July, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted by ECCV2020

  31. arXiv:1909.04412  [pdf, other]

    cs.CV

    Cross-X Learning for Fine-Grained Visual Categorization

    Authors: Wei Luo, Xitong Yang, Xianjie Mo, Yuheng Lu, Larry S. Davis, Jun Li, Jian Yang, Ser-Nam Lim

    Abstract: Recognizing objects from subcategories with very subtle differences remains a challenging task due to the large intra-class and small inter-class variation. Recent work tackles this problem in a weakly-supervised manner: object parts are first detected and the corresponding part-specific features are extracted for fine-grained classification. However, these methods typically treat the part-specifi…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: accepted by ICCV 2019

  32. A Modular Transradial Bypass Socket for Surface Myoelectric Prosthetic Control in Non-Amputees

    Authors: Michael D. Paskett, Nathaniel R. Olsen, Jacob A. George, David T. Kluger, Mark R. Brinton, Tyler S. Davis, Christopher C. Duncan, Gregory A. Clark

    Abstract: Bypass sockets allow researchers to perform tests of prosthetic systems from the prosthetic user's perspective. We designed a modular upper-limb bypass socket with 3D-printed components that can be easily modified for use with a variety of terminal devices. Our bypass socket preserves access to forearm musculature and the hand, which are necessary for surface electromyography and to provide substi…

    Submitted 26 September, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: 8 pages, 5 figures

    Journal ref: IEEE Trans. Neural Syst. Rehabil. Eng. (2019)

  33. arXiv:1909.00848  [pdf, other]

    cs.CV cs.LG

    HiCoRe: Visual Hierarchical Context-Reasoning

    Authors: Pedro H. Bugatti, Priscila T. M. Saito, Larry S. Davis

    Abstract: Reasoning about images/objects and their hierarchical interactions is a key concept for the next generation of computer vision approaches. Here we present a new framework to deal with it through a visual hierarchical context-based reasoning. Current reasoning methods use the fine-grained labels from images' objects and their interactions to predict labels to new objects. Our framework modifies thi…

    Submitted 2 September, 2019; originally announced September 2019.

  34. arXiv:1909.00239  [pdf, other]

    cs.CV

    WSLLN: Weakly Supervised Natural Language Localization Networks

    Authors: Mingfei Gao, Larry S. Davis, Richard Socher, Caiming Xiong

    Abstract: We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries. To learn the correspondence between visual segments and texts, most previous methods require temporal coordinates (start and end times) of events for training, which leads to high costs of annotation. WSLLN relieves the annotation burden by training with only video…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: accepted by EMNLP2019

  35. arXiv:1908.10522  [pdf]

    q-bio.NC cs.RO

    Intuitive Neuromyoelectric Control of a Dexterous Bionic Arm Using a Modified Kalman Filter

    Authors: Jacob A. George, Tyler S. Davis, Mark R. Brinton, Gregory A. Clark

    Abstract: Background: Multi-articulate prostheses are capable of performing dexterous hand movements. However, clinically available control strategies fail to provide users with intuitive, independent and proportional control over multiple degrees of freedom (DOFs) in real-time. New Method: We detail the use of a modified Kalman filter (MKF) to provide intuitive, independent and proportional control over si…

    Submitted 10 October, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: 10 figures. Accepted in J. Neurosci. Methods (2019)

  36. arXiv:1906.00495  [pdf, other]

    cs.LG cs.CV stat.ML

    Truncated Cauchy Non-negative Matrix Factorization

    Authors: Naiyang Guan, Tongliang Liu, Yangmuzi Zhang, Dacheng Tao, Larry S. Davis

    Abstract: Non-negative matrix factorization (NMF) minimizes the Euclidean distance between the data matrix and its low rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated CauchyNMF loss that handle outliers by truncating large errors, and develop a Truncated CauchyNMF to robustly learn the subspace on noisy…

    Submitted 2 June, 2019; originally announced June 2019.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T-PAMI), vol. 41, no. 1, pp. 246-259, Jan. 2019

  37. arXiv:1905.11903  [pdf, other]

    cs.CV

    Efficient Object Embedding for Spliced Image Retrieval

    Authors: Bor-Chun Chen, Zuxuan Wu, Larry S. Davis, Ser-Nam Lim

    Abstract: Detecting spliced images is one of the emerging challenges in computer vision. Unlike prior methods that focus on detecting low-level artifacts generated during the manipulation process, we use an image retrieval approach to tackle this problem. When given a spliced query image, our goal is to retrieve the original image from a database of authentic images. To achieve this goal, we propose represe…

    Submitted 27 April, 2021; v1 submitted 28 May, 2019; originally announced May 2019.

  38. arXiv:1904.12843  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Adversarial Training for Free!

    Authors: Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein

    Abstract: Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating advers…

    Submitted 20 November, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

    Comments: Accepted to NeurIPS 2019

  39. arXiv:1904.06268  [pdf, other]

    cs.CV

    ACE: Adapting to Changing Environments for Semantic Segmentation

    Authors: Zuxuan Wu, Xin Wang, Joseph E. Gonzalez, Tom Goldstein, Larry S. Davis

    Abstract: Deep neural networks exhibit exceptional accuracy when they are trained and tested on the same data distributions. However, neural classifiers are often extremely brittle when confronted with domain shift---changes in the input distribution that occur over time. We present ACE, a framework for semantic segmentation that dynamically adapts to changing environments over the time. By aligning the dis…

    Submitted 12 April, 2019; originally announced April 2019.

  40. arXiv:1904.05871  [pdf, other]

    cs.CV

    An Analysis of Pre-Training on Object Detection

    Authors: Hengduo Li, Bharat Singh, Mahyar Najibi, Zuxuan Wu, Larry S. Davis

    Abstract: We provide a detailed analysis of convolutional neural networks which are pre-trained on the task of object detection. To this end, we train detectors on large datasets like OpenImagesV4, ImageNet Localization and COCO. We analyze how well their features generalize to tasks like image classification, semantic segmentation and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397…

    Submitted 11 April, 2019; originally announced April 2019.

  41. arXiv:1904.03885  [pdf, other]

    cs.CV cs.CL cs.LG

    Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions

    Authors: Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad I. Morariu, Larry S. Davis

    Abstract: This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos. Previous work suggests potential bias in existing datasets and emphasizes the need for a new data creation schema to better model linguistic structure. We introduce a new data collection scheme based on grammatical constraints for surface realization to enable us to investigate the problem of groun…

    Submitted 8 April, 2019; originally announced April 2019.

  42. arXiv:1904.01769  [pdf, other]

    cs.CV

    M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning

    Authors: Peng Zhou, Long Mai, Jianming Zhang, Ning Xu, Zuxuan Wu, Larry S. Davis

    Abstract: Incremental learning targets at achieving good performance on new categories without forgetting old ones. Knowledge distillation has been shown critical in preserving the performance on old classes. Conventional methods, however, sequentially distill knowledge only from the last model, leading to performance degradation on the old classes in later incremental learning steps. In this paper, we prop…

    Submitted 5 September, 2020; v1 submitted 3 April, 2019; originally announced April 2019.

    Journal ref: BMVC 2020

  43. arXiv:1903.09868  [pdf, other]

    cs.CV

    StartNet: Online Detection of Action Start in Untrimmed Videos

    Authors: Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong

    Abstract: We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos. Previous methods aim to localize action starts by learning feature representations that can directly separate the start point from its preceding background. It is challenging due to the subtle appearance difference near the action s…

    Submitted 23 March, 2019; originally announced March 2019.

  44. arXiv:1902.01096  [pdf, other]

    cs.CV

    Compatible and Diverse Fashion Image Inpainting

    Authors: Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis

    Abstract: Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. To this end, we present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling th…

    Submitted 24 April, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

  45. arXiv:1812.06203  [pdf, other]

    cs.CV

    TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition

    Authors: Xiyang Dai, Bharat Singh, Joe Yue-Hei Ng, Larry S. Davis

    Abstract: We present Temporal Aggregation Network (TAN) which decomposes 3D convolutions into spatial and temporal aggregation blocks. By stacking spatial and temporal convolutions repeatedly, TAN forms a deep hierarchical representation for capturing spatio-temporal information in videos. Since we do not apply 3D convolutions in each layer but only apply temporal aggregation blocks once after each spatial…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: WACV 2019

  46. arXiv:1812.05586  [pdf, other]

    cs.CV

    FA-RPN: Floating Region Proposals for Face Detection

    Authors: Mahyar Najibi, Bharat Singh, Larry S. Davis

    Abstract: We propose a novel approach for generating region proposals for performing face-detection. Instead of classifying anchor boxes using features from a pixel in the convolutional feature map, we adopt a pooling-based approach for generating region proposals. However, pooling hundreds of thousands of anchors which are evaluated for generating proposals becomes a computational bottleneck during inferen…

    Submitted 13 December, 2018; originally announced December 2018.

  47. arXiv:1812.01600  [pdf, other]

    cs.CV

    AutoFocus: Efficient Multi-Scale Inference

    Authors: Mahyar Najibi, Bharat Singh, Larry S. Davis

    Abstract: This paper describes AutoFocus, an efficient multi-scale inference algorithm for deep-learning based object detectors. Instead of processing an entire image pyramid, AutoFocus adopts a coarse to fine approach and only processes regions which are likely to contain small objects at finer scales. This is achieved by predicting category agnostic segmentation maps for small objects at coarser scales, c…

    Submitted 1 August, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: To appear in Proceedings of International Conference on Computer Vision (ICCV), 2019

  48. arXiv:1812.00087  [pdf, other]

    cs.CV

    MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment

    Authors: Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis

    Abstract: This research strives for natural language moment retrieval in long, untrimmed video streams. The problem is not trivial especially when a video contains multiple moments of interests and the language describes complex temporal dependencies, which often happens in real scenarios. We identify two crucial challenges: semantic misalignment and structural misalignment. However, existing approaches tre…

    Submitted 17 May, 2019; v1 submitted 30 November, 2018; originally announced December 2018.

    Comments: CVPR 2019

  49. arXiv:1811.12432  [pdf, other]

    cs.CV

    AdaFrame: Adaptive Frame Selection for Fast Video Recognition

    Authors: Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis

    Abstract: We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition. AdaFrame contains a Long Short-Term Memory network augmented with a global memory that provides context information for searching which frames to use over time. Trained with policy gradient methods, AdaFrame generates a prediction, determines which frame to observe next, and co…

    Submitted 10 April, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: CVPR 2019

  50. arXiv:1811.11347  [pdf, other]

    cs.LG stat.ML

    Effective Ways to Build and Evaluate Individual Survival Distributions

    Authors: Humza Haider, Bret Hoehn, Sarah Davis, Russell Greiner

    Abstract: An accurate model of a patient's individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from Cox Proportional Hazard models) do not provide survival probabilities, single-time probability models (e.g., the Gail model, predicting 5 year probability) only provide for a single time point, and standard Kaplan-Meier surviva…

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: 34 pages (main text), 12 figures

    Journal ref: Journal of Machine Learning Research (JMLR) Volume 21 (2020) 18-772