Showing 1–30 of 30 results for author: Newsam, S

Searching in archive cs.
  1. arXiv:2403.14246

    eess.AS cs.AI

    CATSE: A Context-Aware Framework for Causal Target Sound Extraction

    Authors: Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

    Abstract: Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE m…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  2. arXiv:2210.13207

    cs.AI

    GeoAI at ACM SIGSPATIAL: The New Frontier of Geospatial Artificial Intelligence Research

    Authors: Dalton Lunga, Yingjie Hu, Shawn Newsam, Song Gao, Bruno Martins, Lexie Yang, Xueqing Deng

    Abstract: Geospatial Artificial Intelligence (GeoAI) is an interdisciplinary field enjoying tremendous adoption. However, the efficient design and implementation of GeoAI systems face many open challenges. This is mainly due to the lack of standardized approaches to artificial intelligence tool development, inadequate platforms, and a lack of multidisciplinary engagements, which all motivate domain expe…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: 12 pages, 1 figure, 1 table

  3. arXiv:2204.05547

    cs.CV

    DistPro: Searching A Fast Knowledge Distillation Process via Meta Optimization

    Authors: Xueqing Deng, Dawei Sun, Shawn Newsam, Peng Wang

    Abstract: Recent knowledge distillation (KD) studies show that different manually designed schemes impact the learned results significantly. Yet, in KD, automatically searching an optimal distillation scheme has not yet been well explored. In this paper, we propose DistPro, a novel framework which searches for an optimal KD process via differentiable meta-learning. Specifically, given a pair of student and…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: 14 pages, 5 figures

  4. arXiv:2204.05538

    cs.CV

    NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night

    Authors: Xueqing Deng, Peng Wang, Xiaochen Lian, Shawn Newsam

    Abstract: The semantic segmentation of nighttime scenes is a challenging problem that is key to impactful applications like self-driving cars. Yet, it has received little attention compared to its daytime counterpart. In this paper, we propose NightLab, a novel nighttime segmentation framework that leverages multiple deep learning models imbued with night-aware features to yield State-of-The-Art (SoTA) perf…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: 8 pages, 6 figures, accepted at CVPR 2022

  5. arXiv:2203.03809

    cs.CV cs.CL

    Image Search with Text Feedback by Additive Attention Compositional Learning

    Authors: Yuxin Tian, Shawn Newsam, Kofi Boakye

    Abstract: Effective image retrieval with text feedback stands to impact a range of real-world applications, such as e-commerce. Given a source image and text feedback that describes the desired modifications to that image, the goal is to retrieve the target images that resemble the source yet satisfy the given modifications by composing a multi-modal (image-text) query. We propose a novel solution to this p…

    Submitted 7 March, 2022; originally announced March 2022.

  6. arXiv:2106.13227

    cs.CV

    AutoAdapt: Automated Segmentation Network Search for Unsupervised Domain Adaptation

    Authors: Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam

    Abstract: Neural network-based semantic segmentation has achieved remarkable results when large amounts of annotated data are available, that is, in the supervised case. However, such data is expensive to collect and so methods have been developed to adapt models trained on related, often synthetic data for which labels are readily available. Current adaptation approaches do not consider the dependence of t…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: A short version was accepted at the 1st NAS Workshop, co-organized with CVPR 2021

  7. arXiv:2012.04222

    cs.CV

    Scale Aware Adaptation for Land-Cover Classification in Remote Sensing Imagery

    Authors: Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam

    Abstract: Land-cover classification using remote sensing imagery is an important Earth observation task. Recently, land cover classification has benefited from the development of fully connected neural networks for semantic segmentation. The benchmark datasets available for training deep segmentation models in remote sensing imagery tend to be small, however, often consisting of only a handful of images fro…

    Submitted 8 December, 2020; originally announced December 2020.

    Comments: The open-source code is available on GitHub: https://github.com/xdeng7/scale-aware_da

  8. arXiv:1912.10667

    cs.CV

    Generalizing Deep Models for Overhead Image Segmentation Through Getis-Ord Gi* Pooling

    Authors: Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam

    Abstract: That most deep learning models are purely data driven is both a strength and a weakness. Given sufficient training data, the optimal model for a particular problem can be learned. However, this is usually not the case and so instead the model is either learned from scratch from a limited amount of training data or pre-trained on a different problem and then fine-tuned. Both of these situations are…

    Submitted 23 December, 2019; originally announced December 2019.

  9. arXiv:1907.10211

    cs.CV cs.LG eess.IV

    Motion-Aware Feature for Improved Video Anomaly Detection

    Authors: Yi Zhu, Shawn Newsam

    Abstract: Motivated by our observation that motion information is the key to good anomaly detection performance in video, we propose a temporal augmented network to learn a motion-aware feature. This feature alone can achieve competitive performance with previous state-of-the-art methods, and when combined with them, can achieve significant performance improvements. Furthermore, we incorporate temporal cont…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: BMVC 2019

  10. arXiv:1902.06923

    cs.CV

    Using Conditional Generative Adversarial Networks to Generate Ground-Level Views From Overhead Imagery

    Authors: Xueqing Deng, Yi Zhu, Shawn Newsam

    Abstract: This paper develops a deep-learning framework to synthesize a ground-level view of a location given an overhead image. We propose a novel conditional generative adversarial network (cGAN) in which the trained generator generates realistic looking and representative ground-level images using overhead imagery as auxiliary information. The generator is an encoder-decoder network which allows us to co…

    Submitted 19 February, 2019; originally announced February 2019.

    Comments: 5 pages. arXiv admin note: text overlap with arXiv:1806.05129

  11. arXiv:1812.01593

    cs.CV cs.AI cs.MM cs.RO

    Improving Semantic Segmentation via Video Propagation and Label Relaxation

    Authors: Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

    Abstract: Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models' ability to predict future frames in order to also predict future labels.…

    Submitted 2 July, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: CVPR 2019 Oral. Code link: https://github.com/NVIDIA/semantic-segmentation. YouTube link: https://www.youtube.com/watch?v=aEbXjGZDZSQ

  12. arXiv:1810.12522

    cs.CV cs.AI cs.MM

    Random Temporal Skipping for Multirate Video Analysis

    Authors: Yi Zhu, Shawn Newsam

    Abstract: Current state-of-the-art approaches to video understanding adopt temporal jittering to simulate analyzing the video at varying frame rates. However, this does not work well for multirate videos, in which actions or subactions occur at different speeds. The frame sampling rate should vary in accordance with the different motion speeds. In this work, we propose a simple yet effective strategy, terme…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted at ACCV 2018. Camera ready

  13. arXiv:1810.12521

    cs.CV cs.AI cs.LG cs.MM

    Gated Transfer Network for Transfer Learning

    Authors: Yi Zhu, Jia Xue, Shawn Newsam

    Abstract: Deep neural networks have led to a series of breakthroughs in computer vision given sufficient annotated training datasets. For novel tasks with limited labeled data, the prevalent approach is to transfer the knowledge learned in the pre-trained models to the new tasks by fine-tuning. Classic model fine-tuning utilizes the fact that well trained neural networks appear to learn cross domain feature…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted at ACCV 2018. Camera ready

  14. arXiv:1806.05129

    cs.CV

    What Is It Like Down There? Generating Dense Ground-Level Views and Image Features From Overhead Imagery Using Conditional Generative Adversarial Networks

    Authors: Xueqing Deng, Yi Zhu, Shawn Newsam

    Abstract: This paper investigates conditional generative adversarial networks (cGANs) to overcome a fundamental limitation of using geotagged media for geographic discovery, namely its sparse and uneven spatial distribution. We train a cGAN to generate ground-level views of a location given overhead imagery. We show the "fake" ground-level images are natural looking and are structurally similar to the real…

    Submitted 23 September, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: 10 pages, 5 figures, camera-ready version of ACM SIGSPATIAL 2018 (ORAL)

  15. arXiv:1805.02733

    cs.CV cs.MM

    Learning Optical Flow via Dilated Networks and Occlusion Reasoning

    Authors: Yi Zhu, Shawn Newsam

    Abstract: Despite the significant progress that has been made on estimating optical flow recently, most estimation methods, including classical and deep learning approaches, still have difficulty with multi-scale estimation, real-time computation, and/or occlusion reasoning. In this paper, we introduce dilated convolution and occlusion reasoning into unsupervised optical flow estimation to address these iss…

    Submitted 7 May, 2018; originally announced May 2018.

    Comments: Accepted at ICIP 2018

  16. arXiv:1803.08460

    cs.CV cs.AI cs.LG cs.MM

    Towards Universal Representation for Unseen Action Recognition

    Authors: Yi Zhu, Yang Long, Yu Guan, Shawn Newsam, Ling Shao

    Abstract: Unseen Action Recognition (UAR) aims to recognise novel action categories without training examples. While previous methods focus on inner-dataset seen/unseen splits, this paper proposes a pipeline using a large-scale training source to achieve a Universal Representation (UR) that can generalise to a more realistic Cross-Dataset UAR (CD-UAR) scenario. We first address UAR as a Generalised Multiple…

    Submitted 22 March, 2018; originally announced March 2018.

    Comments: Accepted at CVPR 2018

  17. arXiv:1802.07452

    cs.CV

    Spatial Morphing Kernel Regression For Feature Interpolation

    Authors: Xueqing Deng, Yi Zhu, Shawn Newsam

    Abstract: In recent years, geotagged social media has become popular as a novel source for geographic knowledge discovery. Ground-level images and videos provide a different perspective than overhead imagery and can be applied to a range of applications such as land use mapping, activity detection, pollution mapping, etc. The sparse and uneven distribution of this data presents a problem, however, for gener…

    Submitted 4 May, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: accepted by ICIP 2018

  18. arXiv:1802.02668

    cs.CV cs.IR cs.MM

    Fine-Grained Land Use Classification at the City Scale Using Ground-Level Images

    Authors: Yi Zhu, Xueqing Deng, Shawn Newsam

    Abstract: We perform fine-grained land use mapping at the city scale using ground-level images. Mapping land use is considerably more difficult than mapping land cover and is generally not possible using overhead imagery as it requires close-up views and seeing inside buildings. We postulate that the growing collections of georeferenced, ground-level images suggest an alternate approach to this geographic k…

    Submitted 7 February, 2018; originally announced February 2018.

  19. arXiv:1711.03641

    cs.CY

    Quantitative Comparison of Open-Source Data for Fine-Grain Mapping of Land Use

    Authors: Xueqing Deng, Shawn Newsam

    Abstract: This paper performs a quantitative comparison of open-source data available on the Internet for the fine-grain mapping of land use. Three points of interest (POI) data sources--Google Places, Bing Maps, and the Yellow Pages--and one volunteered geographic information data source--Open Street Map (OSM)--are compared with each other at the parcel level for San Francisco with respect to a proposed fi…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: ACM SIGSPATIAL 2017 Workshop on Urban GIS

  20. arXiv:1707.06316

    cs.CV cs.MM

    DenseNet for Dense Flow

    Authors: Yi Zhu, Shawn Newsam

    Abstract: Classical approaches for estimating optical flow have achieved rapid progress in the last decade. However, most of them are too slow to be applied in real-time video analysis. Due to the great success of deep learning, recent work has focused on using CNNs to solve such dense prediction problems. In this paper, we investigate a new deep architecture, Densely Connected Convolutional Networks (Dense…

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: Accepted at ICIP 2017

  21. arXiv:1706.07911

    cs.CV cs.CY cs.MM cs.SI

    Large-Scale Mapping of Human Activity using Geo-Tagged Videos

    Authors: Yi Zhu, Sen Liu, Shawn Newsam

    Abstract: This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster which is important for recognizing events as they occ…

    Submitted 28 November, 2017; v1 submitted 24 June, 2017; originally announced June 2017.

    Comments: Accepted at ACM SIGSPATIAL 2017

  22. PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval

    Authors: Weixun Zhou, Shawn Newsam, Congmin Li, Zhenfeng Shao

    Abstract: Remote sensing image retrieval (RSIR), which aims to efficiently retrieve data of interest from large collections of remote sensing data, is a fundamental task in remote sensing. Over the past several decades, there has been significant effort to extract powerful feature representations for this task since the retrieval performance depends on the representative strength of the features. Benchmark d…

    Submitted 10 July, 2017; v1 submitted 11 June, 2017; originally announced June 2017.

    Comments: 49 pages

  23. arXiv:1704.03503

    cs.CV cs.MM

    UC Merced Submission to the ActivityNet Challenge 2016

    Authors: Yi Zhu, Shawn Newsam, Zaikun Xu

    Abstract: This notebook paper describes our system for the untrimmed classification task in the ActivityNet challenge 2016. We investigate multiple state-of-the-art approaches for action recognition in long, untrimmed videos. We exploit hand-crafted motion boundary histogram features as well as feature activations from deep networks such as VGG16, GoogLeNet, and C3D. These features are separately fed to linear…

    Submitted 11 April, 2017; originally announced April 2017.

    Comments: Notebook paper for ActivityNet 2016 challenge, untrimmed video classification track

  24. arXiv:1704.00389

    cs.CV cs.LG cs.MM

    Hidden Two-Stream Convolutional Networks for Action Recognition

    Authors: Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

    Abstract: Analyzing videos of human actions involves understanding the temporal relationships among video frames. State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs. Such a two-stage approach is computationally expensive, storage demanding, and not end-to-end trainable. In this paper, we present a novel CNN architectu…

    Submitted 30 October, 2018; v1 submitted 2 April, 2017; originally announced April 2017.

    Comments: Accepted at ACCV 2018, camera ready. Code available at https://github.com/bryanyzhu/Hidden-Two-Stream

  25. arXiv:1702.02295

    cs.CV

    Guided Optical Flow Learning

    Authors: Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

    Abstract: We study the unsupervised learning of CNNs for optical flow estimation using proxy ground truth data. Supervised CNNs, due to their immense learning capacity, have shown superior performance on a range of computer vision problems including optical flow prediction. They however require the ground truth flow which is usually not accessible except on limited synthetic data. Without the guidance of gr…

    Submitted 1 July, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

    Comments: CVPR17 Workshop. Code available at https://github.com/bryanyzhu/GuidedNet

  26. arXiv:1612.07403

    cs.CV cs.MM

    Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

    Authors: Yi Zhu, Shawn Newsam

    Abstract: This paper studies the joint learning of action recognition and temporal localization in long, untrimmed videos. We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition, and action localization refinement in parallel instead of the standard sequential pipeline that performs the steps in order. We develop a novel temporal actionn…

    Submitted 4 April, 2017; v1 submitted 21 December, 2016; originally announced December 2016.

    Comments: WACV 2017 camera ready; minor updates on test-time efficiency

  27. Learning Low Dimensional Convolutional Neural Networks for High-Resolution Remote Sensing Image Retrieval

    Authors: Weixun Zhou, Shawn Newsam, Congmin Li, Zhenfeng Shao

    Abstract: Learning powerful feature representations for image retrieval has always been a challenging task in the field of remote sensing. Traditional methods focus on extracting low-level hand-crafted features which are not only time-consuming but also tend to achieve unsatisfactory performance due to the content complexity of remote sensing images. In this paper, we investigate how to extract deep feature…

    Submitted 30 December, 2016; v1 submitted 10 October, 2016; originally announced October 2016.

    Journal ref: Remote Sens., 9(5), 489 (2017)

  28. arXiv:1609.06772

    cs.CV cs.CY cs.MM

    Spatio-Temporal Sentiment Hotspot Detection Using Geotagged Photos

    Authors: Yi Zhu, Shawn Newsam

    Abstract: We perform spatio-temporal analysis of public sentiment using geotagged photo collections. We develop a deep learning-based classifier that predicts the emotion conveyed by an image. This allows us to associate sentiment with place. We perform spatial hotspot detection and show that different emotions have distinct spatial distributions that match expectations. We also perform temporal analysis us…

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: To appear in ACM SIGSPATIAL 2016

  29. arXiv:1609.06653

    cs.CV cs.CY cs.MM

    Land Use Classification using Convolutional Neural Networks Applied to Ground-Level Images

    Authors: Yi Zhu, Shawn Newsam

    Abstract: Land use mapping is a fundamental yet challenging task in geographic science. In contrast to land cover mapping, it is generally not possible using overhead imagery. The recent, explosive growth of online geo-referenced photo collections suggests an alternate approach to geographic knowledge discovery. In this work, we present a general framework that uses ground-level images from Flickr for land…

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: ACM SIGSPATIAL 2015, Best Poster Award

  30. arXiv:1608.04339

    cs.CV

    Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

    Authors: Yi Zhu, Shawn Newsam

    Abstract: This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves. We develop a new framework called depth2action and experiment thoroughly into how best to incorporate the depth information. We introduce spatio-temporal depth normalization (STDN) to enforce temporal consistency in our estimated de…

    Submitted 15 August, 2016; originally announced August 2016.

    Comments: ECCVW 2016, Web-scale Vision and Social Media (VSM) workshop