Showing 1–25 of 25 results for author: Chan, S - G

  1. arXiv:2406.16866 [pdf, other]

    cs.CV

    Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

    Authors: Jierun Chen, Fangyun Wei, ****g Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S.-H. Gary Chan, Hongyang Zhang

    Abstract: Referring expression comprehension (REC) involves localizing a target instance based on a textual description. Recent advancements in REC have been driven by large multimodal models (LMMs) like CogVLM, which achieved 92.44% accuracy on RefCOCO. However, this study questions whether existing benchmarks such as RefCOCO, RefCOCO+, and RefCOCOg, capture LMMs' comprehensive capabilities. We begin with…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2405.03218 [pdf, other]

    cs.CV

    Elevator, Escalator or Neither? Classifying Pedestrian Conveyor State Using Inertial Navigation System

    Authors: Tianlang He, Zhiqiu Xia, S.-H. Gary Chan

    Abstract: Classifying a pedestrian in one of the three conveyor states of "elevator," "escalator" and "neither" is fundamental to many applications such as indoor localization and people flow analysis. We estimate, for the first time, the pedestrian conveyor state given the inertial navigation system (INS) readings of accelerometer, gyroscope and magnetometer sampled from the phone. Our problem is challengi…

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2403.09124 [pdf, other]

    cs.CV

    Single Domain Generalization for Crowd Counting

    Authors: Zhuoxuan Peng, S.-H. Gary Chan

    Abstract: Due to its promising results, density map regression has been widely employed for image-based crowd counting. The approach, however, often suffers from severe performance degradation when tested on data from unseen scenarios, the so-called "domain shift" problem. To address the problem, we investigate in this work single domain generalization (SDG) for crowd counting. The existing SDG approaches a…

    Submitted 5 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  4. arXiv:2312.00540 [pdf, other]

    cs.LG cs.AI stat.ML

    Target-agnostic Source-free Domain Adaptation for Regression Tasks

    Authors: Tianlang He, Zhiqiu Xia, Jierun Chen, Haoliang Li, S.-H. Gary Chan

    Abstract: Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, work on source-free UDA assumes knowledge of domain gap distribution, and hence is limited to either target-aware or classification task. To overcom…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted by ICDE 2024

  5. arXiv:2310.11959 [pdf, other]

    cs.LG cs.AI

    A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis

    Authors: Shuhan Zhong, Sizhe Song, Weipeng Zhuo, Guanyao Li, Yang Liu, S.-H. Gary Chan

    Abstract: Time series data, including univariate and multivariate ones, are characterized by unique composition and complex multi-scale temporal variations. They often require special consideration of decomposition and multi-scale modeling to analyze. Existing deep learning methods on this best fit to univariate time series only, and have not sufficiently considered sub-series modeling and decomposition com…

    Submitted 24 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted for VLDB 2024

  6. arXiv:2307.12987 [pdf, other]

    cs.MA

    Efficient Behavior-consistent Calibration for Multi-agent Market Simulation

    Authors: Tianlang He, Keyan Lu, Chang Xu, Yang Liu, Weiqing Liu, S.-H. Gary Chan, Jiang Bian

    Abstract: Order-driven market simulation mimics the trader behaviors to generate order streams to support interactive studies of financial strategies. In market simulator, the multi-agent approach is commonly adopted due to its explainability. Existing multi-agent systems employ heuristic search to generate order streams, which is inefficient for large-scale simulation. Furthermore, the search-based behavio…

    Submitted 5 June, 2023; originally announced July 2023.

  7. arXiv:2307.12219 [pdf, other]

    cs.LG

    Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation

    Authors: Haoyue Bai, Ceyuan Yang, Yinghao Xu, S.-H. Gary Chan, Bolei Zhou

    Abstract: Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data, where the training and test are drawn from different distributions. In this paper, we explore utilizing the generative models as a data augmentation source for improving out-of-…

    Submitted 22 July, 2023; originally announced July 2023.

  8. arXiv:2307.05914 [pdf, other]

    cs.NI cs.LG eess.SP

    FIS-ONE: Floor Identification System with One Label for Crowdsourced RF Signals

    Authors: Weipeng Zhuo, Ka Ho Chiu, Jierun Chen, Ziqi Zhao, S.-H. Gary Chan, Sangtae Ha, Chul-Ho Lee

    Abstract: Floor labels of crowdsourced RF signals are crucial for many smart-city applications, such as multi-floor indoor localization, geofencing, and robot surveillance. To build a prediction model to identify the floor number of a new RF signal upon its measurement, conventional approaches using the crowdsourced RF signals assume that at least few labeled signal samples are available on each floor. In t…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE ICDCS 2023

  9. arXiv:2303.03667 [pdf, other]

    cs.CV

    Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

    Authors: Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan

    Abstract: To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does not necessarily lead to a similar level of reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstra…

    Submitted 21 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  10. arXiv:2210.07895 [pdf, other]

    cs.NI

    GRAFICS: Graph Embedding-based Floor Identification Using Crowdsourced RF Signals

    Authors: Weipeng Zhuo, Ziqi Zhao, Ka Ho Chiu, Shiju Li, Sangtae Ha, Chul-Ho Lee, S.-H. Gary Chan

    Abstract: We study the problem of floor identification for radiofrequency (RF) signal samples obtained in a crowdsourced manner, where the signal samples are highly heterogeneous and most samples lack their floor labels. We propose GRAFICS, a graph embedding-based floor identification system. GRAFICS first builds a highly versatile bipartite graph model, having APs on one side and signal samples on the othe…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE ICDCS 2022

  11. arXiv:2210.07889 [pdf, other]

    cs.NI

    Semi-supervised Learning with Network Embedding on Ambient RF Signals for Geofencing Services

    Authors: Weipeng Zhuo, Ka Ho Chiu, Jierun Chen, Jiajie Tan, Edmund Sumpena, S.-H. Gary Chan, Sangtae Ha, Chul-Ho Lee

    Abstract: In applications such as elderly care, dementia anti-wandering and pandemic control, it is important to ensure that people are within a predefined area for their safety and well-being. We propose GEM, a practical, semi-supervised Geofencing system with network EMbedding, which is based only on ambient radio frequency (RF) signals. GEM models measured RF signal records as a weighted bipartite graph.…

    Submitted 8 March, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: A conference version of this paper will appear in IEEE ICDE 2023

  12. arXiv:2203.10489 [pdf, other]

    cs.CV

    TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing

    Authors: Jierun Chen, Tianlang He, Weipeng Zhuo, Li Ma, Sangtae Ha, S.-H. Gary Chan

    Abstract: As convolution has empowered many smart applications, dynamic convolution further equips it with the ability to adapt to diverse inputs. However, the static and dynamic convolutions are either layout-agnostic or computation-heavy, making it inappropriate for layout-specific applications, e.g., face recognition and medical image segmentation. We observe that these applications naturally exhibit the…

    Submitted 22 March, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  13. arXiv:2201.03817 [pdf, other]

    cs.HC

    Tackling Multipath and Biased Training Data for IMU-Assisted BLE Proximity Detection

    Authors: Tianlang He, Jiajie Tan, Weipeng Zhuo, Maximilian Printz, S.-H. Gary Chan

    Abstract: Proximity detection is to determine whether an IoT receiver is within a certain distance from a signal transmitter. Due to its low cost and high popularity, Bluetooth low energy (BLE) has been used to detect proximity based on the received signal strength indicator (RSSI). To address the fact that RSSI can be markedly influenced by device carriage states, previous works have incorporated RSSI with…

    Submitted 11 January, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

  14. arXiv:2201.00008 [pdf, other]

    cs.LG cs.AI

    A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting

    Authors: Guanyao Li, Shuhan Zhong, S.-H. Gary Chan, Ruiyuan Li, Chih-Chieh Hung, Wen-Chih Peng

    Abstract: We study the forecasting problem for traffic with dynamic, possibly periodical, and joint spatial-temporal dependency between regions. Given the aggregated inflow and outflow traffic of regions in a city from time slots 0 to t-1, we predict the traffic at time t at any region. Prior arts in the area often consider the spatial and temporal dependencies in a decoupled manner or are rather computatio…

    Submitted 3 May, 2022; v1 submitted 30 December, 2021; originally announced January 2022.

  15. arXiv:2109.02038 [pdf, other]

    cs.LG

    NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization

    Authors: Haoyue Bai, Fengwei Zhou, Lanqing Hong, Nanyang Ye, S.-H. Gary Chan, Zhenguo Li

    Abstract: Recent advances on Out-of-Distribution (OoD) generalization reveal the robustness of deep learning models against distribution shifts. However, existing works focus on OoD algorithms, such as invariant risk minimization, domain generalization, or stable learning, without considering the influence of deep model architectures on OoD generalization, which may lead to sub-optimal performance. Neural A…

    Submitted 5 September, 2021; originally announced September 2021.

    Comments: Accepted by ICCV2021

  16. arXiv:2105.09684 [pdf, other]

    cs.CV

    Crowd Counting by Self-supervised Transfer Colorization Learning and Global Prior Classification

    Authors: Haoyue Bai, Song Wen, S.-H. Gary Chan

    Abstract: Labeled crowd scene images are expensive and scarce. To significantly reduce the requirement of the labeled images, we propose ColorCount, a novel CNN-based approach by combining self-supervised transfer colorization learning and global prior classification to leverage the abundantly available unlabeled data. The self-supervised colorization branch learns the semantics and surface texture of the i…

    Submitted 20 May, 2021; originally announced May 2021.

  17. arXiv:2104.13946 [pdf, other]

    cs.CV

    Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting

    Authors: Haoyue Bai, S.-H. Gary Chan

    Abstract: We study video crowd counting, which is to estimate the number of objects (people in this paper) in all the frames of a video sequence. Previous work on crowd counting is mostly on still images. There has been little work on how to properly extract and take advantage of the spatial-temporal correlation between neighboring frames in both short and long ranges to achieve high estimation accuracy for…

    Submitted 28 April, 2021; originally announced April 2021.

  18. arXiv:2101.04442 [pdf, other]

    cs.CV eess.IV

    Joint Demosaicking and Denoising in the Wild: The Case of Training Under Ground Truth Uncertainty

    Authors: Jierun Chen, Song Wen, S.-H. Gary Chan

    Abstract: Image demosaicking and denoising are the two key fundamental steps in digital camera pipelines, aiming to reconstruct clean color images from noisy luminance readings. In this paper, we propose and study Wild-JDD, a novel learning framework for joint demosaicking and denoising in the wild. In contrast to previous works which generally assume the ground truth of training data is a perfect reflectio…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: Accepted by AAAI2021

  19. arXiv:2012.15685 [pdf, other]

    cs.CV

    A Survey on Deep Learning-based Single Image Crowd Counting: Network Design, Loss Function and Supervisory Signal

    Authors: Haoyue Bai, Jiageng Mao, S.-H. Gary Chan

    Abstract: Single image crowd counting is a challenging computer vision problem with wide applications in public safety, city planning, traffic management, etc. With the recent development of deep learning techniques, crowd counting has aroused much attention and achieved great success in recent years. This survey is to provide a comprehensive summary of recent advances on deep learning-based crowd counting…

    Submitted 11 July, 2022; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Neurocomputing minor revision. Project page is at https://github.com/HaoyueBaiZJU/A-Recent-Systematic-Survey-for-Crowd-Counting

  20. arXiv:2012.09382 [pdf, other]

    cs.LG

    DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation

    Authors: Haoyue Bai, Rui Sun, Lanqing Hong, Fengwei Zhou, Nanyang Ye, Han-Jia Ye, S.-H. Gary Chan, Zhenguo Li

    Abstract: While deep learning demonstrates its strong ability to handle independent and identically distributed (IID) data, it often suffers from out-of-distribution (OoD) generalization, where the test data come from another distribution (w.r.t. the training one). Designing a general OoD generalization framework to a wide range of applications is challenging, mainly due to possible correlation shift and di…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: Accepted by AAAI2021

  21. arXiv:2009.05944 [pdf, other]

    cs.CR

    vContact: Private WiFi-based Contact Tracing with Virus Lifespan

    Authors: Guanyao Li, Siyan Hu, Shuhan Zhong, Wai Lun Tsui, S.-H. Gary Chan

    Abstract: Covid-19 is primarily spread through contact with the virus which may survive on surfaces with lifespan of more than hours. To curb its spread, it is hence of vital importance to detect and quarantine those who have been in contact with the virus for sustained period of time, the so-called close contacts. In this work, we study, for the first time, automatic contact detection when the virus has a…

    Submitted 26 January, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

  22. arXiv:1909.03839 [pdf, other]

    cs.CV

    Crowd Counting on Images with Scale Variation and Isolated Clusters

    Authors: Haoyue Bai, Song Wen, S.-H. Gary Chan

    Abstract: Crowd counting is to estimate the number of objects (e.g., people or vehicles) in an image of unconstrained congested scenes. Designing a general crowd counting algorithm applicable to a wide range of crowd images is challenging, mainly due to the possibly large variation in object scales and the presence of many isolated small clusters. Previous approaches based on convolution operations with mul…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted at International Conference on Computer Vision (ICCV) 2019 Workshop

  23. arXiv:1903.02082 [pdf, other]

    cs.NE cs.LG stat.ML

    DA-LSTM: A Long Short-Term Memory with Depth Adaptive to Non-uniform Information Flow in Sequential Data

    Authors: Yifeng Zhang, Ka-Ho Chow, S.-H. Gary Chan

    Abstract: Much sequential data exhibits highly non-uniform information distribution. This cannot be correctly modeled by traditional Long Short-Term Memory (LSTM). To address that, recent works have extended LSTM by adding more activations between adjacent inputs. However, the approaches often use a fixed depth, which is at the step of the most information content. This one-size-fits-all worst-case approach…

    Submitted 18 January, 2019; originally announced March 2019.

  24. arXiv:1811.08069 [pdf, other]

    cs.LG stat.ML

    Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder

    Authors: Ka-Ho Chow, Anish Hiranandani, Yifeng Zhang, S.-H. Gary Chan

    Abstract: Representation learning of pedestrian trajectories transforms variable-length timestamp-coordinate tuples of a trajectory into a fixed-length vector representation that summarizes spatiotemporal characteristics. It is a crucial technique to connect feature-based data mining with trajectory data. Trajectory representation is a challenging problem, because both environmental constraints (e.g., wall…

    Submitted 19 November, 2018; originally announced November 2018.

  25. arXiv:1211.4767 [pdf, ps, other]

    cs.MM

    Collaborative P2P Streaming of Interactive Live Free Viewpoint Video

    Authors: Dongni Ren, S.-H. Gary Chan, Gene Cheung, Vicky Zhao, Pascal Frossard

    Abstract: We study an interactive live streaming scenario where multiple peers pull streams of the same free viewpoint video that are synchronized in time but not necessarily in view. In free viewpoint video, each user can periodically select a virtual view between two anchor camera views for display. The virtual view is synthesized using texture and depth videos of the anchor views via depth-image-based re…

    Submitted 20 November, 2012; originally announced November 2012.