
Showing 1–50 of 54 results for author: See, S

  1. arXiv:2406.01938

    cs.CV cs.MM

    Nutrition Estimation for Dietary Management: A Transformer Approach with Depth Sensing

    Authors: Zhengyi Kwan, Wei Zhang, Zhengkui Wang, Aik Beng Ng, Simon See

    Abstract: Nutrition estimation is crucial for effective dietary management and overall health and well-being. Existing methods often struggle with sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network designed for nutrition estimation that utilizes both RGB and depth information from food images. We have designed and implemented a multi-scale encoder an…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  2. arXiv:2405.13629

    cs.LG

    Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

    Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

    Abstract: Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance wit…

    Submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2405.02630

    quant-ph cs.DC cs.SE

    cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK

    Authors: Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Wille, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

    Abstract: This paper investigates the application of Quantum Support Vector Machines (QSVMs) with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially leveraging the cuTensorNet library. We present a simulation workflow that substantially diminishes computational overhead, as evidenced by our experiments, from exponential to quadratic cost. While state vector simulatio…

    Submitted 8 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 14 figures

  4. arXiv:2402.10646

    cs.CL

    AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation

    Authors: Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Abstraction ability is crucial in human intelligence, which can also benefit various tasks in NLP study. Existing work shows that LLMs are deficient in abstract ability, and how to improve it remains unexplored. In this work, we design the framework AbsInstruct to enhance LLMs' abstraction ability through instruction tuning. The framework builds instructions with in-depth explanations to assist LL…

    Submitted 17 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024

  5. arXiv:2401.15977

    cs.CV

    Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

    Authors: Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

    Abstract: We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the ref…

    Submitted 31 January, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Project page: https://xiaoyushi97.github.io/Motion-I2V/

  6. arXiv:2401.14619

    cs.LG

    Resilient Practical Test-Time Adaptation: Soft Batch Normalization Alignment and Entropy-driven Memory Bank

    Authors: Xingzhi Zhou, Zhiliang Tian, Ka Chun Cheung, Simon See, Nevin L. Zhang

    Abstract: Test-time domain adaptation effectively adjusts the source domain model to accommodate unseen domain shifts in a target domain during inference. However, the model performance can be significantly impaired by continuous distribution changes in the target domain and non-independent and identically distributed (non-i.i.d.) test samples often encountered in practical scenarios. While existing memory…

    Submitted 25 January, 2024; originally announced January 2024.

  7. arXiv:2310.05210

    cs.AI cs.CL

    TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

    Authors: Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets focusing only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset including both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argumen…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to the 10th Workshop on Argument Mining, co-located with EMNLP 2023

  8. arXiv:2309.08303

    cs.CL

    Self-Consistent Narrative Prompts on Abductive Natural Language Inference

    Authors: Chunkit Chan, Xin Liu, Tsz Ho Chan, Jiayang Cheng, Yangqiu Song, Ginny Wong, Simon See

    Abstract: Abduction has long been seen as crucial for narrative comprehension and reasoning about everyday situations. The abductive natural language inference (αNLI) task has been proposed, and this narrative text-based task aims to infer the most plausible hypothesis from the candidates given two observations. However, the inter-sentential coherence and the model consistency have not been well exploited…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at IJCNLP-AACL 2023 main track

  9. arXiv:2308.05396

    cs.CV

    Learning Gabor Texture Features for Fine-Grained Recognition

    Authors: Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu

    Abstract: Extracting and using class-discriminative features is critical for fine-grained recognition. Existing works have demonstrated the possibility of applying deep CNNs to exploit features that distinguish similar classes. However, CNNs suffer from problems including frequency bias and loss of detailed local information, which restricts the performance of recognizing fine-grained categories. To address…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV2023

  10. Towards Building AI-CPS with NVIDIA Isaac Sim: An Industrial Benchmark and Case Study for Robotics Manipulation

    Authors: Zhehua Zhou, Jiayang Song, Xuan Xie, Zhan Shu, Lei Ma, Dikai Liu, Jianxiong Yin, Simon See

    Abstract: As a representative cyber-physical system (CPS), robotic manipulator has been widely adopted in various academic research and industrial processes, indicating its potential to act as a universal interface between the cyber and the physical worlds. Recent studies in robotics manipulation have started employing artificial intelligence (AI) approaches as controllers to achieve better adaptability and…

    Submitted 31 July, 2023; originally announced August 2023.

  11. arXiv:2307.11526

    cs.CV

    CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields

    Authors: Ziyuan Luo, Qing Guo, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Neural Radiance Fields (NeRF) have the potential to be a major representation of media. Since training a NeRF has never been an easy task, the protection of its model copyright should be a priority. In this paper, by analyzing the pros and cons of possible copyright protection solutions, we propose to protect the copyright of NeRF models by replacing the original color representation in NeRF with…

    Submitted 29 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: 11 pages, 6 figures, accepted by ICCV 2023 non-camera-ready version

  12. arXiv:2307.06876

    cond-mat.mes-hall physics.optics

    Light Emission and Conductance Fluctuations in Electrically Driven and Plasmonically Enhanced Molecular Junctions

    Authors: Sakthi Priya Amirtharaj, Zhiyuan Xie, Josephine Si Yu See, Gabriele Rolleri, Wen Chen, Konstantin Malchow, Alexandre Bouhelier, Emanuel Lörtscher, Christophe Galland

    Abstract: Electrically connected and plasmonically enhanced molecular junctions combine the optical functionalities of high field confinement and enhancement (cavity function), and of high radiative efficiency (antenna function) with the electrical functionalities of molecular transport. Such combined optical and electrical probes have proven useful for the fundamental understanding of metal-molecule contac…

    Submitted 28 March, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Journal ref: ACS Photonics 2024

  13. Towards Balanced Active Learning for Multimodal Classification

    Authors: Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See

    Abstract: Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multi…

    Submitted 21 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 12 pages, accepted by ACMMM 2023

  14. arXiv:2306.05888

    cs.CV

    TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses

    Authors: Xuesong Chen, Shaoshuai Shi, Chao Zhang, Ben** Zhu, Qiang Wang, Ka Chun Cheung, Simon See, Hongsheng Li

    Abstract: 3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots. With the commonly used tracking-by-detection paradigm, 3D MOT has made important progress in recent years. However, these methods only use the detection boxes of the current frame to obtain trajectory-box association results, which makes it impossible for the tracker to recover o…

    Submitted 18 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted by ICCV 2023

  15. arXiv:2306.02430

    cs.MA cs.LG

    A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

    Authors: Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

    Abstract: In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address the above issues, we proposed a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected v…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: JMLR 2023. Extended version of arXiv:2102.07936

  16. arXiv:2305.05191

    cs.CL cs.AI

    COLA: Contextualized Commonsense Causal Reasoning from the Causal Inference Perspective

    Authors: Zhaowei Wang, Quyet V. Do, Hongming Zhang, Jiayao Zhang, Weiqi Wang, Tianqing Fang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Detecting commonsense causal relations (causation) between events has long been an essential yet challenging task. Given that events are complicated, an event may have different causes under various contexts. Thus, exploiting context plays an essential role in detecting causal relations. Meanwhile, previous works about commonsense causation only consider two events and ignore their context, simpli…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted to the main conference of ACL 2023

  17. arXiv:2305.04034

    cs.AI cs.DB cs.LG

    Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport

    Authors: Zihao Wang, Weizhi Fei, Hang Yin, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Answering complex queries on knowledge graphs is important but particularly challenging because of the data incompleteness. Query embedding methods address this issue by learning-based models and simulating logical reasoning with set operators. Previous works focus on specific forms of embeddings, but scoring functions between embeddings are underexplored. In contrast to existing scoring functions…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: Findings in ACL 2023. 16 pages, 6 figures, and 8 tables. Our implementation can be found at https://github.com/HKUST-KnowComp/WFRE

  18. arXiv:2305.03973

    cs.CL

    DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition

    Authors: Chunkit Chan, Xin Liu, Jiayang Cheng, Zihan Li, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Implicit Discourse Relation Recognition (IDRR) is a sophisticated and challenging task to recognize the discourse relations between the arguments with the absence of discourse connectives. The sense labels for each discourse relation follow a hierarchical classification scheme in the annotation process (Prasad et al., 2008), forming a hierarchy structure. Most existing works do not well incorporat…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  19. arXiv:2304.05015

    cs.CV

    Continual Semantic Segmentation with Automatic Memory Sample Selection

    Authors: Lanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu

    Abstract: Continual Semantic Segmentation (CSS) extends static semantic segmentation by incrementally introducing new classes for training. To alleviate the catastrophic forgetting issue in CSS, a memory buffer that stores a small number of samples from the previous classes is constructed for replay. However, existing methods select the memory samples either randomly or based on a single-factor-driven handc…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR2023

  20. arXiv:2303.08340

    cs.CV

    VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

    Authors: Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

    Abstract: We introduce VideoFlow, a novel optical flow estimation framework for videos. In contrast to previous methods that learn to estimate optical flow from two frames, VideoFlow concurrently estimates bi-directional optical flows for multiple frames that are available in videos by sufficiently exploiting temporal cues. We first propose a TRi-frame Optical Flow (TROF) module that estimates bi-directiona…

    Submitted 20 August, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  21. arXiv:2303.01237

    cs.CV

    FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

    Authors: Xiaoyu Shi, Zhaoyang Huang, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

    Abstract: FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance. The core component of FlowFormer is the transformer-based cost-volume encoder. Inspired by the recent success of masked autoencoding (MAE) pretraining in unleashing transformers' capacity of encoding visual representation, we propose Masked Cost Volume Autoencoding (MCVA) to enh…

    Submitted 2 March, 2023; originally announced March 2023.

  22. arXiv:2301.08859

    cs.LG cs.LO

    Logical Message Passing Networks with One-hop Inference on Atomic Formulas

    Authors: Zihao Wang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Complex Query Answering (CQA) over Knowledge Graphs (KGs) has attracted a lot of attention to potentially support many applications. Given that KGs are usually incomplete, neural models are proposed to answer the logical queries by parameterizing set operators with complex neural networks. However, such methods usually train neural set operators with a large number of entity and relation embedding…

    Submitted 26 August, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted by ICLR 2023. 20 pages, 4 figures, and 9 tables. Our implementation can be found at https://github.com/HKUST-KnowComp/LMPNN . update v4: more accurate comparison about the computational cost between LMPNN and GNN-QE. update v3: typo fix. update v2: add code repository

  23. arXiv:2301.00407

    cs.LG cs.PF

    MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

    Authors: Huaizheng Zhang, Yuanming Li, Wencong Xiao, Yizheng Huang, Xing Di, Jianxiong Yin, Simon See, Yong Luo, Chiew Tong Lau, Yang You

    Abstract: New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensiv…

    Submitted 1 January, 2023; originally announced January 2023.

    Comments: 10 pages, 11 figures

  24. arXiv:2212.08830

    cs.CV

    Inductive Attention for Video Action Anticipation

    Authors: Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz

    Abstract: Anticipating future actions based on spatiotemporal observations is essential in video understanding and predictive computer vision. Moreover, a model capable of anticipating the future has important applications, it can benefit precautionary systems to react before an event occurs. However, unlike in the action recognition task, future information is inaccessible at observation time -- a model ca…

    Submitted 18 March, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

  25. arXiv:2211.12759

    cs.CV cs.AI cs.LG

    NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension

    Authors: Xin He, Jiangchao Yao, Yuxin Wang, Zhenheng Tang, Ka Chu Cheung, Simon See, Bo Han, Xiaowen Chu

    Abstract: One-shot neural architecture search (NAS) substantially improves the search efficiency by training one supernet to estimate the performance of every possible child architecture (i.e., subnet). However, the inconsistency of characteristics among subnets incurs serious interference in the optimization, resulting in poor performance ranking correlation of subnets. Subsequent explorations decompose su…

    Submitted 24 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI2023, AutoML, NAS

  26. arXiv:2211.03635

    cs.LG cs.AI

    Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier Transform

    Authors: Huiru Xiao, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: The choice of geometric space for knowledge graph (KG) embeddings can have significant effects on the performance of KG completion tasks. The hyperbolic geometry has been shown to capture the hierarchical patterns due to its tree-like metrics, which addressed the limitations of the Euclidean embedding models. Recent explorations of the complex hyperbolic geometry further improved the hyperbolic em…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP22)

  27. CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

    Authors: Jun Wang, Abhir Bhalerao, Terry Yin, Simon See, Yulan He

    Abstract: Radiology report generation (RRG) has gained increasing research attention because of its huge potential to mitigate medical resource shortages and aid the process of disease decision making by radiologists. Recent advancements in RRG are largely driven by improving a model's capabilities in encoding single-modal feature representations, while few studies explicitly explore the cross-modal alignme…

    Submitted 3 March, 2024; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Journal of Biomedical and Health Informatics (IJBHI). 13 pages, 8 figures

  28. arXiv:2210.07988

    cs.CL cs.AI

    PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base Population

    Authors: Tianqing Fang, Quyet V. Do, Hongming Zhang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Commonsense Knowledge Base (CSKB) Population aims at reasoning over unseen entities and assertions on CSKBs, and is an important yet hard commonsense reasoning task. One challenge is that it requires out-of-domain generalization ability as the source CSKB for training is of a relatively smaller scale (1M) while the whole candidate space for population is way larger (200M). We propose PseudoReasone…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  29. arXiv:2210.06694

    cs.CL cs.AI

    SubeventWriter: Iterative Sub-event Sequence Generation with Coherence Controller

    Authors: Zhaowei Wang, Hongming Zhang, Tianqing Fang, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: In this paper, we propose a new task of sub-event generation for an unseen process to evaluate the understanding of the coherence of sub-event actions and objects. To solve the problem, we design SubeventWriter, a sub-event sequence generation framework with a coherence controller. Given an unseen process, the framework can iteratively construct the sub-event sequence by generating one sub-event a…

    Submitted 19 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to the main conference of EMNLP 2022

  30. arXiv:2210.00474

    cs.RO

    Saving the Limping: Fault-tolerant Quadruped Locomotion via Reinforcement Learning

    Authors: Dikai Liu, Tianwei Zhang, Jianxiong Yin, Simon See

    Abstract: Modern quadrupeds are skillful in traversing or even sprinting on uneven terrains in a remote uncontrolled environment. However, survival in the wild requires not only maneuverability, but also the ability to handle potential critical hardware failures. How to grant such ability to quadrupeds is rarely investigated. In this paper, we propose a novel methodology to train and test hardware fault-tol…

    Submitted 7 September, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: This work has been submitted to IEEE RA-L for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Project website for video: https://johnliudk.github.io/saving-the-limping/

  31. arXiv:2207.01208

    cs.CV cs.CL

    Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation

    Authors: Sixing Yan, William K. Cheung, Keith Chiu, Terence M. Tong, Charles K. Cheung, Simon See

    Abstract: Automatic generation of medical reports from X-ray images can assist radiologists to perform the time-consuming and yet important reporting task. Yet, achieving clinically accurate generated reports remains challenging. Modeling the underlying abnormalities using the knowledge graph approach has been found promising in enhancing the clinical accuracy. In this paper, we introduce a novel fined-grai…

    Submitted 5 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: 14 pages, 7 figures

  32. arXiv:2206.10869

    cs.CV

    NVIDIA-UNIBZ Submission for EPIC-KITCHENS-100 Action Anticipation Challenge 2022

    Authors: Tsung-Ming Tai, Oswald Lanz, Giuseppe Fiameni, Yi-Kwan Wong, Sze-Sen Poon, Cheng-Kuang Lee, Ka-Chun Cheung, Simon See

    Abstract: In this report, we describe the technical details of our submission for the EPIC-Kitchen-100 action anticipation challenge. Our modelings, the higher-order recurrent space-time transformer and the message-passing neural network with edge learning, are both recurrent-based architectures which observe only 2.5 seconds inference context to form the action anticipation prediction. By averaging the pre…

    Submitted 22 June, 2022; originally announced June 2022.

  33. arXiv:2206.10810

    eess.IV cs.CV

    A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift

    Authors: Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li

    Abstract: Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. The key to video restoration depends on utilizing inter-frame information. However, existing deep learning methods often rely on complicated network architectures, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computati…

    Submitted 22 May, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR2023

    Journal ref: 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

  34. arXiv:2206.01009

    cs.CV

    Unified Recurrence Modeling for Video Action Anticipation

    Authors: Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz

    Abstract: Forecasting future events based on evidence of current conditions is an innate skill of human beings, and key for predicting the outcome of any decision making. In artificial vision for example, we would like to predict the next human action before it happens, without observing the future video frames associated to it. Computer vision models for action anticipation are expected to collect the subt…

    Submitted 2 June, 2022; originally announced June 2022.

  35. arXiv:2110.09930

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning

    Authors: Yi-Chen Chen, Shu-wen Yang, Cheng-Kuang Lee, Simon See, Hung-yi Lee

    Abstract: Speech representation learning plays a vital role in speech processing. Among them, self-supervised learning (SSL) has become an important research direction. It has been shown that an SSL pretraining model can achieve excellent performance in various downstream tasks of speech processing. On the other hand, supervised multi-task learning (MTL) is another representation learning paradigm, which ha…

    Submitted 18 October, 2021; originally announced October 2021.

  36. arXiv:2109.12493

    cs.CV

    Self-Supervised Video Representation Learning by Video Incoherence Detection

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Kezhi Mao, Lihua Xie, Jianxiong Yin, Simon See

    Abstract: This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning. It roots from the observation that visual systems of human beings can easily identify video incoherence based on their comprehensive understanding of videos. Specifically, the training sample, denoted as the incoherent clip, is constructed by multiple sub-clips hierarchicall…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: 11 pages, 7 figures

  37. Aligning Correlation Information for Domain Adaptation in Action Recognition

    Authors: Yuecong Xu, Jianfei Yang, Haozhi Cao, Kezhi Mao, Jianxiong Yin, Simon See

    Abstract: Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research towards video DA. This is partly due to the complexity in adapting the different modalities of features in videos, which includes the correlation features extracted as long-term dependenci…

    Submitted 8 December, 2022; v1 submitted 10 July, 2021; originally announced July 2021.

    Comments: The dataset HMDB-ARID is available at https://xuyu0010.github.io/vuda.html. Camera-ready version of this paper accepted at IEEE TNNLS. Correction made for Figure 1 of the Camera-ready version

  38. arXiv:2106.03167

    eess.AS

    Mathematical Vocoder Algorithm : Modified Spectral Inversion for Efficient Neural Speech Synthesis

    Authors: Hyun Gon Ryu, Jeong-Hoon Kim, Simon See

    Abstract: In this work, we propose a new mathematical vocoder algorithm (modified spectral inversion) that generates a waveform from acoustic features without phase estimation. The main benefit of using our proposed method is that it excludes the training stage of the neural vocoder from the end-to-end speech synthesis model. Our implementation can synthesize high fidelity speech at approximately 20 MHz on C…

    Submitted 15 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: see sample in https://its10041004.github.io/MVA/

  39. arXiv:2008.11378

    cs.CV

    Effective Action Recognition with Embedded Key Point Shifts

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Kezhi Mao, Jianxiong Yin, Simon See

    Abstract: Temporal feature extraction is an essential technique in video-based action recognition. Key points have been utilized in skeleton-based action recognition methods but they require costly key point annotation. In this paper, we propose a novel temporal feature extraction module, named Key Point Shifts Embedding Module (KPSEM), to adaptively extract channel-wise key point shifts across video fram…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: 35 pages, 10 figures

  40. arXiv:2008.01170

    cs.LG stat.ML

    Deep Learning Models for Early Detection and Prediction of the spread of Novel Coronavirus (COVID-19)

    Authors: Devante Ayris, Kye Horbury, Blake Williams, Mitchell Blackney, Celine Shi Hui See, Maleeha Imtiaz, Syed Afaq Ali Shah

    Abstract: SARS-CoV2, which causes coronavirus disease (COVID-19) is continuing to spread globally and has become a pandemic. People have lost their lives due to the virus and the lack of counter measures in place. Given the increasing caseload and uncertainty of spread, there is an urgent need to develop machine learning techniques to predict the spread of COVID-19. Prediction of the spread can allow counte…

    Submitted 15 February, 2021; v1 submitted 29 July, 2020; originally announced August 2020.

  41. arXiv:2007.03432  [pdf, other]

    math.NA

    A deep learning based nonlinear upscaling method for transport equations

    Authors: Tak Shing Au Yeung, Eric T. Chung, Simon See

    Abstract: We develop a nonlinear upscaling method for nonlinear transport equations. The proposed scheme gives a coarse-scale equation for the cell average of the solution. To compute the parameters in the coarse-scale equation, a local downscaling operator is constructed. This downscaling operation recovers fine-scale properties from cell averages. This is achieved by solving the equation on…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: 18 pages

  42. arXiv:2006.05091  [pdf, other]

    cs.CV

    PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local Module for Action Recognition

    Authors: Yuecong Xu, Haozhi Cao, Jianfei Yang, Kezhi Mao, Jianxiong Yin, Simon See

    Abstract: Capturing long-range spatiotemporal dependencies plays an essential role in improving video features for action recognition. The non-local block, inspired by non-local means, is designed to address this challenge and has shown excellent performance. However, the non-local block significantly increases the computation cost of the original network. It also lacks the ability to model regional c…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: Single column, 26 pages, 6 figures

  43. arXiv:2006.03876  [pdf, other]

    cs.CV

    ARID: A New Dataset for Recognizing Action in the Dark

    Authors: Yuecong Xu, Jianfei Yang, Haozhi Cao, Kezhi Mao, Jianxiong Yin, Simon See

    Abstract: Action recognition in dark videos is useful in various scenarios, e.g., night surveillance and self-driving at night. Although progress has been made on action recognition for videos in normal illumination, few works have studied action recognition in the dark. This is partly due to the lack of sufficient datasets for such a task. In this paper, we explore the task of action recognit…

    Submitted 19 August, 2022; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: 6 pages, 7 figures; data available at https://xuyu0010.github.io/arid; simplified title; extension of the IJCAIW version published by Springer (https://link.springer.com/chapter/10.1007/978-981-16-0575-8_6)

  44. arXiv:2005.02591  [pdf, other]

    cs.CV

    Exploiting Inter-Frame Regional Correlation for Efficient Action Recognition

    Authors: Yuecong Xu, Jianfei Yang, Kezhi Mao, Jianxiong Yin, Simon See

    Abstract: Temporal feature extraction is an important issue in video-based action recognition. Optical flow is a popular method for extracting temporal features. It performs well because it captures pixel-level correlation information between consecutive frames. However, this pixel-level correlation comes at the cost of high computational complexity and large storage…

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: 24 pages (excluding references), 7 figures, 4 tables

  45. arXiv:2003.03785  [pdf, ps, other]

    cs.FL cs.AI cs.DB

    Dependently Typed Knowledge Graphs

    Authors: Zhangsheng Lai, Aik Beng Ng, Liang Ze Wong, Simon See, Shaowei Lin

    Abstract: Reasoning over knowledge graphs is traditionally built upon a hierarchy of languages in the Semantic Web Stack. Starting from the Resource Description Framework (RDF) for knowledge graphs, more advanced constructs have been introduced through various syntax extensions to add reasoning capabilities to knowledge graphs. In this paper, we show how standardized semantic web technologies (RDF and its q…

    Submitted 8 March, 2020; originally announced March 2020.

  46. arXiv:1911.08772  [pdf, other]

    cs.LG cs.DC stat.ML

    Understanding Top-k Sparsification in Distributed Deep Learning

    Authors: Shaohuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See

    Abstract: Distributed stochastic gradient descent (SGD) algorithms are widely deployed to train large-scale deep learning models, and the communication overhead among workers has become the new system bottleneck. Recently proposed gradient sparsification techniques, especially Top-$k$ sparsification with error compensation (TopK-SGD), can significantly reduce communication traffic without an obvious i…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: 14 pages

  47. arXiv:1907.04052  [pdf, other]

    cs.CV

    Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention

    Authors: Qingyi Tao, Zongyuan Ge, Jianfei Cai, Jianxiong Yin, Simon See

    Abstract: Lesion detection from computed tomography (CT) scans is more challenging than natural object detection for two major reasons: small lesion size and small inter-class variation. First, lesions usually occupy only a small region of the CT image. The features of such a small region may not provide sufficient information because of their limited spatial resolution. Secondly, i…

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: Accepted by MICCAI 2019

  48. arXiv:1810.04538  [pdf, other]

    cs.SE cs.AI cs.CR cs.LG

    Secure Deep Learning Engineering: A Software Quality Assurance Perspective

    Authors: Lei Ma, Felix Juefei-Xu, Minhui Xue, Qiang Hu, Sen Chen, Bo Li, Yang Liu, Jianjun Zhao, Jianxiong Yin, Simon See

    Abstract: Over the past decades, deep learning (DL) systems have achieved tremendous success and gained great popularity in various applications, such as intelligent machines, image processing, speech processing, and medical diagnostics. Deep neural networks are the key driving force behind this recent success, but they still seem to be a magic black box lacking interpretability and understanding. This brings up…

    Submitted 10 October, 2018; originally announced October 2018.

  49. arXiv:1809.01266  [pdf, other]

    cs.SE cs.AI cs.CR cs.LG

    DeepHunter: Hunting Deep Neural Network Defects via Coverage-Guided Fuzzing

    Authors: Xiaofei Xie, Lei Ma, Felix Juefei-Xu, Hongxu Chen, Minhui Xue, Bo Li, Yang Liu, Jianjun Zhao, Jianxiong Yin, Simon See

    Abstract: Along with the data explosion over the past decade, software based on deep neural networks (DNNs) has made an unprecedented leap and is becoming the key driving force of many novel industrial applications, including safety-critical scenarios such as autonomous driving. Despite their great success in various human intelligence tasks, DNNs, like traditional software, could also exhi…

    Submitted 16 November, 2018; v1 submitted 4 September, 2018; originally announced September 2018.

  50. arXiv:1801.09335  [pdf, other]

    cs.LG cs.CV cs.NE

    Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

    Authors: Jason Kuen, Xiangfei Kong, Zhe Lin, Gang Wang, Jianxiong Yin, Simon See, Yap-Peng Tan

    Abstract: It is desirable to train convolutional networks (CNNs) to run more efficiently during inference. In many cases, however, the system's computational budget for inference cannot be known in advance during training, or the inference budget depends on changing real-time resource availability. Thus, it is inadequate to train just inference-efficient CNNs, whose inference costs are no…

    Submitted 28 January, 2018; originally announced January 2018.