Search | arXiv e-print repository

AI-native Memory: A Pathway from LLMs Towards AGI

Authors: Jingbo Shang, Zai Zheng, Xiang Ying, Felix Tao, Mindverse Team

Abstract: Large language models (LLMs) have shown the world sparks of artificial general intelligence (AGI). One opinion, especially from some startups working on LLMs, argues that an LLM with nearly unlimited context length can realize AGI. However, they might be too optimistic about the long-context capability of (existing) LLMs: (1) recent literature has shown that their effective context length is significantly smaller than their claimed context length; and (2) our reasoning-in-a-haystack experiments further demonstrate that simultaneously finding the relevant information from a long context and conducting (simple) reasoning is nearly impossible. In this paper, we envision a pathway from LLMs to AGI through the integration of memory. We believe that AGI should be a system where LLMs serve as core processors. In addition to raw data, the memory in this system would store a large number of important conclusions derived from reasoning processes. Compared with retrieval-augmented generation (RAG), which merely processes raw data, this approach not only connects semantically related information more closely, but also simplifies complex inferences at the time of querying. As an intermediate stage, the memory will likely be in the form of natural language descriptions, which can be directly consumed by users too. Ultimately, every agent/person should have its own large personal model, a deep neural network model (thus AI-native) that parameterizes and compresses all types of memory, even those that cannot be described by natural languages. Finally, we discuss the significant potential of AI-native memory as the transformative infrastructure for (proactive) engagement, personalization, distribution, and social in the AGI era, as well as the incurred privacy and security challenges with preliminary solutions.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.14482 [pdf, other]

Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either the visible or the thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, their insufficient quantity, limited categories, misaligned images and large target sizes cannot provide an impartial benchmark to evaluate multi-category visible-thermal small object detection (RGBT SOD) algorithms. In this paper, we build the first large-scale benchmark with high diversity for RGBT SOD (namely RGBT-Tiny), including 115 paired sequences, 93K frames and 1.2M manual annotations. RGBT-Tiny contains abundant targets (7 categories) and high-diversity scenes (8 types that cover different illumination and density variations). Notably, over 81% of targets are smaller than 16x16, and we provide paired bounding box annotations with tracking ID to offer an extremely challenging benchmark with wide-range applications, such as RGBT fusion, detection and tracking. In addition, we propose a scale adaptive fitness (SAFit) measure that exhibits high robustness on both small and large targets. The proposed SAFit can provide reasonable performance evaluation and promote detection performance. Based on the proposed RGBT-Tiny dataset and SAFit measure, extensive evaluations have been conducted, including 23 recent state-of-the-art algorithms that cover four different types (i.e., visible generic detection, visible SOD, thermal SOD and RGBT object detection). Project is available at https://github.com/XinyiYing24/RGBT-Tiny.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2404.18948 [pdf, other]

Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods

Authors: Wenzhen Yue, Xianghua Ying, Ruohao Guo, DongDong Chen, Ji Shi, Bowei Xing, Yuqing Zhu, Taiyan Chen

Abstract: In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our method restricts the attention to regions not immediately adjacent to the target points, termed sub-adjacent neighborhoods. Our key observation is that owing to the rarity of anomalies, they typically exhibit more pronounced differences from their sub-adjacent neighborhoods than from their immediate vicinities. By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging, thereby enhancing their detectability. Technically, our approach concentrates attention on the non-diagonal areas of the attention matrix by enlarging the corresponding elements in the training stage. To facilitate the implementation of the desired attention matrix pattern, we adopt linear attention because of its flexibility and adaptability. Moreover, a learnable mapping function is proposed to improve the performance of linear attention. Empirically, the Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks, covering diverse fields such as server monitoring, space exploration, and water treatment.
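Editor's illustration (not from the paper): the sub-adjacent neighborhood described in the abstract above can be pictured as a band of the attention matrix that skips the diagonal and its immediate neighbors. The minimal Python sketch below only marks that band. The function name `sub_adjacent_mask` and the parameters `inner` and `outer` are hypothetical, and the paper's actual mechanism (enlarging the corresponding attention elements during training, with linear attention) is not reproduced here.

```python
def sub_adjacent_mask(n, inner, outer):
    """Return an n x n 0/1 mask marking a sub-adjacent band.

    Entry [i][j] is 1 when inner <= |i - j| <= outer, so positions
    immediately around the diagonal (|i - j| < inner) are excluded.
    `inner` and `outer` are illustrative parameters, not taken from the paper.
    """
    return [
        [1 if inner <= abs(i - j) <= outer else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Print the band for a short sequence of 5 time points.
    for row in sub_adjacent_mask(5, inner=2, outer=3):
        print(row)
```

In this sketch, a time point at index 2 may attend only to indices 0 and 4, which are neither itself nor its direct neighbors.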

Submitted 27 April, 2024; originally announced April 2024.

Comments: IJCAI 2024

arXiv:2310.18709 [pdf, other]

Audio-Visual Instance Segmentation

Authors: Ruohao Guo, Yaru Chen, Yanyu Qi, Wenzhen Yue, Dantong Niu, Xianghua Ying

Abstract: In this paper, we propose a new multi-modal task, namely audio-visual instance segmentation (AVIS), in which the goal is to identify, segment, and track individual sounding object instances in audible videos, simultaneously. To our knowledge, it is the first time that instance segmentation has been extended into the audio-visual domain. To better facilitate this research, we construct the first audio-visual instance segmentation benchmark (AVISeg). Specifically, AVISeg consists of 1,258 videos with an average duration of 62.6 seconds from YouTube and public audio-visual datasets, where 117 videos have been annotated by using an interactive semi-automatic labeling tool based on the Segment Anything Model (SAM). In addition, we present a simple baseline model for the AVIS task. Our new model introduces an audio branch and a cross-modal fusion module to Mask2Former to locate all sounding objects. Finally, we evaluate the proposed method using two backbones on AVISeg. We believe that AVIS will inspire the community towards a more comprehensive multi-modal understanding.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2304.01484 [pdf, other]

Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision

Authors: Xinyi Ying, Li Liu, Yingqian Wang, Ruojing Li, Nuo Chen, Zaiping Lin, Weidong Sheng, Shilin Zhou

Abstract: Training a convolutional neural network (CNN) to detect infrared small targets in a fully supervised manner has gained remarkable research interest in recent years, but is highly labor-intensive since a large number of per-pixel annotations are required. To handle this problem, in this paper, we make the first attempt to achieve infrared small target detection with point-level supervision. Interestingly, during the training phase supervised by point labels, we discover that CNNs first learn to segment a cluster of pixels near the targets, and then gradually converge to predict groundtruth point labels. Motivated by this "mapping degeneration" phenomenon, we propose a label evolution framework named label evolution with single point supervision (LESPS) to progressively expand the point label by leveraging the intermediate predictions of CNNs. In this way, the network predictions can finally approximate the updated pseudo labels, and a pixel-level target mask can be obtained to train CNNs in an end-to-end manner. We conduct extensive experiments with insightful visualizations to validate the effectiveness of our method. Experimental results show that CNNs equipped with LESPS can well recover the target masks from corresponding point labels, and can achieve over 70% and 95% of their fully supervised performance in terms of pixel-level intersection over union (IoU) and object-level probability of detection (Pd), respectively. Code is available at https://github.com/XinyiYing/LESPS.

Submitted 3 April, 2023; originally announced April 2023.

Journal ref: CVPR 2023

arXiv:2303.17594 [pdf, other]

MobileInst: Video Instance Segmentation on the Mobile

Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang

Abstract: Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of Snapdragon 778G Mobile Platform, without other methods of acceleration. On the COCO dataset, MobileInst achieves 31.2 mask AP and 433 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Submitted 18 December, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: Accepted by AAAI 2024 Main Track; Code will be released

arXiv:2303.09042 [pdf, other]

Embedding Theory of Reservoir Computing and Reducing Reservoir Network Using Time Delays

Authors: Xing-Yue Duan, Xiong Ying, Si-Yang Leng, Jürgen Kurths, Wei Lin, Huan-Fei Ma

Abstract: Reservoir computing (RC), a particular form of recurrent neural network, is under explosive development due to its exceptional efficacy and high performance in reconstruction and/or prediction of complex physical systems. However, the mechanism triggering such effective applications of RC is still unclear, awaiting deep and systematic exploration. Here, combining the delayed embedding theory with the generalized embedding theory, we rigorously prove that RC is essentially a high dimensional embedding of the original input nonlinear dynamical system. Thus, using this embedding property, we unify into a universal framework the standard RC and the time-delayed RC, where we newly introduce time delays only into the network's output layer, and we further find a trade-off relation between the time delays and the number of neurons in RC. Based on this finding, we significantly reduce the network size of RC for reconstructing and predicting some representative physical systems, and, more surprisingly, only using a single neuron reservoir with time delays is sometimes sufficient for achieving those tasks.
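Editor's illustration (not from the paper): the delayed embedding theory mentioned in the abstract above builds state vectors from time-lagged copies of a single observed series. The Python sketch below shows only this generic delay-coordinate construction. The function name `delay_embed` and the parameters `dim` and `tau` are hypothetical, and the sketch does not implement the paper's reservoir or its time-delayed output layer.

```python
def delay_embed(series, dim, tau):
    """Build delay-coordinate vectors from a scalar time series.

    Each vector is [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]].
    `dim` (embedding dimension) and `tau` (delay in samples) are
    illustrative parameters chosen by the caller.
    """
    start = (dim - 1) * tau  # first index with a full history available
    return [
        [series[t - k * tau] for k in range(dim)]
        for t in range(start, len(series))
    ]


if __name__ == "__main__":
    # A toy series; with dim=3 and tau=2 only the last two samples
    # have enough history to form a full delay vector.
    print(delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2))
```

The trade-off reported in the abstract, between the number of delays and the number of neurons, can be read loosely against this picture: delayed copies of a few signals can stand in for additional state dimensions.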

Submitted 8 May, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

arXiv:2210.09884 [pdf]

Is Dogecoin a Viable Investment? Insights from Network and Bubble Effects

Authors: Ruoxin Xiao, Xinyu Ying, Hengxu Li, Kexin Liu

Abstract: We find that three factors (Dogecoin network externalities, momentum, and tweet sentiment) capture the time-series expected Dogecoin returns. Dogecoin returns are exposed to Dogecoin network factors. We construct the network factors to capture the user adoption of Dogecoin. Moreover, there is a strong time-series momentum effect, and proxies for investor attention strongly forecast future Dogecoin returns. Lastly, we examine potential underlying mechanisms of the Dogecoin price bubble.

Submitted 19 November, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2201.09045 [pdf, other]

From 996 to 007: Challenges of Working from Home During the Epidemic in China

Authors: Jie Gao, Pin Sym Foong, Yifan Yang, Weilin Jiang, Yijie Chen, Xiayin Ying, Simon Perrault

Abstract: During the COVID-19 epidemic in China, millions of workers in tech companies had to start working from home (WFH). The change was sudden and unexpected, and companies were not ready for it. Additionally, it was also the first time that WFH was experienced on such a large scale. We used the opportunity to describe the effect of WFH at scale for a sustained period of time. As the lockdown was easing, we conducted semi-structured interviews with 12 participants from China working in tech companies. While at first WFH was reported as a pleasant experience with advantages (e.g., flexible schedule, more time with family), over time this evolved into a rather negative experience where workers start working all day, every day and feel a higher workload despite the actual workload being reduced. We discuss these results and how they could apply to other extreme circumstances and help improve WFH in general.

Submitted 22 January, 2022; originally announced January 2022.

ACM Class: J.4

arXiv:2201.01014 [pdf, other]

Local Motion and Contrast Priors Driven Deep Network for Infrared Small Target Super-Resolution

Authors: Xinyi Ying, Yingqian Wang, Longguang Wang, Weidong Sheng, Li Liu, Zaiping Lin, Shilin Zhou

Abstract: Infrared small target super-resolution (SR) aims to recover reliable and detailed high-resolution images with high-contrast targets from their low-resolution counterparts. Since the infrared small target lacks color and fine structure information, it is important to exploit the supplementary information among sequence images to enhance the target. In this paper, we propose the first infrared small target SR method, named local motion and contrast prior driven deep network (MoCoPnet), to integrate the domain knowledge of infrared small targets into a deep network, which can mitigate the intrinsic feature scarcity of infrared small targets. Specifically, motivated by the local motion prior in the spatio-temporal dimension, we propose a local spatio-temporal attention module to perform implicit frame alignment and incorporate the local spatio-temporal information to enhance the local features (especially for small targets). Motivated by the local contrast prior in the spatial dimension, we propose a central difference residual group to incorporate the central difference convolution into the feature extraction backbone, which can achieve center-oriented gradient-aware feature extraction to further improve the target contrast. Extensive experiments have demonstrated that our method can recover accurate spatial dependency and improve the target contrast. Comparative results show that MoCoPnet can outperform the state-of-the-art video SR and single image SR methods in terms of both SR performance and target enhancement. Based on the SR results, we further investigate the influence of SR on infrared small target detection, and the experimental results demonstrate that MoCoPnet promotes the detection performance. The code is available at https://github.com/XinyiYing/MoCoPnet.

Submitted 4 April, 2023; v1 submitted 4 January, 2022; originally announced January 2022.

Journal ref: JSTARS 2022

arXiv:2011.03802 [pdf, other]

Symmetric Parallax Attention for Stereo Image Super-Resolution

Authors: Yingqian Wang, Xinyi Ying, Longguang Wang, Jungang Yang, Wei An, Yulan Guo

Abstract: Although recent years have witnessed great advances in stereo image super-resolution (SR), the beneficial information provided by binocular systems has not been fully used. Since stereo images are highly symmetric under epipolar constraint, in this paper, we improve the performance of stereo image SR by exploiting symmetry cues in stereo image pairs. Specifically, we propose a symmetric bi-directional parallax attention module (biPAM) and an inline occlusion handling scheme to effectively interact cross-view information. Then, we design a Siamese network equipped with a biPAM to super-resolve both sides of views in a highly symmetric manner. Finally, we design several illuminance-robust losses to enhance stereo consistency. Experiments on four public datasets demonstrate the superior performance of our method. Source code is available at https://github.com/YingqianWang/iPASSR.

Submitted 20 April, 2021; v1 submitted 7 November, 2020; originally announced November 2020.

Comments: Accepted to NTIRE workshop at CVPR 2021. The first two authors contribute equally to this work

arXiv:2007.11070 [pdf, other]

doi 10.4204/EPTCS.321.3

How to Increase Interest in Studying Functional Programming via Interdisciplinary Application

Authors: Pedro Figueirêdo, Yuri Kim, Nghia Le Minh, Evan Sitt, Xue Ying, Viktória Zsók

Abstract: Functional programming represents a modern tool for applying and implementing software. The state of the art in functional programming reports an increasing number of methodologies in this paradigm. However, extensive interdisciplinary applications are missing. Our goal is to increase student interest in pursuing further studies in functional programming with the use of an application: the ray tracer. We conducted a teaching experience, with positive results and student feedback, described here in this paper.

Submitted 24 August, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: In Proceedings TFPIE 2019 and 2020, arXiv:2008.08923

Journal ref: EPTCS 321, 2020, pp. 37-54

arXiv:2007.03535 [pdf, other]

doi 10.1109/TIP.2020.3042059

Light Field Image Super-Resolution Using Deformable Convolution

Authors: Yingqian Wang, Jungang Yang, Longguang Wang, Xinyi Ying, Tianhao Wu, Wei An, Yulan Guo

Abstract: Light field (LF) cameras can record scenes from multiple perspectives, and thus introduce beneficial angular information for image super-resolution (SR). However, it is challenging to incorporate angular information due to disparities among LF images. In this paper, we propose a deformable convolution network (i.e., LF-DFnet) to handle the disparity problem for LF image SR. Specifically, we design an angular deformable alignment module (ADAM) for feature-level alignment. Based on ADAM, we further propose a collect-and-distribute approach to perform bidirectional alignment between the center-view feature and each side-view feature. Using our approach, angular information can be well incorporated and encoded into features of each view, which benefits the SR reconstruction of all LF images. Moreover, we develop a baseline-adjustable LF dataset to evaluate SR performance under different disparity variations. Experiments on both public and our self-developed datasets have demonstrated the superiority of our method. Our LF-DFnet can generate high-resolution images with more faithful details and achieve state-of-the-art reconstruction accuracy. Besides, our LF-DFnet is more robust to disparity variations, which has not been well addressed in literature.

Submitted 25 November, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: Accepted by IEEE Transactions on Image Processing

arXiv:2006.09603 [pdf, other]

Exploring Sparsity in Image Super-Resolution for Efficient Inference

Authors: Longguang Wang, Xiaoyu Dong, Yingqian Wang, Xinyi Ying, Zaiping Lin, Wei An, Yulan Guo

Abstract: Current CNN-based super-resolution (SR) methods process all locations equally with computational resources being uniformly assigned in space. However, since missing details in low-resolution (LR) images mainly exist in regions of edges and textures, fewer computational resources are required for flat regions. Therefore, existing CNN-based methods involve redundant computation in flat regions, which increases their computational cost and limits their applications on mobile devices. In this paper, we explore the sparsity in image SR to improve inference efficiency of SR networks. Specifically, we develop a Sparse Mask SR (SMSR) network to learn sparse masks to prune redundant computation. Within our SMSR, spatial masks learn to identify "important" regions while channel masks learn to mark redundant channels in those "unimportant" regions. Consequently, redundant computation can be accurately localized and skipped while maintaining comparable performance. It is demonstrated that our SMSR achieves state-of-the-art performance with 41%/33%/27% FLOPs being reduced for x2/3/4 SR. Code is available at: https://github.com/LongguangWang/SMSR.

Submitted 1 April, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: Accepted by CVPR 2021

arXiv:2006.06644 [pdf, other]

Relay Aided Intelligent Reconfigurable Surfaces: Achieving the Potential Without So Many Antennas

Authors: Xiaoyan Ying, Umut Demirhan, Ahmed Alkhateeb

Abstract: This paper proposes a novel relay-aided intelligent reconfigurable surface (IRS) architecture for future wireless communication systems. The proposed architecture, which consists of two side-by-side intelligent surfaces connected via a full-duplex relay, has the potential of achieving the promising gains of intelligent surfaces while requiring much smaller numbers of reflecting elements. Consequently, the proposed IRS architecture needs significantly less channel estimation and beam training overhead and provides higher robustness compared to classical IRS approaches. Further, thanks to dividing the IRS reflection process over two surfaces, the position and orientation of these surfaces can be optimized to extend the wireless communication coverage and enhance the system performance. In this paper, the achievable rates and required numbers of elements using the proposed relay-aided IRS architecture are first analytically characterized and then evaluated using numerical simulations. The results show that the proposed architecture can achieve the data rate targets with much smaller numbers of elements compared to typical IRS solutions, which highlights a promising path towards the practical deployment of these intelligent surfaces.

Submitted 11 June, 2020; originally announced June 2020.

Comments: 8 pages; 7 figures

arXiv:2004.02803 [pdf, other]

doi 10.1109/LSP.2020.3013518

Deformable 3D Convolution for Video Super-Resolution

Authors: Xinyi Ying, Longguang Wang, Yingqian Wang, Weidong Sheng, Wei An, Yulan Guo

Abstract: The spatio-temporal information among video sequences is significant for video super-resolution (SR). However, the spatio-temporal information cannot be fully used by existing video SR methods since spatial feature extraction and temporal motion compensation are usually performed sequentially. In this paper, we propose a deformable 3D convolution network (D3Dnet) to incorporate spatio-temporal information from both spatial and temporal dimensions for video SR. Specifically, we introduce deformable 3D convolution (D3D) to integrate deformable convolution with 3D convolution, obtaining both superior spatio-temporal modeling capability and motion-aware modeling flexibility. Extensive experiments have demonstrated the effectiveness of D3D in exploiting spatio-temporal information. Comparative results show that our network achieves state-of-the-art SR performance. Code is available at: https://github.com/XinyiYing/D3Dnet.

Submitted 15 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: Accepted by IEEE Signal Processing Letters

arXiv:1912.04735 [pdf, other]

Covert Channel-Based Transmitter Authentication in Controller Area Networks

Authors: Xuhang Ying, Giuseppe Bernieri, Mauro Conti, Linda Bushnell, Radha Poovendran

Abstract: In recent years, the security of automotive Cyber-Physical Systems (CPSs) has faced urgent threats due to the widespread use of legacy in-vehicle communication systems. As a representative legacy bus system, the Controller Area Network (CAN) hosts Electronic Control Units (ECUs) that are crucial for vehicle functioning. In this scenario, malicious actors can exploit CAN vulnerabilities, such as the lack of built-in authentication and encryption schemes, to launch CAN bus attacks with life-threatening consequences (e.g., disabling brakes). In this paper, we present TACAN (Transmitter Authentication in CAN), which provides secure authentication of ECUs on the legacy CAN bus by exploiting covert channels, without introducing CAN protocol modifications or traffic overheads. TACAN turns upside-down the originally malicious concept of covert channels and exploits it to build an effective defensive technique that facilitates transmitter authentication via a centralized, trusted Monitor Node. TACAN consists of three different covert channels for ECU authentication: 1) the Inter-Arrival Time (IAT)-based; 2) the Least Significant Bit (LSB)-based; and 3) a hybrid covert channel, exploiting the combination of the first two. In order to validate TACAN, we implement the covert channels on the University of Washington (UW) EcoCAR (Chevrolet Camaro 2016) testbed. We further evaluate the bit error, throughput, and detection performance of TACAN through extensive experiments using the EcoCAR testbed and a publicly available dataset collected from Toyota Camry 2010. We demonstrate the feasibility of TACAN and the effectiveness of detecting CAN bus attacks, highlighting no traffic overheads and attesting the regular functionality of ECUs.

Submitted 7 December, 2019; originally announced December 2019.

Comments: Submitted to TDSC (Transactions on Dependable and Secure Computing). arXiv admin note: text overlap with arXiv:1903.05231

arXiv:1907.07792 [pdf, other]

GRIP++: Enhanced Graph-based Interaction-aware Trajectory Prediction for Autonomous Driving

Authors: Xin Li, Xiaowen Ying, Mooi Choo Chuah

Abstract: Despite the advancement in the technology of autonomous driving cars, the safety of a self-driving car is still a challenging problem that has not been well studied. Motion prediction is one of the core functions of an autonomous driving car. Previously, we proposed a novel scheme called GRIP, which is designed to predict trajectories for traffic agents around an autonomous car efficiently. GRIP uses a graph to represent the interactions of close objects, applies several graph convolutional blocks to extract features, and subsequently uses an encoder-decoder long short-term memory (LSTM) model to make predictions. Even though our experimental results show that GRIP improves the prediction accuracy of the state-of-the-art solution by 30%, GRIP still has some limitations. GRIP uses a fixed graph to describe the relationships between different traffic agents and hence may suffer some performance degradation when it is used in urban traffic scenarios. Hence, in this paper, we describe an improved scheme called GRIP++, where we use both fixed and dynamic graphs for trajectory predictions of different types of traffic agents. Such an improvement can help autonomous driving cars avoid many traffic accidents. Our evaluations using a recently released urban traffic dataset, namely ApolloScape, show that GRIP++ achieves better prediction accuracy than state-of-the-art schemes. GRIP++ ranked #1 on the leaderboard of the ApolloScape trajectory competition in October 2019. In addition, GRIP++ runs 21.7 times faster than a state-of-the-art scheme, CS-LSTM.

Submitted 19 May, 2020; v1 submitted 17 July, 2019; originally announced July 2019.

arXiv:1905.06902 [pdf, other]

X2CT-GAN: Reconstructing CT from Biplanar X-Rays with Generative Adversarial Networks

Authors: Xingde Ying, Heng Guo, Kai Ma, Jian Wu, Zhengxin Weng, Yefeng Zheng

Abstract: Computed tomography (CT) can provide a 3D view of the patient's internal organs, facilitating disease diagnosis, but it delivers a higher radiation dose to the patient, and a CT scanner is also far more costly than an X-ray machine. Traditional CT reconstruction methods require hundreds of X-ray projections through a full rotational scan of the body, which cannot be performed on a typical X-ray machine. In this work, we propose to reconstruct CT from two orthogonal X-rays using the generative adversarial network (GAN) framework. A specially designed generator network is exploited to increase data dimension from 2D (X-rays) to 3D (CT), which is not addressed in previous research on GANs. A novel feature fusion method is proposed to combine information from two X-rays. The mean squared error (MSE) loss and adversarial loss are combined to train the generator, resulting in a high-quality CT volume both visually and quantitatively. Extensive experiments on a publicly available chest CT dataset demonstrate the effectiveness of the proposed method. It could be a nice enhancement of a low-cost X-ray machine to provide physicians a CT-like 3D volume in several niche applications.

Submitted 16 May, 2019; originally announced May 2019.

arXiv:1904.09969 [pdf, other]

Detecting ADS-B Spoofing Attacks using Deep Neural Networks

Authors: Xuhang Ying, Joanna Mazer, Giuseppe Bernieri, Mauro Conti, Linda Bushnell, Radha Poovendran

Abstract: The Automatic Dependent Surveillance-Broadcast (ADS-B) system is a key component of the Next Generation Air Transportation System (NextGen) that manages the increasingly congested airspace. It provides accurate aircraft localization and efficient air traffic management and also improves the safety of billions of current and future passengers. While the benefits of ADS-B are well known, the lack of basic security measures like encryption and authentication introduces various exploitable security vulnerabilities. One practical threat is the ADS-B spoofing attack that targets the ADS-B ground station, in which the ground-based or aircraft-based attacker manipulates the International Civil Aviation Organization (ICAO) address (a unique identifier for each aircraft) in the ADS-B messages to fake the appearance of non-existent aircraft or masquerade as a trusted aircraft. As a result, this attack can confuse the pilots or the air traffic control personnel and cause dangerous maneuvers. In this paper, we introduce SODA - a two-stage Deep Neural Network (DNN)-based spoofing detector for ADS-B that consists of a message classifier and an aircraft classifier. It allows a ground station to examine each incoming message based on the PHY-layer features (e.g., IQ samples and phases) and flag suspicious messages. Our experimental results show that SODA detects ground-based spoofing attacks with a probability of 99.34%, while having a very small false alarm rate (i.e., 0.43%). It outperforms other machine learning techniques such as XGBoost, Logistic Regression, and Support Vector Machine. It further identifies individual aircraft with an average F-score of 96.68% and an accuracy of 96.66%, with a significant improvement over the state-of-the-art detector.

Submitted 22 April, 2019; originally announced April 2019.

Comments: Accepted to IEEE CNS 2019

arXiv:1903.05231 [pdf, other]

TACAN: Transmitter Authentication through Covert Channels in Controller Area Networks

Authors: Xuhang Ying, Giuseppe Bernieri, Mauro Conti, Radha Poovendran

Abstract: Nowadays, the interconnection of automotive systems with modern digital devices offers advanced user experiences to drivers. Electronic Control Units (ECUs) carry out a multitude of operations using the insecure Controller Area Network (CAN) bus in automotive Cyber-Physical Systems (CPSs). Therefore, dangerous attacks, such as disabling brakes, are possible and the safety of passengers is at risk. In this paper, we present TACAN (Transmitter Authentication in CAN), which provides secure authentication of ECUs by exploiting the covert channels without introducing CAN protocol modifications or traffic overheads (i.e., no extra bits or messages are used). TACAN turns upside-down the originally malicious concept of covert channels and exploits it to build an effective defensive technique that facilitates transmitter authentication via a trusted Monitor Node. TACAN consists of three different covert channels for ECU authentication: 1) Inter-Arrival Time (IAT)-based, leveraging the IATs of CAN messages; 2) offset-based, exploiting the clock offsets of CAN messages; 3) Least Significant Bit (LSB)-based, concealing authentication messages into the LSBs of normal CAN data. We implement the covert channels on the University of Washington (UW) EcoCAR testbed and evaluate their performance through extensive experiments. We demonstrate the feasibility of TACAN, highlighting no traffic overheads and attesting the regular functionality of ECUs. In particular, the bit error ratios are within 0.1% and 0.42% for the IAT-based and offset-based covert channels, respectively. Furthermore, the bit error ratio of the LSB-based covert channel is equal to that of a normal CAN bus, which is 3.1x10^-7%.

Submitted 12 March, 2019; originally announced March 2019.

Comments: To be published in ACM/IEEE ICCPS 2019

arXiv:1807.09432 [pdf, other]

Shape of the Cloak: Formal Analysis of Clock Skew-Based Intrusion Detection System in Controller Area Networks

Authors: Xuhang Ying, Sang Uk Sagong, Andrew Clark, Linda Bushnell, Radha Poovendran

Abstract: This paper presents a new masquerade attack called the cloaking attack and provides formal analyses for clock skew-based Intrusion Detection Systems (IDSs) that detect masquerade attacks in the Controller Area Network (CAN) in automobiles. In the cloaking attack, the adversary manipulates the message inter-transmission times of spoofed messages by adding delays so as to emulate a desired clock skew and avoid detection. In order to predict and characterize the impact of the cloaking attack in terms of the attack success probability on a given CAN bus and IDS, we develop formal models for two clock skew-based IDSs, i.e., the state-of-the-art (SOTA) IDS and its adaptation to the widely used Network Time Protocol (NTP), using parameters of the attacker, the detector, and the hardware platform. To the best of our knowledge, this is the first paper that provides formal analyses of clock skew-based IDSs in automotive CAN. We implement the cloaking attack on two hardware testbeds, a prototype and a real vehicle (the University of Washington (UW) EcoCAR), and demonstrate its effectiveness against both the SOTA and NTP-based IDSs. We validate our formal analyses through extensive experiments for different messages, IDS settings, and vehicles. By comparing each predicted attack success probability curve against its experimental curve, we find that the average prediction error is within 3.0% for the SOTA IDS and 5.7% for the NTP-based IDS.

Submitted 23 January, 2019; v1 submitted 25 July, 2018; originally announced July 2018.

Comments: Part of this work was presented at ACM/IEEE ICCPS 2018; to be published in IEEE Transactions on Information Forensics & Security

arXiv:1805.06053 [pdf, other]

SAS-Assisted Coexistence-Aware Dynamic Channel Assignment in CBRS Band

Authors: Xuhang Ying, Milind Buddhikot, Sumit Roy

Abstract: The paradigm of shared spectrum allows secondary devices to opportunistically access spectrum bands underutilized by primary owners. Recently, the FCC has targeted the sharing of the 3.5 GHz (3550-3700 MHz) federal spectrum with commercial systems such as small cells. The rules require a Spectrum Access System (SAS) to accommodate three service tiers: 1) Incumbent Access, 2) Priority Access (PA), and 3) Generalized Authorized Access (GAA). In this work, we study the SAS-assisted dynamic channel assignment (CA) for PA and GAA tiers. We introduce the node-channel-pair conflict graph to capture pairwise interference, channel and geographic contiguity constraints, spatially varying channel availability, and coexistence awareness. The proposed conflict graph allows us to formulate PA CA and GAA CA with binary conflicts as max-cardinality and max-reward CA, respectively. Approximate solutions can be found by a heuristic-based algorithm that searches for the maximum weighted independent set. We further formulate GAA CA with non-binary conflicts as max-utility CA. We show that the utility function is submodular, and the problem is an instance of matroid-constrained submodular maximization. A polynomial-time algorithm based on local search is proposed that provides a provable performance guarantee. Extensive simulations using a real-world Wi-Fi hotspot location dataset are conducted to evaluate the proposed algorithms. Our results demonstrate the advantages of the proposed graph representation and the improved performance of the proposed algorithms over the baseline algorithms.

Submitted 18 July, 2018; v1 submitted 15 May, 2018; originally announced May 2018.

Comments: Accepted to IEEE TWC

arXiv:1710.02692 [pdf, other]

Cloaking the Clock: Emulating Clock Skew in Controller Area Networks

Authors: Sang Uk Sagong, Xuhang Ying, Andrew Clark, Linda Bushnell, Radha Poovendran

Abstract: Automobiles are equipped with Electronic Control Units (ECUs) that communicate via in-vehicle network protocol standards such as Controller Area Network (CAN). These protocols are designed under the assumption that separating in-vehicle communications from external networks is sufficient for protection against cyber attacks. This assumption, however, has been shown to be invalid by recent attacks in which adversaries were able to infiltrate the in-vehicle network. Motivated by these attacks, intrusion detection systems (IDSs) have been proposed for in-vehicle networks that attempt to detect attacks by making use of device fingerprinting using properties such as clock skew of an ECU. In this paper, we propose the cloaking attack, an intelligent masquerade attack in which an adversary modifies the timing of transmitted messages in order to match the clock skew of a targeted ECU. The attack leverages the fact that, while the clock skew is a physical property of each ECU that cannot be changed by the adversary, the estimation of the clock skew by other ECUs is based on network traffic, which, being a cyber component only, can be modified by an adversary. We implement the proposed cloaking attack and test it on two IDSs, namely, the current state-of-the-art IDS and a new IDS that we develop based on the widely-used Network Time Protocol (NTP). We implement the cloaking attack on two hardware testbeds, a prototype and a real connected vehicle, and show that it can always deceive both IDSs. We also introduce a new metric called the Maximum Slackness Index to quantify the effectiveness of the cloaking attack even when the adversary is unable to precisely match the clock skew of the targeted ECU.

Submitted 21 March, 2018; v1 submitted 7 October, 2017; originally announced October 2017.

Comments: 11 pages, 13 figures, This work has been accepted to the 9th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS)

MSC Class: 68W01

arXiv:1710.01705 [pdf, other]

Detecting LTE-U Duty Cycling Misbehavior for Fair Sharing with Wi-Fi in Shared Bands

Authors: Xuhang Ying, Radha Poovendran, Sumit Roy

Abstract: Coexistence of Wi-Fi and LTE Unlicensed (LTE-U) in shared or unlicensed bands has drawn growing attention from both academia and industry. An important consideration is fairness between Wi-Fi and duty cycled LTE-U, which is often defined in terms of channel access time, as adopted by the LTE-U Forum. Despite many studies on duty cycle adaptation design for fair sharing, one crucial fact has often been neglected: LTE-U systems unilaterally control LTE-U duty cycles; hence, as self-interested users, they have incentives to misbehave, e.g., transmitting with a larger duty cycle that exceeds a given limit, so as to gain a greater share in channel access time and throughput. In this paper, we propose a scheme that allows the spectrum manager managing the shared bands to estimate the duty cycle of a target LTE-U cell based on PHY layer observations from a nearby Wi-Fi AP, without interrupting normal Wi-Fi operations. We further propose a thresholding scheme to detect duty cycling misbehavior (i.e., determining if the duty cycle exceeds the assigned limit), and analyze its performance in terms of detection and false alarm probabilities. The proposed schemes are implemented in ns3 and evaluated with extensive simulations. Our results show that the proposed scheme provides an estimate within +/- 1% of the true duty cycle, and detects misbehavior with a duty cycle 2.8% higher than the limit with a detection probability of at least 95%, while keeping the false alarm probability less than or equal to 1%.

Submitted 4 October, 2017; originally announced October 2017.

Comments: Accepted to IEEE PIMRC 2017

arXiv:1611.07580 [pdf, other]

doi 10.1109/TCCN.2017.2701812

Pricing Mechanisms for Crowd-Sensed Spatial-Statistics-Based Radio Mapping

Authors: Xuhang Ying, Sumit Roy, Radha Poovendran

Abstract: Networking on white spaces (i.e., locally unused spectrum) relies on active monitoring of spectrum usage. Spectrum databases based on empirical radio propagation models are widely adopted but shown to be error-prone, since they do not account for built environments like trees and man-made buildings. As an economically viable option, crowd-sensed radio mapping acquires more accurate local spectrum data from mobile users and constructs radio maps using spatial models such as Kriging and Gaussian Process. Success of such crowd-sensing systems presumes some incentive mechanisms to attract user participation. In this work, we consider the scenario where the platform that constructs radio environment maps makes one-time offers to selected users, and collects data from those who accept the offers. We design pricing mechanisms based on expected utility (EU) maximization, where EU captures the tradeoff between radio mapping performance (location and data quality), crowd-sensing cost and uncertainty in offer outcomes (i.e., possible expiration and rejection). Specifically, we consider sequential offering, where one best price offer is sent to the best user in each round, and batched offering, where a batch of multiple offers is made in each round. For the latter, we show that EU is submodular in the discrete domain, and propose a mechanism that first fixes the pricing rule and selects users based on Unconstrained Submodular Maximization (USM); it then compares different pricing rules to find the best batch of offers in each round. We show that USM-based user selection has a provable performance guarantee. Proposed mechanisms are evaluated and compared against utility-maximization-based baseline mechanisms.

Submitted 22 May, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

Comments: Part of this work was presented at IEEE GLOBECOM 2016

arXiv:1305.1293 [pdf, other]

Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics

Authors: Xiang Ying, Shi-Qing Xin, Ying He

Abstract: In many graphics applications, the computation of exact geodesic distance is very important. However, the high computational cost of the existing geodesic algorithms means that they are not practical for large-scale models or time-critical applications. To tackle this challenge, we propose the parallel Chen-Han (or PCH) algorithm, which extends the classic Chen-Han (CH) discrete geodesic algorithm to the parallel setting. The original CH algorithm and its variant both lack a parallel solution because the windows (a key data structure that carries the shortest distance in the wavefront propagation) are maintained in a strict order or a tightly coupled manner, which means that only one window is processed at a time. We propose dividing the CH's sequential algorithm into four phases (window selection, window propagation, data organization, and events processing) so that there is no data dependence or conflict in each phase and the operations within each phase can be carried out in parallel. The proposed PCH algorithm is able to propagate a large number of windows simultaneously and independently. We also adopt a simple yet effective strategy to control the total number of windows. We implement the PCH algorithm on modern GPUs (such as Nvidia GTX 580) and analyze the performance in detail. The performance improvement (compared to the sequential algorithms) is highly consistent with GPU double-precision performance (GFLOPS). Extensive experiments on real-world models demonstrate an order of magnitude improvement in execution time compared to the state-of-the-art.

Submitted 7 May, 2013; originally announced May 2013.

Comments: 10 pages, accepted to ACM Transactions on Graphics with major revision

arXiv:1304.1103 [pdf]

Minimum Error Tree Decomposition

Authors: L. Liu, Y. Ma, D. Wilkins, Z. Bian, X. Ying

Abstract: This paper describes a generalization of previous methods for constructing tree-structured belief networks with hidden variables. The major new feature of the described method is the ability to produce a tree decomposition even when there are errors in the correlation data among the input variables. This is an important extension of existing methods since the correlation coefficients usually cannot be measured with precision. The technique involves using a greedy search algorithm that locally minimizes an error function.

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI1990)

Report number: UAI-P-1990-PG-180-185

Showing 1–28 of 28 results for author: Ying, X