Search | arXiv e-print repository

SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information

Authors: Feng Wang, Haihang Ruan, Zhihuang Xie, Ronggang Wang, Xiangyu Yue

Abstract: Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codec. However, most existing NVC methods rely heavily on transmitting Motion Vectors (MV) to generate accurate contextual features, which has the following drawbacks. (1) Compressing and transmitting MV requires a specialized MV encoder and decoder, which makes modules redundant. (2) Due to the existence of the MV Encoder-Decoder, the training strategy is complex. In this paper, we present a novel Single Stream NVC framework (SSNVC), which removes the complex MV Encoder-Decoder structure and uses a one-stage training strategy. SSNVC implicitly uses temporal information by adding the previous entropy model feature to the current entropy model and using the previous two frames to generate predicted motion information at the decoder side. Besides, we enhance the frame generator to generate higher-quality reconstructed frames. Experiments demonstrate that SSNVC can achieve state-of-the-art performance on multiple benchmarks, and can greatly simplify the compression process as well as the training process.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by DCC 2024 as Poster. This is the full paper

arXiv:2405.11493 [pdf, other]

Point Cloud Compression with Implicit Neural Representations: A Unified Framework

Authors: Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Dusit Niyato

Abstract: Point clouds have become increasingly vital across various applications thanks to their ability to realistically depict 3D objects and scenes. Nevertheless, effectively compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we present a pioneering point cloud compression framework capable of handling both geometry and attribute components. Unlike traditional approaches and existing learning-based methods, our framework utilizes two coordinate-based neural networks to implicitly represent a voxelized point cloud. The first network generates the occupancy status of a voxel, while the second network determines the attributes of an occupied voxel. To tackle an immense number of voxels within the volumetric space, we partition the space into smaller cubes and focus solely on voxels within non-empty cubes. By feeding the coordinates of these voxels into the respective networks, we reconstruct the geometry and attribute components of the original point cloud. The neural network parameters are further quantized and compressed. Experimental results underscore the superior performance of our proposed method compared to the octree-based approach employed in the latest G-PCC standards. Moreover, our method exhibits high universality when contrasted with existing learning-based techniques.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 6 Pages, 6 Figures, submitted to IEEE ICCC

arXiv:2404.05427 [pdf, other]

AutoCodeRover: Autonomous Program Improvement

Authors: Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, Abhik Roychoudhury

Abstract: Researchers have made significant progress in automating the software development process in the past decades. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach, called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance the LLM's understanding of the issue's root cause, and effectively retrieves a context via iterative search. The use of spectrum-based fault localization using tests further sharpens the context, as long as a test suite is available. Experiments on SWE-bench-lite, which consists of 300 real-life GitHub issues, show increased efficacy in solving GitHub issues (22-23% on SWE-bench-lite). On the full SWE-bench, consisting of 2294 GitHub issues, AutoCodeRover solved around 16% of issues, which is higher than the efficacy of the recently reported AI software engineer Devin from Cognition Labs, while taking time comparable to Devin. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.

Submitted 14 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.
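
Illustrative note (not part of the arXiv listing): the abstract above describes code search over classes and methods in an abstract syntax tree rather than over raw files. The sketch below only illustrates that general idea with Python's standard `ast` module; it is not AutoCodeRover's implementation. The names `index_definitions`, `search_method`, and the sample `SOURCE` are hypothetical.

```python
import ast

# Hypothetical sample program, used only to exercise the sketch.
SOURCE = '''
class Cart:
    def add(self, item):
        self.items.append(item)

    def total(self):
        return sum(i.price for i in self.items)

def helper():
    return 1
'''

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def index_definitions(source: str) -> dict:
    """Map each top-level class, method, and function to its (start, end) lines."""
    index = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            index[node.name] = (node.lineno, node.end_lineno)
            for child in node.body:
                if isinstance(child, FUNC_TYPES):
                    qualified = f"{node.name}.{child.name}"
                    index[qualified] = (child.lineno, child.end_lineno)
        elif isinstance(node, FUNC_TYPES):
            index[node.name] = (node.lineno, node.end_lineno)
    return index


def search_method(source: str, name: str) -> list:
    """Return qualified names whose last component equals `name`."""
    return sorted(q for q in index_definitions(source) if q.split(".")[-1] == name)
```

A structure-aware lookup such as `search_method(SOURCE, "total")` returns the qualified name of the matching method, and the line span from `index_definitions` can then be used to retrieve only that snippet as context.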

arXiv:2403.11187 [pdf, other]

Task-Based Quantizer Design for Sensing With Random Signals

Authors: Hang Ruan, Fan Liu

Abstract: In integrated sensing and communication (ISAC) systems, random signaling is used to convey useful information as well as sense the environment. Such randomness poses challenges in various components in sensing signal processing. In this paper, we investigate quantizer design for sensing in ISAC systems. Unlike quantizers for channel estimation in massive multiple-input multiple-output (MIMO) communication systems, sensing in ISAC systems needs to deal with random nonorthogonal transmitted signals rather than a fixed orthogonal pilot. Considering sensing performance and hardware implementation, we focus on task-based hardware-limited quantization with spatial analog combining. We propose two strategies of quantizer optimization, i.e., data-dependent (DD) and data-independent (DI). The former achieves optimized sensing performance with high implementation overhead. To reduce hardware complexity, the latter optimizes the quantizer with respect to the random signal from a stochastic perspective. We derive the optimal quantizers for both strategies and formulate an algorithm based on sample average approximation (SAA) to solve the optimization in the DI strategy. Numerical results show that the optimized quantizers outperform digital-only quantizers in terms of sensing performance. Additionally, the DI strategy, despite its lower computational complexity compared to the DD strategy, achieves near-optimal sensing performance.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2401.00260 [pdf, other]

GazeCLIP: Towards Enhancing Gaze Estimation via Text Guidance

Authors: Jun Wang, Hao Ruan, Mingjie Wang, Chuanghui Zhang, Huachun Li, Jun Zhou

Abstract: Over the past decade, visual gaze estimation has garnered increasing attention within the research community, owing to its wide-ranging application scenarios. While existing estimation approaches have achieved remarkable success in enhancing prediction accuracy, they primarily infer gaze from single-image signals, neglecting the potential benefits of the currently dominant text guidance. Notably, visual-language collaboration has been extensively explored across various visual tasks, such as image synthesis and manipulation, leveraging the remarkable transferability of the large-scale Contrastive Language-Image Pre-training (CLIP) model. Nevertheless, existing gaze estimation approaches overlook the rich semantic cues conveyed by linguistic signals and the priors embedded in the CLIP feature space, thereby yielding performance setbacks. To address this gap, we delve deeply into the text-eye collaboration protocol and introduce a novel gaze estimation framework, named GazeCLIP. Specifically, we design a linguistic description generator to produce text signals with coarse directional cues. Additionally, a CLIP-based backbone that excels in characterizing text-eye pairs for gaze estimation is presented. This is followed by the implementation of a fine-grained multi-modal fusion module aimed at modeling the interrelationships between heterogeneous inputs. Extensive experiments on three challenging datasets demonstrate the superiority of the proposed GazeCLIP, which achieves state-of-the-art accuracy.

Submitted 25 April, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

arXiv:2311.07633 [pdf, other]

Rethinking and Benchmarking Predict-then-Optimize Paradigm for Combinatorial Optimization Problems

Authors: Haoyu Geng, Hang Ruan, Runzhong Wang, Yang Li, Yang Wang, Lei Chen, Junchi Yan

Abstract: Numerous web applications rely on solving combinatorial optimization problems, such as energy cost-aware scheduling, budget allocation on web advertising, and graph matching on social networks. However, many optimization problems involve unknown coefficients, and improper predictions of these factors may lead to inferior decisions which may cause energy wastage, inefficient resource allocation, inappropriate matching in social networks, etc. Such a research topic is referred to as "Predict-Then-Optimize (PTO)", which considers the performance of prediction and decision-making in a unified system. A noteworthy recent development is end-to-end methods that directly optimize the ultimate decision quality and claim to yield better results than the traditional two-stage approach. However, the evaluation benchmarks in this field are fragmented and the effectiveness of various models in different scenarios remains unclear, hindering the comprehensive assessment and fast deployment of these methods. To address these issues, we provide a comprehensive categorization of current approaches and integrate existing experimental scenarios to establish a unified benchmark, elucidating the circumstances under which end-to-end training yields improvements, as well as the contexts in which it performs ineffectively. We also introduce and open-source a new dataset for the industrial combinatorial advertising problem for inclusive finance. We hope the rethinking and benchmarking of PTO could facilitate more convenient evaluation and deployment, and inspire further improvements in both academia and industry within this field.

Submitted 19 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.00371 [pdf, other]

Learning Cooperative Trajectory Representations for Motion Forecasting

Authors: Hongzhi Ruan, Haibao Yu, Wenxian Yang, Siqi Fan, Yingjuan Tang, Zaiqing Nie

Abstract: Motion forecasting is an essential task for autonomous driving, and the effective utilization of information from infrastructure and other vehicles can enhance motion forecasting capabilities. Existing research has primarily focused on leveraging single-frame cooperative information to enhance the limited perception capability of the ego vehicle, while underutilizing the motion and interaction information of traffic participants observed from cooperative devices. In this paper, we first propose the cooperative trajectory representations learning paradigm. Specifically, we present V2X-Graph, the first interpretable and end-to-end learning framework for cooperative motion forecasting. V2X-Graph employs an interpretable graph to fully leverage the cooperative motion and interaction contexts. Experimental results on the vehicle-to-infrastructure (V2I) motion forecasting dataset, V2X-Seq, demonstrate the effectiveness of V2X-Graph. To further evaluate on V2X scenarios, we construct the first real-world vehicle-to-everything (V2X) motion forecasting dataset, V2X-Traj, and the performance shows the advantage of our method. We hope both V2X-Graph and V2X-Traj can facilitate the further development of cooperative motion forecasting. Find the project at https://github.com/AIR-THU/V2X-Graph and the data at https://github.com/AIR-THU/DAIR-V2X-Seq.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.11062 [pdf, other]

NP-RDMA: Using Commodity RDMA without Pinning Memory

Authors: Huijun Shen, Guo Chen, Bojie Li, Xingtong Lin, Xingyu Zhang, Xizheng Wang, Amit Geron, Shamir Rabinovitch, Haifeng Lin, Han Ruan, Lijun Li, **gbin Zhou, Kun Tan

Abstract: Remote Direct Memory Access (RDMA) has been haunted by the need to pin down memory regions. Pinning limits memory utilization because it impedes on-demand paging and swapping. It also increases the initialization latency of large memory applications from seconds to minutes. To remove memory pinning, existing approaches often require special hardware that supports page faults, and still have inferior performance. We propose NP-RDMA, which removes memory pinning during memory registration and enables dynamic page fault handling with commodity RDMA NICs. NP-RDMA does not require NICs to support page faults. Instead, by monitoring local memory paging and swapping with MMU-notifier, combined with IOMMU/SMMU-based address mapping, NP-RDMA efficiently detects and handles page faults in software with near-zero additional latency to non-page-fault RDMA verbs. We implement an LD_PRELOAD library (with a modified kernel module), which is fully compatible with existing RDMA applications. Experiments show that NP-RDMA adds only 0.1 to 2 μs latency under non-page-fault scenarios. Moreover, NP-RDMA adds only 3.5 to 5.7 μs and 60 μs under minor and major page faults, respectively, which is 500x faster than ODP, which uses advanced NICs that support page faults. With non-pinned memory, Spark initialization is 20x faster and the physical memory usage reduces by 86% with only 5.4% slowdown. Enterprise storage can expand to 5x capacity with SSDs while the average latency is only 10% higher. To the best of our knowledge, NP-RDMA is the first efficient and application-transparent software approach to remove memory pinning using commodity RDMA NICs.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2308.16381 [pdf, other]

Wasserstein Distributionally Robust Chance Constrained Trajectory Optimization for Mobile Robots within Uncertain Safe Corridor

Authors: Shaohang Xu, Haolin Ruan, Wentao Zhang, Yian Wang, Lijun Zhu, Chin Pang Ho

Abstract: Safe corridor-based Trajectory Optimization (TO) presents an appealing approach for collision-free path planning of autonomous robots, offering global optimality through its convex formulation. The safe corridor is constructed based on the perceived map; however, non-ideal perception induces uncertainty, which is rarely considered in trajectory generation. In this paper, we propose Distributionally Robust Safe Corridor Constraints (DRSCCs) to consider the uncertainty of the safe corridor. Then, we integrate DRSCCs into the trajectory optimization framework using Bernstein basis polynomials. Theoretically, we rigorously prove that the trajectory optimization problem incorporating DRSCCs is equivalent to a computationally efficient, convex quadratic program. Compared to the nominal TO, our method enhances navigation safety by significantly reducing the infeasible motions in the presence of uncertainty. Moreover, the proposed approach is validated through two robotic applications, a micro Unmanned Aerial Vehicle (UAV) and a quadruped robot Unitree A1.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 7 pages

arXiv:2305.05938 [pdf, other]

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting

Authors: Haibao Yu, Wenxian Yang, Hongzhi Ruan, Zhenwei Yang, Yingjuan Tang, Xu Gao, Xin Hao, Yifeng Shi, Yifeng Pan, Ning Sun, Juan Song, Jirui Yuan, ** Luo, Zaiqing Nie

Abstract: Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving. However, the lack of real-world sequential datasets limits research in this area. To address this issue, we introduce V2X-Seq, the first large-scale sequential V2X dataset, which includes data frames, trajectories, vector maps, and traffic lights captured from natural scenery. V2X-Seq comprises two parts: the sequential perception dataset, which includes more than 15,000 frames captured from 95 scenarios, and the trajectory forecasting dataset, which contains about 80,000 infrastructure-view scenarios, 80,000 vehicle-view scenarios, and 50,000 cooperative-view scenarios captured from 28 intersections' areas, covering 672 hours of data. Based on V2X-Seq, we introduce three new tasks for vehicle-infrastructure cooperative (VIC) autonomous driving: VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting. We also provide benchmarks for the introduced tasks. Find data, code, and more up-to-date information at https://github.com/AIR-THU/DAIR-V2X-Seq.

Submitted 10 May, 2023; originally announced May 2023.

Comments: CVPR2023

arXiv:2303.02959 [pdf, other]

Butterfly: Multiple Reference Frames Feature Propagation Mechanism for Neural Video Compression

Authors: Feng Wang, Haihang Ruan, Fei Xiong, Jiayu Yang, Litian Li, Ronggang Wang

Abstract: Using more reference frames can significantly improve the compression efficiency in neural video compression. However, in low-latency scenarios, most existing neural video compression frameworks use only the previous frame as reference, and the few frameworks that use multiple previous frames as reference adopt only a simple multi-reference frame propagation mechanism. In this paper, we present a more reasonable multi-reference frame propagation mechanism for neural video compression, called the butterfly multi-reference frame propagation mechanism (Butterfly), which allows a more effective feature fusion of multiple reference frames. With it, we can generate a more accurate temporal context conditional prior for the Contextual Coding Module. Besides, when the number of decoded frames does not meet the required number of reference frames, we duplicate the nearest reference frame to meet the requirement, which is better than duplicating the furthest one. Experimental results show that our method can significantly outperform the previous state-of-the-art (SOTA), and our neural codec can achieve a 7.6% bitrate saving on the HEVC Class D dataset when compared with our base single-reference frame model with the same compression configuration.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted by DCC 2023
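
Illustrative note (not part of the arXiv listing): the abstract above mentions filling a short reference list by duplicating the nearest reference frame rather than the furthest one. The sketch below shows one plausible reading of that rule and is not the authors' code. It assumes "nearest" means the most recently decoded frame, and the function name `pad_references`, the oldest-to-newest ordering, and the position of the duplicates are all assumptions.

```python
def pad_references(decoded, required, strategy="nearest"):
    """Return exactly `required` reference frames, ordered oldest to newest.

    `decoded` holds already-decoded frames, oldest first. If fewer than
    `required` frames exist, the list is padded by repeating one frame:
    the newest ("nearest") or the oldest ("furthest"). Assumed semantics.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    if not decoded:
        raise ValueError("at least one decoded frame is needed")

    refs = list(decoded[-required:])  # keep only the most recent frames
    missing = required - len(refs)

    if strategy == "nearest":
        # Repeat the most recently decoded frame.
        return refs + [refs[-1]] * missing
    if strategy == "furthest":
        # Repeat the oldest available frame.
        return [refs[0]] * missing + refs
    raise ValueError(f"unknown strategy: {strategy}")
```

With two decoded frames and four required references, the "nearest" strategy repeats the newest frame twice, while "furthest" repeats the oldest; once enough frames are decoded, both strategies return the same most recent window.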

arXiv:2301.06084 [pdf]

Scattering-induced entropy boost for highly-compressed optical sensing and encryption

Authors: Liheng Bian, Xinrui Zhan, Xuyang Chang, Daoyu Li, Rong Yan, Yinuo Zhang, Haowen Ruan, Jun Zhang

Abstract: Image classification often relies on a high-quality machine vision system with a large view field and high resolution, demanding fine imaging optics, heavy computational costs, and large communication bandwidths between an image sensor and a computing unit. Here, we report a novel image-free sensing framework for resource-efficient image classification where the required number of measurements can be reduced by up to two orders of magnitude. In the proposed framework of single-pixel detection, the optical field from a target is first scattered by an optical diffuser and then two-dimensionally modulated by a spatial light modulator. The optical diffuser simultaneously serves as a compressor and an encryptor for the target information, effectively narrowing the view field and improving the system's security. The one-dimensional sequence of intensity values, measured with time-varying patterns on the spatial light modulator, is then used to extract semantic information based on end-to-end deep learning. The proposed sensing framework is shown to provide over 95% accuracy at sampling rates of 1% and 5%, respectively, for the classification of the MNIST dataset and the recognition of Chinese license plates, which was up to 24% more efficient compared with the case without an optical diffuser. The proposed framework represents a significant breakthrough in realizing high-throughput machine intelligence for scene analysis, with low bandwidth, low cost, and strong encryption.

Submitted 16 December, 2022; originally announced January 2023.

arXiv:2301.01045 [pdf, other]

Risk-Averse MDPs under Reward Ambiguity

Authors: Haolin Ruan, Zhi Chen, Chin Pang Ho

Abstract: We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.

Submitted 3 January, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li , et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2109.05233 [pdf, other]

AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations

Authors: Hongtao Ruan, Liying Zheng, Peixian Hu

Abstract: State-of-the-art Named Entity Recognition (NER) models rely heavily on large amounts of fully annotated training data. However, accessible data are often incompletely annotated since the annotators usually lack comprehensive knowledge in the target domain. Normally the unannotated tokens are regarded as non-entities by default, while we underline that these tokens could either be non-entities or part of any entity. Here, we study NER modeling with incompletely annotated data where only a fraction of the named entities are labeled, and the unlabeled tokens are equivalently multi-labeled by every possible label. Taking multi-labeled tokens into account, the numerous possible paths can distract the training model from the gold path (ground truth label sequence), and thus hinder the learning ability. In this paper, we propose AdaK-NER, named the adaptive top-K approach, to help the model focus on a smaller feasible region where the gold path is more likely to be located. We demonstrate the superiority of our approach through extensive experiments on both English and Chinese datasets, improving the F-score by 2% on average on CoNLL-2003 and by over 10% on two Chinese datasets compared with the prior state-of-the-art works.

Submitted 8 June, 2022; v1 submitted 11 September, 2021; originally announced September 2021.

arXiv:2002.06199 [pdf, other]

Effective AER Object Classification Using Segmented Probability-Maximization Learning in Spiking Neural Networks

Authors: Qianhui Liu, Haibo Ruan, Dong Xing, Hua** Tang, Gang Pan

Abstract: Address event representation (AER) cameras have recently attracted more attention due to the advantages of high temporal resolution and low power consumption, compared with traditional frame-based cameras. Since AER cameras record the visual input as asynchronous discrete events, they are inherently suitable to coordinate with the spiking neural network (SNN), which is biologically plausible and energy-efficient on neuromorphic hardware. However, using SNN to perform AER object classification is still challenging, due to the lack of effective learning algorithms for this new representation. To tackle this issue, we propose an AER object classification model using a novel segmented probability-maximization (SPA) learning algorithm. Technically, 1) the SPA learning algorithm iteratively maximizes the probability of the classes that samples belong to, in order to improve the reliability of neuron responses and effectiveness of learning; 2) a peak detection (PD) mechanism is introduced in SPA to locate informative time points segment by segment, based on which information within the whole event stream can be fully utilized by the learning. Extensive experimental results show that, compared to state-of-the-art methods, our model is not only more effective but also requires less information to reach a certain level of accuracy.

Submitted 13 February, 2020; originally announced February 2020.

Comments: AAAI 2020 (Oral)

arXiv:1912.01506 [pdf, ps, other]

Study of Distributed Robust Beamforming with Low-Rank and Cross-Correlation Techniques

Authors: H. Ruan, R. C. de Lamare

Abstract: In this work, we present a novel robust distributed beamforming (RDB) approach based on low-rank and cross-correlation techniques. The proposed RDB approach mitigates the effects of channel errors in wireless networks equipped with relays based on the exploitation of the cross-correlation between the received data from the relays at the destination and the system output and low-rank techniques. The relay nodes are equipped with an amplify-and-forward (AF) protocol and the channel errors are modeled using an additive matrix perturbation, which results in degradation of the system performance. The proposed method, denoted low-rank and cross-correlation RDB (LRCC-RDB), considers a total relay transmit power constraint in the system and the goal of maximizing the output signal-to-interference-plus-noise ratio (SINR). We carry out a performance analysis of the proposed LRCC-RDB technique along with a computational complexity study. The proposed LRCC-RDB does not require any costly online optimization procedure and simulations show an excellent performance as compared to previously reported algorithms.

Submitted 26 November, 2019; originally announced December 2019.

Comments: 14 pages, 9 figures. arXiv admin note: text overlap with arXiv:1712.01115

arXiv:1911.08261 [pdf, other]

Unsupervised AER Object Recognition Based on Multiscale Spatio-Temporal Features and Spiking Neurons

Authors: Qianhui Liu, Gang Pan, Haibo Ruan, Dong Xing, Qi Xu, Huajin Tang

Abstract: This paper proposes an unsupervised address event representation (AER) object recognition approach. The proposed approach consists of a novel multiscale spatio-temporal feature (MuST) representation of input AER events and a spiking neural network (SNN) using spike-timing-dependent plasticity (STDP) for object recognition with MuST. MuST extracts the features contained in both the spatial and temporal information of AER event flow, and meanwhile forms an informative and compact feature spike representation. We show not only how MuST exploits spikes to convey information more effectively, but also how it benefits the recognition using SNN. The recognition process is performed in an unsupervised manner, which does not need to specify the desired status of every single neuron of SNN, and thus can be flexibly applied in real-world recognition tasks. The experiments are performed on five AER datasets including a new one named GESTURE-DVS. Extensive experimental results show the effectiveness and advantages of this proposed approach.

Submitted 19 November, 2019; originally announced November 2019.

arXiv:1712.01115 [pdf, ps, other]

Study of Robust Distributed Beamforming Based on Cross-Correlation and Subspace Projection Techniques

Authors: H. Ruan, R. C. de Lamare

Abstract: In this work, we present a novel robust distributed beamforming (RDB) approach to mitigate the effects of channel errors on wireless networks equipped with relays based on the exploitation of the cross-correlation between the received data from the relays at the destination and the system output. The proposed RDB method, denoted cross-correlation and subspace projection (CCSP) RDB, considers a total relay transmit power constraint in the system and the objective of maximizing the output signal-to-interference-plus-noise ratio (SINR). The relay nodes are equipped with an amplify-and-forward (AF) protocol and we assume that the channel state information (CSI) is imperfectly known at the relays and there is no direct link between the sources and the destination. The CCSP does not require any costly optimization procedure and simulations show an excellent performance as compared to previously reported algorithms.

Submitted 30 November, 2017; originally announced December 2017.

Comments: 3 figures, 7 pages. arXiv admin note: text overlap with arXiv:1707.00953

arXiv:1707.08189 [pdf, ps, other]

Study of Joint MSINR and Relay Selection Algorithms for Distributed Beamforming

Authors: Hang Ruan, Rodrigo C. de Lamare

Abstract: This paper presents joint maximum signal-to-interference-plus-noise ratio (MSINR) and relay selection algorithms for distributed beamforming. We propose a joint MSINR and restricted greedy search relay selection (RGSRS) algorithm with a total relay transmit power constraint that iteratively optimizes both the beamforming weights at the relay nodes, maximizing the SINR at the destination. Specifically, we devise a relay selection scheme that is based on greedy search and compare it to other schemes like restricted random relay selection (RRRS) and restricted exhaustive search relay selection (RESRS). A complexity analysis is provided and simulation results show that the proposed joint MSINR and RGSRS algorithm achieves excellent bit error rate (BER) and SINR performance.

Submitted 23 July, 2017; originally announced July 2017.

Comments: 7 pages, 2 figures. arXiv admin note: text overlap with arXiv:1707.00953

arXiv:1707.00953 [pdf, ps, other]

Study of Joint MMSE Consensus and Relay Selection Algorithms for Distributed Beamforming

Authors: H. Ruan, R. C. de Lamare

Abstract: This work presents joint minimum mean-square error (MMSE) consensus and relay selection algorithms for distributed beamforming. We propose joint MMSE consensus and relay selection schemes with a total power constraint and local communications among the relays for a network with cooperating sensors. We also devise greedy relay selection algorithms based on the MMSE consensus approach that optimize the network performance. Simulation results show that the proposed scheme and algorithms outperform existing techniques for distributed beamforming.

Submitted 28 May, 2017; originally announced July 2017.

Comments: 2 tables, 3 figures, 11 pages. arXiv admin note: text overlap with arXiv:1310.7282

arXiv:1606.01313 [pdf, ps, other]

doi 10.1109/TSP.2016.2550006

Design of Robust Adaptive Beamforming Algorithms Based on Low-Rank and Cross-Correlation Techniques

Authors: H. Ruan, R. C. de Lamare

Abstract: This work presents cost-effective low-rank techniques for designing robust adaptive beamforming (RAB) algorithms. The proposed algorithms are based on the exploitation of the cross-correlation between the array observation data and the output of the beamformer. Firstly, we construct a general linear equation considered in large dimensions whose solution yields the steering vector mismatch. Then, we employ the idea of the full orthogonalization method (FOM), an orthogonal Krylov subspace based method, to iteratively estimate the steering vector mismatch in a reduced-dimensional subspace, resulting in the proposed orthogonal Krylov subspace projection mismatch estimation (OKSPME) method. We also devise adaptive algorithms based on stochastic gradient (SG) and conjugate gradient (CG) techniques to update the beamforming weights with low complexity and avoid any costly matrix inversion. The main advantage of the proposed low-rank and mismatch estimation techniques is their cost-effectiveness when dealing with high-dimensional subspaces or large sensor arrays. Simulation results show excellent performance in terms of the output signal-to-interference-plus-noise ratio (SINR) of the beamformer among all the compared RAB methods.

Submitted 3 June, 2016; originally announced June 2016.

Comments: 11 figures, 12 pages

arXiv:1603.08188 [pdf, other]

doi 10.1109/JSTSP.2016.2627183

The Random Frequency Diverse Array: A New Antenna Structure for Uncoupled Direction-Range Indication in Active Sensing

Authors: Yimin Liu, Hang Ruan, Lei Wang, Arye Nehorai

Abstract: In this paper, we propose a new type of array antenna, termed the Random Frequency Diverse Array (RFDA), for an uncoupled indication of target direction and range with low system complexity. In RFDA, each array element has a narrow bandwidth and a randomly assigned carrier frequency. The beampattern of the array is shown to be stochastic but thumbtack-like, and its stochastic characteristics, such as the mean, variance, and asymptotic distribution are derived analytically. Based on these two features, we propose two kinds of algorithms for signal processing. One is matched filtering, due to the beampattern's good characteristics. The other is compressive sensing, because the new approach can be regarded as a sparse and random sampling of target information in the spatial-frequency domain. Fundamental limits, such as the Cramér-Rao bound and the observing matrix's mutual coherence, are provided as performance guarantees of the new array structure. The features and performances of RFDA are verified with numerical results.

Submitted 27 March, 2016; originally announced March 2016.

Comments: 13 pages, 10 figures

arXiv:1512.01601 [pdf, ps, other]

Study of Efficient Robust Adaptive Beamforming Algorithms Based on Shrinkage Techniques

Authors: H. Ruan, R. C. de Lamare

Abstract: This paper proposes low-complexity robust adaptive beamforming (RAB) techniques based on shrinkage methods. We first briefly review a Low-Complexity Shrinkage-Based Mismatch Estimation (LOCSME) batch algorithm to estimate the desired signal steering vector mismatch, in which the interference-plus-noise covariance (INC) matrix is also estimated with a recursive matrix shrinkage method. Then we develop a low-complexity adaptive robust version of the conjugate gradient (CG) algorithm to both estimate the steering vector mismatch and update the beamforming weights. A computational complexity study of the proposed and existing algorithms is carried out. Simulations are conducted in local scattering scenarios and comparisons to existing RAB techniques are provided.

Submitted 4 December, 2015; originally announced December 2015.

Comments: 9 pages, 2 figures. arXiv admin note: text overlap with arXiv:1505.06788

arXiv:1505.06788 [pdf, ps, other]

Low-Complexity Robust Adaptive Beamforming Algorithms Based on Shrinkage for Mismatch Estimation

Authors: H. Ruan, R. C. de Lamare

Abstract: In this paper, we propose low-complexity robust adaptive beamforming (RAB) techniques that are based on shrinkage methods. The only prior knowledge required by the proposed algorithms is the angular sector in which the actual steering vector is located and the antenna array geometry. We first present a Low-Complexity Shrinkage-Based Mismatch Estimation (LOCSME) algorithm to estimate the desired signal steering vector mismatch, in which the interference-plus-noise covariance (INC) matrix is estimated with the Oracle Approximating Shrinkage (OAS) method and the weights are computed with matrix inversions. We then develop low-cost stochastic gradient (SG) recursions to estimate the INC matrix and update the beamforming weights, resulting in the proposed LOCSME-SG algorithm. Simulation results show that both LOCSME and LOCSME-SG achieve very good output signal-to-interference-plus-noise ratio (SINR) compared to previously reported adaptive RAB algorithms.

Submitted 25 May, 2015; originally announced May 2015.

Comments: 8 pages, 2 figures, WSA. arXiv admin note: text overlap with arXiv:1311.2331

arXiv:1311.2331 [pdf, ps, other]

doi 10.1109/LSP.2013.2290948

Robust Adaptive Beamforming Based on Low-Complexity Shrinkage-Based Mismatch Estimation

Authors: Hang Ruan, Rodrigo C. de Lamare

Abstract: In this work, we propose a low-complexity robust adaptive beamforming (RAB) technique which estimates the steering vector using a Low-Complexity Shrinkage-Based Mismatch Estimation (LOCSME) algorithm. The proposed LOCSME algorithm estimates the covariance matrix of the input data and the interference-plus-noise covariance (INC) matrix by using the Oracle Approximating Shrinkage (OAS) method. LOCSME only requires prior knowledge of the angular sector in which the actual steering vector is located and the antenna array geometry. LOCSME does not require a costly optimization algorithm and does not need to know extra information from the interferers, which avoids direction finding for all interferers. Simulations show that LOCSME outperforms previously reported RAB algorithms and has a performance very close to the optimum.

Submitted 10 November, 2013; originally announced November 2013.

Comments: 5 pages, 2 figures. IEEE Signal Processing Letters, 2013

Showing 1–26 of 26 results for author: Ruan, H