
Harnessing Business and Media Insights with Large Language Models

Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users can further leverage natural language queries to directly visualize financial data, generating insightful charts and graphs to understand trends across diverse business sectors clearly. FALM fosters user trust and ensures output accuracy through three novel methods: 1) Time-aware reasoning guarantees accurate event registration and prioritizes recent updates. 2) Thematic trend analysis explicitly examines topic evolution over time, providing insights into emerging business landscapes. 3) Content referencing and task decomposition enhance answer fidelity and data visualization accuracy. We conduct both automated and human evaluations, demonstrating FALM's significant performance improvements over baseline methods while prioritizing responsible AI practices. These benchmarks establish FALM as a cutting-edge LLM in the business and media domains, with exceptional accuracy and trustworthiness.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20625 [pdf, other]

Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

Authors: Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati

Abstract: As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent paper on LLM-Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We use the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for the TravelPlanning domain provides a remarkable improvement, enhancing baseline performances by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo, from 0% to 5%.
Furthermore, we highlight other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, that can be reliably operationalized, such as extracting useful critics and acting as a reformulator for critics.
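The generate-test pattern described above can be pictured with a small Python sketch. A proposer (here a canned stand-in for an LLM) suggests a plan, a bank of critics checks it, and any complaints are fed back into the next proposal. The plan format, `llm_modulo_loop`, and `budget_critic` are hypothetical illustrations, not the authors' code or the benchmark's interface.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical plan format: an ordered list of (stop, cost) pairs.
Plan = List[Tuple[str, int]]
# A critic returns an error message, or None when the plan passes its check.
Critic = Callable[[Plan], Optional[str]]


def llm_modulo_loop(
    propose: Callable[[List[str]], Plan],
    critics: List[Critic],
    max_iters: int = 10,
) -> Optional[Plan]:
    """Generate-test loop: re-prompt the proposer with critic feedback until all critics pass."""
    feedback: List[str] = []
    for _ in range(max_iters):
        plan = propose(feedback)  # stand-in for an LLM call that sees the latest feedback
        verdicts = (critic(plan) for critic in critics)
        feedback = [msg for msg in verdicts if msg is not None]
        if not feedback:  # every critic accepted the plan
            return plan
    return None  # no plan passed all critics within the iteration budget


def budget_critic(limit: int) -> Critic:
    """Hypothetical hard-constraint critic: total cost must not exceed `limit`."""
    def check(plan: Plan) -> Optional[str]:
        total = sum(cost for _, cost in plan)
        return None if total <= limit else f"over budget by {total - limit}"
    return check


# Toy proposer that ignores feedback and walks through canned candidates.
candidates = iter([
    [("Austin", 900), ("Dallas", 400)],  # total 1300: rejected
    [("Austin", 500), ("Dallas", 300)],  # total 800: accepted
])
plan = llm_modulo_loop(lambda feedback: next(candidates), [budget_critic(1000)])
print(plan)  # [('Austin', 500), ('Dallas', 300)]
```

In a real LLM-Modulo setup, the proposer would turn the collected messages into a revised prompt (the "reformulator" role), and the critics themselves could be extracted with LLM help, as the abstract notes.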

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.03196 [pdf, ps, other]

Design and Analysis of Massive Uncoupled Unsourced Random Access with Bayesian Joint Decoding

Authors: Feiyan Tian, Xiaoming Chen, Yong Liang Guan, Chau Yuen

Abstract: In this paper, we investigate unsourced random access for massive machine-type communications (mMTC) in sixth-generation (6G) wireless networks. First, we establish a high-efficiency uncoupled framework for massive unsourced random access without extra parity check bits. Then, we design a low-complexity Bayesian joint decoding algorithm, including codeword detection and stitching. In particular, we present a Bayesian codeword detection approach by exploiting Bayes-optimal divergence-free orthogonal approximate message passing in the case of unknown priors. The output long-term channel statistics are then leveraged to stitch codewords and recover the original message. Thus, the spectral efficiency is improved by avoiding the use of parity bits. Moreover, we analyze the performance of the proposed Bayesian joint decoding-based massive uncoupled unsourced random access scheme in terms of computational complexity and decoding error probability. Furthermore, through asymptotic analysis, we obtain some useful insights for the design of massive unsourced random access. Finally, extensive simulation results confirm the effectiveness of the proposed scheme in 6G wireless networks.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.05678 [pdf, other]

Flexible Fairness Learning via Inverse Conditional Permutation

Authors: Yuheng Lai, Leying Guan

Abstract: Equalized odds, as a popular notion of algorithmic fairness, aims to ensure that sensitive variables, such as race and gender, do not unfairly influence the algorithm prediction when conditioning on the true outcome. Despite rapid advancements, most of the current research focuses on the violation of equalized odds caused by one sensitive attribute, leaving the challenge of simultaneously accounting for multiple attributes under-addressed. We address this gap by introducing a fairness learning approach that integrates adversarial learning with a novel inverse conditional permutation. This approach effectively and flexibly handles multiple sensitive attributes, potentially of mixed data types. The efficacy and flexibility of our method are demonstrated through both simulation studies and empirical analysis of real-world datasets.
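For reference, the fairness criterion itself is straightforward to measure. The Python sketch below computes an equalized-odds gap: for each true label, the spread of positive-prediction rates across groups. It only evaluates predictions from an already-fitted model. It is not the paper's adversarial learning or inverse conditional permutation method, and the function name is ours.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def equalized_odds_gap(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Largest between-group gap in P(y_pred = 1 | group, y_true), over y_true in {0, 1}.

    Group keys may be tuples, so combinations of several sensitive attributes
    (e.g. (race, gender)) are treated as one joint group.
    """
    # (group, true label) -> [number predicted positive, number of samples]
    counts: Dict[Tuple[Hashable, int], List[int]] = defaultdict(lambda: [0, 0])
    for y, p, g in zip(y_true, y_pred, groups):
        counts[(g, y)][0] += p
        counts[(g, y)][1] += 1

    gap = 0.0
    for label in (0, 1):  # label 1 -> true-positive rates, label 0 -> false-positive rates
        rates = [pos / n for (_, y), (pos, n) in counts.items() if y == label]
        if rates:
            gap = max(gap, max(rates) - min(rates))
    return gap


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
print(equalized_odds_gap(y_true, y_pred, groups))  # 0.5
```

Using tuple-valued group keys is one simple way to audit several attributes jointly, but the number of groups grows quickly and some become tiny, which is part of why multi-attribute fairness is hard to handle naively.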

Submitted 9 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.08343 [pdf, ps, other]

Coverage and Rate Analysis for Integrated Sensing and Communication Networks

Authors: Xu Gan, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Jiguang He, Zhaoyang Zhang, Chau Yuen, Yong Liang Guan, Mérouane Debbah

Abstract: Integrated sensing and communication (ISAC) is increasingly recognized as a pivotal technology for next-generation cellular networks, offering mutual benefits in both sensing and communication capabilities. This advancement necessitates a re-examination of the fundamental limits within networks where these two functions coexist via shared spectrum and infrastructures. However, traditional stochastic geometry-based performance analyses are confined to either communication or sensing networks separately. This paper bridges this gap by introducing a generalized stochastic geometry framework in ISAC networks. Based on this framework, we define and calculate the coverage and ergodic rate of sensing and communication performance under resource constraints. Then, we shed light on the fundamental limits of ISAC networks by presenting theoretical results for the coverage rate of the unified performance, taking into account the coupling effects of dual functions in coexistence networks. Further, we obtain the analytical formulations for evaluating the ergodic sensing rate constrained by the maximum communication rate, and the ergodic communication rate constrained by the maximum sensing rate. Extensive numerical results validate the accuracy of all theoretical derivations, and also indicate that denser networks significantly enhance ISAC coverage. Specifically, increasing the base station density from $1$ $\text{km}^{-2}$ to $10$ $\text{km}^{-2}$ can boost the ISAC coverage rate from $1.4\%$ to $39.8\%$.
Further, results also reveal that with the increase of the constrained sensing rate, the ergodic communication rate improves significantly, but the reverse is not obvious.

Submitted 22 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.08224 [pdf, ps, other]

Two-Dimensional Direction-of-Arrival Estimation Using Stacked Intelligent Metasurfaces

Authors: Jiancheng An, Chau Yuen, Yong Liang Guan, Marco Di Renzo, Mérouane Debbah, H. Vincent Poor, Lajos Hanzo

Abstract: Stacked intelligent metasurfaces (SIM) are capable of emulating reconfigurable physical neural networks by relying on electromagnetic (EM) waves as carriers. They can also perform various complex computational and signal processing tasks. A SIM is fabricated by densely integrating multiple metasurface layers, each consisting of a large number of small meta-atoms that can control the EM waves passing through it. In this paper, we harness a SIM for two-dimensional (2D) direction-of-arrival (DOA) estimation. In contrast to the conventional designs, an advanced SIM in front of the receiver array automatically carries out the 2D discrete Fourier transform (DFT) as the incident waves propagate through it. As a result, the receiver array directly observes the angular spectrum of the incoming signal. In this context, the DOA estimates can be readily obtained by using probes to detect the energy distribution on the receiver array. This avoids the need for power-thirsty radio frequency (RF) chains. To enable SIM to perform the 2D DFT, we formulate the optimization problem of minimizing the fitting error between the SIM's EM response and the 2D DFT matrix. Furthermore, a gradient descent algorithm is customized for iteratively updating the phase shift of each meta-atom in SIM. To further improve the DOA estimation accuracy, we configure the phase shift pattern in the zeroth layer of the SIM to generate a set of 2D DFT matrices associated with orthogonal spatial frequency bins.
Additionally, we analytically evaluate the performance of the proposed SIM-based DOA estimator by deriving a tight upper bound for the mean square error (MSE). Our numerical simulations verify the capability of a well-trained SIM to perform DOA estimation and corroborate our theoretical analysis. It is demonstrated that a SIM having an optical computational speed achieves an MSE of $10^{-4}$ for DOA estimation.
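In the paper, the SIM performs the 2D DFT in the wave domain as signals propagate through its layers. The Python sketch below computes the same transform digitally to show why the location of the energy peak on the receiver array encodes the DOA. It assumes a single noiseless far-field source, an n x n uniform planar array with half-wavelength spacing, and direction cosines that fall exactly on the DFT grid. The function names are illustrative only.

```python
import cmath
import math
from typing import List, Tuple

Grid = List[List[complex]]


def plane_wave(n: int, u: float, v: float) -> Grid:
    """Phases of one far-field source across an n x n planar array with half-wavelength spacing.

    u = sin(theta) cos(phi) and v = sin(theta) sin(phi) are the direction cosines.
    """
    return [[cmath.exp(1j * math.pi * (m * u + k * v)) for k in range(n)] for m in range(n)]


def dft2(x: Grid) -> Grid:
    """Naive 2D DFT: X[p][q] = sum_{m,k} x[m][k] * exp(-2j*pi*(m*p + k*q)/n)."""
    n = len(x)
    return [
        [
            sum(
                x[m][k] * cmath.exp(-2j * math.pi * (m * p + k * q) / n)
                for m in range(n)
                for k in range(n)
            )
            for q in range(n)
        ]
        for p in range(n)
    ]


def doa_from_spectrum(spec: Grid) -> Tuple[float, float]:
    """Take the strongest bin of the angular spectrum and map it back to (theta, phi) in radians."""
    n = len(spec)

    def to_direction_cosine(i: int) -> float:
        # Bin index -> direction cosine in [-1, 1) for half-wavelength spacing.
        return 2 * (i if i < n / 2 else i - n) / n

    p, q = max(
        ((p, q) for p in range(n) for q in range(n)),
        key=lambda pq: abs(spec[pq[0]][pq[1]]),
    )
    u, v = to_direction_cosine(p), to_direction_cosine(q)
    theta = math.asin(min(1.0, math.hypot(u, v)))  # elevation measured from broadside
    phi = math.atan2(v, u)  # azimuth
    return theta, phi


theta, phi = doa_from_spectrum(dft2(plane_wave(8, 0.25, -0.5)))
print(round(math.degrees(theta), 2), round(math.degrees(phi), 2))
```

Off-grid directions spread energy across neighbouring bins, which limits naive peak picking to the DFT grid resolution. The abstract's zeroth-layer configuration, which generates DFT matrices for orthogonal spatial frequency bins, is presented as a way to improve accuracy beyond this.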

Submitted 13 February, 2024; originally announced February 2024.

Comments: 37 pages, 12 figures, and 2 tables. arXiv admin note: text overlap with arXiv:2310.09861

arXiv:2402.06116 [pdf, other]

LLMs for Coding and Robotics Education

Authors: Peng Shu, Huaqin Zhao, Hanqi Jiang, Yiwei Li, Shaochen Xu, Yi Pan, Zihao Wu, Zhengliang Liu, Guoyu Lu, Le Guan, Gong Chen, Xianqiao Wang, Tianming Liu

Abstract: Large language models and multimodal large language models have revolutionized artificial intelligence recently. An increasing number of regions are now embracing these advanced technologies. Within this context, robot coding education is garnering increasing attention. To teach young children how to code and compete in robot challenges, large language models are being utilized for robot code explanation, generation, and modification. In this paper, we highlight an important trend in robot coding education. We test several mainstream large language models on both traditional coding tasks and the more challenging task of robot code generation, which includes block diagrams. Our results show that GPT-4V outperforms other models in all of our tests but struggles with generating block diagram images.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 20 pages, 6 figures, 1 table

arXiv:2402.04210 [pdf, other]

"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors

Authors: Lin Guan, Yifan Zhou, Denis Liu, Yantian Zha, Heni Ben Amor, Subbarao Kambhampati

Abstract: Large-scale generative models are shown to be useful for sampling meaningful candidate solutions, yet they often overlook task constraints and user preferences. Their full power is better harnessed when the models are coupled with external verifiers and the final solutions are derived iteratively or progressively according to the verification feedback. In the context of embodied AI, verification often solely involves assessing whether goal conditions specified in the instructions have been met. Nonetheless, for these agents to be seamlessly integrated into daily life, it is crucial to account for a broader range of constraints and preferences beyond bare task success (e.g., a robot should grasp bread with care to avoid significant deformations). However, given the unbounded scope of robot tasks, it is infeasible to construct scripted verifiers akin to those used for explicit-knowledge tasks like the game of Go and theorem proving. This begs the question: when no sound verifier is available, can we use large vision and language models (VLMs), which are approximately omniscient, as scalable Behavior Critics to catch undesirable robot behaviors in videos? To answer this, we first construct a benchmark that contains diverse cases of goal-reaching yet undesirable robot policies. Then, we comprehensively evaluate VLM critics to gain a deeper understanding of their strengths and failure modes.
Based on the evaluation, we provide guidelines on how to effectively utilize VLM critiques and showcase a practical way to integrate the feedback into an iterative process of policy refinement. The dataset and codebase are released at: https://guansuns.github.io/pages/vlm-critic.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.01817 [pdf, other]

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

Authors: Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

Abstract: There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs.
We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

Submitted 11 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

arXiv:2402.00455 [pdf, ps, other]

New Lower Bounds on Aperiodic Ambiguity Function of Unimodular Sequences

Authors: Lingsheng Meng, Yong Liang Guan, Yao Ge, Zilong Liu, Pingzhi Fan

Abstract: This paper presents new lower bounds on the aperiodic ambiguity function (AF) of unimodular sequences within a given low ambiguity zone. Our key idea, motivated by the Levenshtein correlation bound, is to introduce two weight vectors associated with the delay and Doppler shifts, respectively, and then exploit the upper and lower bounds on the Frobenius norm of the weighted auto- and cross-AF matrices to derive these bounds. Furthermore, the inherent structural properties of the aperiodic AF are also utilized in our derivation. The derived bounds serve as useful design guidelines for optimal AF shaping in modern communication and radar systems.
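The quantity these bounds constrain can be computed directly. The Python sketch below evaluates one common discrete definition of the aperiodic auto-AF and the peak sidelobe inside a rectangular low ambiguity zone. Conventions for Doppler sign and normalization vary, so the exact form here is an assumption. The sketch does not derive the paper's Levenshtein-style bounds. It only measures the value those bounds limit from below.

```python
import cmath
import math
from typing import Sequence


def aperiodic_af(x: Sequence[complex], tau: int, nu: int) -> complex:
    """Aperiodic auto-ambiguity function at integer delay tau and Doppler bin nu.

    Uses A(tau, nu) = sum_n x[n] * conj(x[n + tau]) * exp(j*2*pi*n*nu/N),
    summed only where both n and n + tau fall inside the sequence.
    """
    n_len = len(x)
    return complex(sum(
        x[n] * x[n + tau].conjugate() * cmath.exp(2j * math.pi * n * nu / n_len)
        for n in range(n_len)
        if 0 <= n + tau < n_len
    ))


def peak_sidelobe(x: Sequence[complex], max_delay: int, max_doppler: int) -> float:
    """Largest |A(tau, nu)| in the zone |tau| <= max_delay, |nu| <= max_doppler, excluding (0, 0)."""
    return max(
        abs(aperiodic_af(x, tau, nu))
        for tau in range(-max_delay, max_delay + 1)
        for nu in range(-max_doppler, max_doppler + 1)
        if (tau, nu) != (0, 0)
    )


# A length-7 Zadoff-Chu sequence (root 1) is unimodular: every entry has magnitude 1.
N = 7
zc = [cmath.exp(-1j * math.pi * n * (n + 1) / N) for n in range(N)]
print(abs(aperiodic_af(zc, 0, 0)))  # approximately 7.0, the energy N of a unimodular sequence
print(peak_sidelobe(zc, max_delay=2, max_doppler=1))
```

Because every term has unit magnitude, |A(tau, nu)| can never exceed N - |tau|. The paper's lower bounds work in the other direction: they say how small the sidelobes inside such a zone can possibly be made.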

Submitted 1 February, 2024; originally announced February 2024.

Comments: 5 pages, 1 figure

arXiv:2401.15289 [pdf, other]

SoK: Where's the "up"?! A Comprehensive (bottom-up) Study on the Security of Arm Cortex-M Systems

Authors: Xi Tan, Zheyuan Ma, Sandro Pinto, Le Guan, Ning Zhang, Jun Xu, Zhiqiang Lin, Hongxin Hu, Ziming Zhao

Abstract: Arm Cortex-M processors are the most widely used 32-bit microcontrollers among embedded and Internet-of-Things devices. Despite their widespread usage, there has been little effort in summarizing their hardware security features, characterizing the limitations and vulnerabilities of their hardware and software stack, and systematizing the research on securing these systems. The goals and contributions of this paper are multi-fold. First, we analyze the hardware security limitations and issues of Cortex-M systems. Second, we conduct an in-depth study of the software stack designed for Cortex-M and reveal its limitations, accompanied by an empirical analysis of 1,797 real-world firmware images. Third, we categorize the reported bugs in Cortex-M software systems. Finally, we systematize the efforts that aim at securing Cortex-M systems and evaluate them in terms of the protections they offer, runtime performance, required hardware features, etc. Based on the insights, we develop a set of recommendations for the research community and MCU software developers.

Submitted 13 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: To Appear in the 18th USENIX WOOT Conference on Offensive Technologies, August 12-13, 2024

ACM Class: C.0; K.6.5

arXiv:2312.09452 [pdf, other]

Efficient Multi-Pair IoT Communication with Holographically Enhanced Meta-Surfaces Leveraging OAM Beams: Bridging Theory and Prototype

Authors: Yufei Zhao, Yong Liang Guan, Afkar Mohamed Ismail, Gaohua Ju, Deyu Lin, Yilong Lu, Chau Yuen

Abstract: Meta-surfaces, also known as Reconfigurable Intelligent Surfaces (RIS), have emerged as a cost-effective, low-power, and flexible solution for enabling multiple applications in the Internet of Things (IoT). However, in meta-surface-assisted multi-pair IoT communications, significant interference often arises among multiple channels. This issue is particularly pronounced under Line-of-Sight (LoS) conditions, where the channels exhibit low rank due to the strong correlation among propagation paths. These challenges pose a considerable threat to communication quality when multiplexing data streams. In this paper, we introduce a meta-surface-aided communication scheme for multi-pair interactions in IoT environments. Inspired by holographic technology, we propose a novel compensation method applied across the whole meta-surface, which allows independent multi-pair direct data-stream transmission with low interference. To further reduce correlation under LoS channel conditions, we propose a vortex beam-based solution that leverages the low correlation between distinct topological modes. We use different vortex beams to carry distinct data streams, thereby enabling distinct receivers to capture their intended signals with low interference, aided by holographic meta-surfaces. Moreover, a prototype successfully demonstrates a two-pair multi-node communication scenario operating at 10 GHz with QPSK/16-QAM modulation.
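The low correlation between distinct topological modes that this scheme relies on can be checked numerically in an idealized setting. The Python sketch below samples the helical phase of two OAM modes on a uniform circular array and measures their normalized correlation. It ignores propagation, beam divergence, and the meta-surface entirely, and is not the paper's holographic compensation method. The helper names are hypothetical.

```python
import cmath
import math
from typing import List


def oam_weights(mode: int, n_elements: int) -> List[complex]:
    """Helical phase exp(j * l * phi_k) of OAM mode l sampled at the elements of a uniform circular array."""
    return [cmath.exp(1j * mode * 2 * math.pi * k / n_elements) for k in range(n_elements)]


def mode_correlation(l1: int, l2: int, n_elements: int) -> float:
    """Normalized |inner product| of two sampled OAM modes: 1 means identical, 0 means orthogonal."""
    a = oam_weights(l1, n_elements)
    b = oam_weights(l2, n_elements)
    return abs(sum(x * y.conjugate() for x, y in zip(a, b))) / n_elements


# Correlation of mode 1 with a few other modes on an 8-element ring:
# modes 2 and 3 come out near 0, while mode 9 aliases onto mode 1.
for l2 in (1, 2, 3, 9):
    print(l2, round(mode_correlation(1, l2, 8), 6))
```

Distinct modes are orthogonal only modulo the array size. With 8 elements, mode 9 is indistinguishable from mode 1, so an N-element ring can separate at most N modes in this idealized model.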

Submitted 18 November, 2023; originally announced December 2023.

Comments: Meta-surface, RIS, Internet-of-Things (IoT), Line-of-Sight (LoS), Orbital Angular Momentum (OAM), holographic communications, multi-user

arXiv:2312.09439 [pdf, other]

Smart Roads: Roadside Perception, Vehicle-Road Cooperation and Business Model

Authors: Rui Chen, Lu Gao, Yutian Liu, Yong Liang Guan, Yan Zhang

Abstract: Smart roads have become an essential component of intelligent transportation systems (ITS). Roadside perception technology, a critical aspect of smart roads, utilizes various sensors, roadside units (RSUs), and edge computing devices to gather real-time traffic data for vehicle-road cooperation. However, the full potential of smart roads in improving the safety and efficiency of autonomous vehicles can only be realized through the mass deployment of roadside perception and communication devices. On the one hand, roadside devices require significant investment but currently provide only monitoring functions, yielding no profit for investors. On the other hand, drivers lack trust in the safety of autonomous driving technology, making it difficult to promote large-scale commercial applications. To resolve this mass-deployment dilemma, we propose a novel smart-road vehicle-guiding architecture for vehicle-road cooperative autonomous driving. Based on this architecture, we then propose a corresponding business model and analyze its benefits from both operator and driver perspectives. Numerical simulations validate that our proposed smart road solution can enhance driving safety and traffic efficiency. Moreover, we use a cost-benefit analysis (CBA) model to assess the economic advantages of the proposed business model. The analysis indicates that a smart highway providing vehicle-guided driving services for autonomous vehicles yields more profit than a regular highway.

Submitted 19 October, 2023; originally announced December 2023.

arXiv:2312.08214 [pdf, other]

A Precoding for ORIS-Assisted MIMO Multi-User VLC System

Authors: Mahmoud Atashbar, Hamed Alizadeh Ghazijahani, Yong Liang Guan, Zhaojie Yang

Abstract: In this paper, we study a multi-user visible light communication (VLC) system assisted by an optical reflecting intelligent surface (ORIS). Joint precoding and alignment matrices are designed to maximize the average signal-to-interference-plus-noise ratio (SINR). An optimization problem is formulated under constraints on the constant mean transmission power of the LEDs and on the power associated with all users. To solve this problem, we use an alternating optimization algorithm to optimize the precoding and alignment matrices. Simulation results demonstrate that the SINR achieved by the proposed method exceeds that of ZF and MMSE precoding algorithms.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures

arXiv:2312.00839 [pdf, other]

PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction

Authors: Lei Guan, Dongsheng Li, Jiye Liang, Wenjian Wang, Xicheng Lu

Abstract: Asynchronous pipeline model parallelism with a "1F1B" (one forward, one backward) schedule generates little bubble overhead and consistently provides high throughput. However, the "1F1B" schedule inevitably leads to weight inconsistency and weight staleness issues due to the cross-training of different mini-batches across GPUs. To address both problems simultaneously, we propose an optimizer-dependent weight prediction strategy (a.k.a. PipeOptim) for asynchronous pipeline training. The key insight of our proposal is to employ weight prediction in the forward pass to ensure that each mini-batch uses consistent and staleness-free weights to compute the forward pass. Concretely, we first construct the weight prediction scheme based on the update rule of the optimizer used to train the deep neural network model. Then, throughout "1F1B" pipelined training, each mini-batch must perform weight prediction before its forward pass and then uses the predicted weights to perform the forward pass. As a result, PipeOptim 1) inherits the advantage of the "1F1B" schedule and achieves high throughput, and 2) ensures effective parameter learning regardless of the optimizer type. To verify the effectiveness of our proposal, we conducted extensive experimental evaluations using eight different deep-learning models spanning three machine-learning tasks: image classification, sentiment analysis, and machine translation.
The experimental results demonstrate that PipeOptim outperforms popular pipelined approaches, including GPipe, PipeDream, PipeDream-2BW, and SpecTrain. The code of PipeOptim is available at https://github.com/guanleics/PipeOptim.
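PipeOptim derives its predictor from the update rule of the optimizer in use. As a rough illustration only, the Python sketch below assumes PyTorch-style SGD with momentum (v <- mu * v + g, then w <- w - lr * v). It further assumes the momentum buffer does not change over the pending updates, which reduces prediction to a linear extrapolation. The function name, the `version_gap` parameter, and the constant-buffer simplification are ours. The paper's per-optimizer formulas may differ.

```python
from typing import List


def predict_weights_momentum_sgd(
    weights: List[float], momentum_buf: List[float], lr: float, version_gap: int
) -> List[float]:
    """Extrapolate weights `version_gap` momentum-SGD updates ahead.

    Simplifying assumption (ours, not the paper's): the momentum buffer v stays
    constant over the pending updates, so each update subtracts lr * v and the
    prediction is w_hat = w - version_gap * lr * v.
    """
    return [w - version_gap * lr * v for w, v in zip(weights, momentum_buf)]


# A mini-batch entering a stage whose weights will be updated twice before its backward pass.
print(predict_weights_momentum_sgd([1.0, 2.0], [0.1, -0.2], lr=0.5, version_gap=2))
```

In a 1F1B pipeline, `version_gap` would roughly correspond to the number of weight updates a stage applies between a mini-batch's forward and backward passes. The forward pass then uses the returned weights instead of the current, stale ones.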

Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: 14 pages

arXiv:2312.00797 [pdf, other]

doi 10.1109/TAP.2023.3263920

Multi-mode OAM Convergent Transmission with Co-divergent Angle Tailored by Airy Wavefront

Authors: Yufei Zhao, Ziyang Wang, Yilong Lu, Yong Liang Guan

Abstract: Wireless backhaul offers a more cost-effective, time-efficient, and reconfigurable solution than wired backhaul for connecting edge-computing cells to the core network. As the amount of transmitted data increases, the low-rank characteristic of the Line-of-Sight (LoS) channel severely limits the growth of channel capacity in point-to-point backhaul transmission. Orbital Angular Momentum (OAM) beams, also known as vortex beams, are considered a potentially effective solution for high-capacity LoS wireless transmission. However, because of their energy divergence and mode-specific divergence angles, OAM beams have long been difficult to apply in practical communication systems. In this work, a novel multi-mode convergent transmission with co-scale reception scheme is proposed. OAM beams of different modes can be transmitted with the same beam divergence angle, while the wavefronts are tailored by a ring-shaped Airy compensation lens during propagation, so that the energy converges to the same spatial area for reception. With this scheme, not only is the Signal-to-Noise Ratio (SNR) greatly improved, but OAM channels multiplexed with different modes can also be received and demodulated simultaneously within a limited spatial area. Through prototype experiments, we demonstrated that three OAM modes are tunable and that different channels can be separated simultaneously with increased received power.
The measured isolation between channels exceeds 11 dB, which ensures a reliable 16-QAM multiplexing wireless transmission demo system. This work points to potential applications of OAM-based multi-mode convergent transmission in LoS wireless communications.

Submitted 18 November, 2023; originally announced December 2023.

Comments: Airy beam, line-of-sight channel, orbital angular momentum, OAM multi-mode, wireless communication

Journal ref: IEEE Transactions on Antennas and Propagation (Volume: 71, Issue: 6, June 2023)

arXiv:2311.04617 [pdf, other]

doi 10.1109/TIP.2023.3281171

Image Patch-Matching with Graph-Based Learning in Street Scenes

Authors: Rui She, Qiyu Kang, Sijie Wang, Wee Peng Tay, Yong Liang Guan, Diego Navarro Navarro, Andreas Hartmannsgruber

Abstract: Matching landmark patches from a real-time image captured by an on-vehicle camera with landmark patches in an image database plays an important role in various computer perception tasks for autonomous driving. Current methods focus on local matching for regions of interest and do not take into account spatial neighborhood relationships among the image patches, which typically correspond to objects in the environment. In this paper, we construct a spatial graph with the graph vertices corresponding to patches and edges capturing the spatial neighborhood information. We propose a joint feature and metric learning model with graph-based learning. We provide a theoretical basis for the graph-based loss by showing that the information distance between the distributions conditioned on matched and unmatched pairs is maximized under our framework. We evaluate our model using several street-scene datasets and demonstrate that our approach achieves state-of-the-art matching results.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.10457 [pdf, other]

Flag Sequence Set Design for Low-Complexity Delay-Doppler Estimation

Authors: Lingsheng Meng, Yong Liang Guan, Yao Ge, Zilong Liu

Abstract: This paper studies Flag sequences for low-complexity delay-Doppler estimation by exploiting their distinctive peak-curtain ambiguity functions (AFs). Unlike the existing Flag sequence designs that are limited to prime lengths and periodic auto-AFs, we aim to design Flag sequence sets of arbitrary lengths with low (nontrivial) periodic/aperiodic auto- and cross-AFs. Since every Flag sequence consists of a Curtain sequence and a Peak sequence, we first investigate the algebraic design of Curtain sequence sets of arbitrary lengths. Our proposed design gives rise to novel Curtain sequence sets with ideal curtain auto-AFs and zero/near-zero cross-AFs within the delay-Doppler zone of operation. Leveraging these Curtain sequence sets, two optimization problems are formulated to minimize the Weighted Integrated masked Sidelobe Level (WImSL) of the Flag sequence set. Accelerated Parallel Partially Majorization-Minimization Algorithms are proposed to jointly optimize the transmit Flag sequences and symmetric/asymmetric reference sequences stored in the receiver. Simulations demonstrate that our proposed Flag sequences lead to improved WImSL and peak-to-max-masked-sidelobe ratio compared with the existing Flag sequences. Additionally, our Flag sequences under the Flag method exhibit Mean Squared Errors that approach the Cramér-Rao Lower Bound and the Sampling Bound at high signal-to-noise power ratios.
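Flag sequence designs are assessed through their ambiguity functions. As a minimal sketch, assuming the periodic AF convention A(tau, nu) = sum_n s[n] * conj(r[(n + tau) mod N]) * exp(j*2*pi*nu*n/N) (sign and normalization conventions vary between papers), the Python below evaluates a periodic auto- or cross-AF at integer delay and Doppler bins. The Chu sequence is only a familiar test input; it is not the paper's Curtain, Peak, or Flag construction, and the function names are hypothetical.

```python
import cmath


def chu_sequence(N, u=1):
    """Chu (Zadoff-Chu) sequence of odd length N with root u, assumed gcd(u, N) = 1."""
    return [cmath.exp(-1j * cmath.pi * u * n * (n + 1) / N) for n in range(N)]


def periodic_af(s, r, tau, nu):
    """Periodic (cross-)ambiguity function at integer delay tau and Doppler bin nu.

    Assumed convention: A(tau, nu) = sum_n s[n] * conj(r[(n + tau) % N]) * exp(j*2*pi*nu*n/N).
    Passing r = s gives the auto-AF.
    """
    N = len(s)
    return sum(
        s[n] * r[(n + tau) % N].conjugate() * cmath.exp(2j * cmath.pi * nu * n / N)
        for n in range(N)
    )


# Usage: magnitude grid of the auto-AF of a length-7 Chu sequence.
s = chu_sequence(7)
grid = [[abs(periodic_af(s, s, tau, nu)) for nu in range(7)] for tau in range(7)]
```

For a Chu sequence of odd length N and root u coprime to N, |A(tau, nu)| equals N when u*tau + nu is divisible by N and is zero elsewhere, so the whole AF magnitude concentrates on a single delay-Doppler line.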

Submitted 2 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 14 pages, 7 figures, 1 table

arXiv:2309.07438 [pdf, other]

Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges

Authors: Fei Dou, ** Ye, Geng Yuan, Qin Lu, Wei Niu, Haijian Sun, Le Guan, Guoyu Lu, Gengchen Mai, Ninghao Liu, ** Lu, Zhengliang Liu, Zihao Wu, Chenjiao Tan, Shaochen Xu, Xianqiao Wang, Guoming Li, Lilong Chai, Sheng Li, ** Sun, Hongyue Sun, Yunli Shao, Changying Li, Tianming Liu, Wenzhan Song

Abstract: Artificial General Intelligence (AGI), which possesses the capacity to comprehend, learn, and execute tasks with human cognitive abilities, has generated significant anticipation and interest across scientific, commercial, and societal arenas. This interest extends particularly to the Internet of Things (IoT), a landscape characterized by the interconnection of countless devices, sensors, and systems that collectively gather and share data to enable intelligent decision-making and automation. This paper explores the opportunities and challenges of achieving AGI in the context of the IoT. Specifically, it starts by outlining the fundamental principles of IoT and the critical role of Artificial Intelligence (AI) in IoT systems. It then examines AGI fundamentals, culminating in a conceptual framework for integrating AGI seamlessly within IoT. The application spectrum for AGI-infused IoT is broad, encompassing domains ranging from smart grids, residential environments, manufacturing, and transportation to environmental monitoring, agriculture, healthcare, and education. However, adapting AGI to resource-constrained IoT settings necessitates dedicated research efforts. Furthermore, the paper addresses constraints imposed by limited computing resources, intricacies associated with large-scale IoT communication, and critical security and privacy concerns.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.06388 [pdf]

Computational Approaches for Predicting Drug-Disease Associations: A Comprehensive Review

Authors: Chunyan Ao, Zhichao Xiao, Lixin Guan, Liang Yu

Abstract: In recent decades, traditional drug research and development has faced challenges such as high cost, long timelines, and high risks. To address these issues, many computational approaches have been proposed for predicting drug-disease relationships through drug repositioning, aiming to reduce the cost, development cycle, and risks associated with developing new drugs. Researchers have explored different computational methods to predict drug-disease associations, including drug side effect-disease associations, drug-target associations, and miRNA-disease associations. In this comprehensive review, we focus on recent advances in methods for predicting drug-disease associations for drug repositioning. We first categorize these methods into several groups, including neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning. Then, we compare the prediction performance of existing drug-disease association prediction algorithms. Lastly, we delve into the present challenges and future prospects concerning drug-disease associations.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 34 pages, 5 figures, 2 tables

arXiv:2309.04709 [pdf]

A Public Information Precoding for MIMO Visible Light Communication System Based on Manifold Optimization

Authors: Hamed Alizadeh Ghazijahani, Mahmoud Atashbar, Yong Liang Guan, Zhaojie Yang

Abstract: Visible light communication (VLC) is an attractive subset of optical communication that provides a high data rate in the access layer of the network. Combining multiple-input multiple-output (MIMO) with a VLC system, known as a MIMO-VLC system, increases the data transmission rate. In multi-user (MU) MIMO-VLC, an LED array transmits signals for the users. These signals are categorized as private-information signals for each user and public-information signals for all users. The main idea of this paper is to design an omnidirectional precoding to transmit the public-information signals in the MU-MIMO-VLC network. To this end, we propose to maximize the achievable rate, which leads to maximizing the received mean power at the possible locations of the users. Besides maximizing the achievable rate, we impose an equal mean transmission power constraint on all LEDs to achieve higher power efficiency of the power amplifiers used in the LED array. On this basis, we formulate an optimization problem in which the constraint takes the form of a manifold, and we use a gradient method projected onto the manifold to solve it. Simulation results indicate that the proposed omnidirectional precoding achieves superior received mean power and bit error rate compared with the classical scheme without precoding.

Submitted 9 September, 2023; originally announced September 2023.

Comments: This paper has been submitted to an IEEE Journal

arXiv:2309.01966 [pdf, other]

AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis

Authors: Lei Guan

Abstract: This paper proposes an efficient optimizer called AdaPlus, which integrates Nesterov momentum and precise stepsize adjustment on an AdamW basis. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus. The experimental results show that AdaPlus (i) among all the evaluated adaptive methods, performs most comparably to (and even slightly better than) SGD with momentum on image classification tasks and (ii) outperforms other state-of-the-art optimizers on language modeling tasks and exhibits high stability when training GANs. The experiment code of AdaPlus will be accessible at: https://github.com/guanleics/AdaPlus.
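To make the three ingredients named in the abstract concrete, the sketch below combines a Nadam-style Nesterov look-ahead on the first moment, an AdaBelief-style second moment (an exponential moving average of (g - m)^2), and AdamW's decoupled weight decay. This is one plausible arrangement written for illustration under those assumptions, not the published AdaPlus update rule; the class name and default hyper-parameters are hypothetical.

```python
import math


class AdaPlusLikeSketch:
    """Hypothetical optimizer mixing AdamW, Nadam, and AdaBelief ideas on a list of scalars."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
        self.params = list(params)
        self.lr, self.eps, self.wd = lr, eps, weight_decay
        self.b1, self.b2 = betas
        self.m = [0.0] * len(self.params)  # first moment (momentum)
        self.s = [0.0] * len(self.params)  # AdaBelief-style "belief" second moment
        self.t = 0

    def step(self, grads):
        self.t += 1
        b1, b2, t = self.b1, self.b2, self.t
        for i, g in enumerate(grads):
            # EMA of gradients, as in Adam/AdamW.
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            # EMA of the squared deviation of g from its EMA (AdaBelief idea).
            self.s[i] = b2 * self.s[i] + (1 - b2) * (g - self.m[i]) ** 2 + self.eps
            # Nesterov look-ahead on the bias-corrected first moment (Nadam idea).
            m_hat = b1 * self.m[i] / (1 - b1 ** (t + 1)) + (1 - b1) * g / (1 - b1 ** t)
            s_hat = self.s[i] / (1 - b2 ** t)
            # Adaptive step plus weight decay decoupled from the gradient (AdamW idea).
            self.params[i] -= self.lr * (
                m_hat / (math.sqrt(s_hat) + self.eps) + self.wd * self.params[i]
            )
        return self.params


# Usage: one step on f(x) = x**2 starting from x = 5 (gradient 2x).
opt = AdaPlusLikeSketch([5.0], lr=0.1)
x1 = opt.step([2 * 5.0])[0]
```

Keeping every update inside a single `step` call mirrors how such optimizers are usually structured; the exact bias corrections and where epsilon enters are design choices that differ between AdamW, Nadam, and AdaBelief implementations.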

Submitted 24 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2305.18240 [pdf, other]

XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Authors: Lei Guan, Dongsheng Li, Yanqi Shi, Jian Meng

Abstract: In this paper, we propose a general deep learning training framework, XGrad, which introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch training, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and backward propagation. In this way, throughout training, the optimizer always uses the gradients w.r.t. the future weights to update the DNN parameters, which lets the gradient-based optimizer achieve better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results with five popular optimizers, including SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3, demonstrate the effectiveness of our proposal. The experimental results validate that XGrad can attain higher model accuracy than the baseline optimizers when training DNN models. The code of XGrad will be available at: https://github.com/guanleics/XGrad.
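The weight-prediction idea can be sketched for SGD with momentum on a scalar quadratic. Here we assume the predicted weights are obtained by extrapolating `predict_steps` updates with the current momentum buffer; the paper derives a prediction rule for each optimizer, which may differ from this simplification, and `sgdm_train` is a hypothetical helper.

```python
def sgdm_train(grad_fn, w0, lr=0.1, momentum=0.9, steps=100, predict_steps=0):
    """SGD with momentum; with predict_steps > 0, gradients are taken at predicted weights."""
    w, v = w0, 0.0
    for _ in range(steps):
        # Assumed prediction rule: move predict_steps updates ahead along the
        # current momentum direction (w - lr * v is the SGDM update direction).
        w_hat = w - predict_steps * lr * v
        # Forward and backward passes use the predicted weights.
        g = grad_fn(w_hat)
        # The momentum buffer and the real weights are updated as usual.
        v = momentum * v + g
        w = w - lr * v
    return w


def quad_grad(w):
    """Gradient of f(w) = 0.5 * w**2."""
    return w


w_base = sgdm_train(quad_grad, 1.0)
w_pred = sgdm_train(quad_grad, 1.0, predict_steps=1)
```

With `predict_steps=1` the gradient is evaluated one update ahead of the current weights, which is close in spirit to Nesterov momentum; on this toy quadratic both variants converge, and the predicted variant contracts faster per step.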

Submitted 7 April, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.00195

arXiv:2305.14909 [pdf, other]

Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

Authors: Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati

Abstract: There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. To address the fact that LLMs may not generate a fully functional PDDL model initially, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and humans. For users who lack a background in PDDL, we show that LLMs can translate PDDL into natural language and effectively encode corrective feedback back to the underlying domain model. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct domain models at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work.
On two IPC domains and a Household domain that is more complicated than commonly used benchmarks such as ALFWorld, we demonstrate that GPT-4 can be leveraged to produce high-quality PDDL models for over 40 actions, and the corrected PDDL models are then used to successfully solve 48 challenging planning tasks. Resources, including the source code, are released at: https://guansuns.github.io/pages/llm-dm.

Submitted 1 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2305.13635 [pdf, ps, other]

doi 10.1109/MPRV.2023.3274770

Exploiting Radio Fingerprints for Simultaneous Localization and Mapping

Authors: Ran Liu, Billy Pik Lik Lau, Khairuldanial Ismail, Achala Chathuranga, Chau Yuen, Simon X. Yang, Yong Liang Guan, Shiwen Mao, U-Xuan Tan

Abstract: Simultaneous localization and mapping (SLAM) is paramount for unmanned systems to achieve self-localization and navigation. It is challenging to perform SLAM in large environments due to sensor limitations, complexity of the environment, and computational resources. We propose a novel approach for localization and mapping of autonomous vehicles using radio fingerprints, for example WiFi (Wireless Fidelity) or LTE (Long Term Evolution) radio features, which are widely available in the existing infrastructure. In particular, we present two solutions to exploit radio fingerprints for SLAM. In the first solution, namely Radio SLAM, the output is a radio fingerprint map generated using the SLAM technique. In the second solution, namely Radio+LiDAR SLAM, we use radio fingerprints to assist conventional LiDAR-based SLAM to improve accuracy and speed while generating the occupancy map. We demonstrate the effectiveness of our system in three different environments, namely outdoor, indoor building, and semi-indoor environments.

Submitted 22 May, 2023; originally announced May 2023.

Comments: This paper has been accepted by IEEE Pervasive Computing with DOI: 10.1109/MPRV.2023.3274770

arXiv:2305.04190 [pdf, ps, other]

Fast Blind Recovery of Linear Block Codes over Noisy Channels

Authors: Peng Wang, Yong Liang Guan, Lipo Wang, Peng Cheng

Abstract: This paper addresses the blind recovery of the parity-check matrix of an (n,k) linear block code over noisy channels by proposing a fast recovery scheme consisting of three parts. First, the scheme performs initial error position detection among the received codewords and selects the desirable codewords. Then, it conducts Gaussian elimination (GE) on a k-by-k full-rank matrix and uses a threshold and the associated reliability to verify the recovered dual words, aiming to improve the reliability of recovery. Finally, it performs decoding on the received codewords with partially recovered dual words. These three parts can be combined into different schemes for different noise-level scenarios. The GEV scheme, which combines Gaussian elimination and verification, has a significantly lower recovery failure probability and a much lower computational complexity than an existing Canteaut-Chabaud-based algorithm, which relies on GE on n-by-n full-rank matrices. The decoding-aided recovery (DAR) and error-detection-&-codeword-selection-&-decoding-aided recovery (EDCSDAR) schemes can improve the code recovery performance over GEV for high noise-level scenarios, and their computational complexities remain much lower than that of the Canteaut-Chabaud-based algorithm.
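In the noiseless case, the dual words mentioned above span the null space over GF(2) of the matrix of received codewords, so Gaussian elimination yields a parity-check matrix directly. The sketch below shows only that baseline step, assuming error-free codewords and eliminating the whole codeword matrix; the error detection, codeword selection, threshold-based verification, and decoding aids that make the GEV, DAR, and EDCSDAR schemes robust to noise are not modeled, and the helper names are hypothetical.

```python
def rref_gf2(rows, n):
    """Reduced row echelon form over GF(2); returns (nonzero rows, pivot columns)."""
    rows = [r[:] for r in rows]
    pivots, r = [], 0
    for c in range(n):
        # Find a row at or below r with a 1 in column c.
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        # Clear column c in every other row (XOR is addition in GF(2)).
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows[:r], pivots


def dual_basis_gf2(codewords):
    """Basis of all h with <h, c> = 0 (mod 2) for every codeword c: a parity-check matrix."""
    n = len(codewords[0])
    R, pivots = rref_gf2(codewords, n)
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        h = [0] * n
        h[f] = 1
        # Each RREF row reads x_p + sum_f row[f] * x_f = 0, so x_p = row[f] here.
        for row, p in zip(R, pivots):
            h[p] = row[f]
        basis.append(h)
    return basis


# Usage: all 16 codewords of a (7,4) code with systematic generator G = [I | P].
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 0, 1]]
codewords = [[sum(((u >> i) & 1) * G[i][j] for i in range(4)) % 2 for j in range(7)]
             for u in range(16)]
H = dual_basis_gf2(codewords)
```

With noisy codewords this direct approach breaks down because a single bit error changes the row space, which is exactly why the abstract's schemes add error detection, verification, and decoding around the elimination step.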

Submitted 7 May, 2023; originally announced May 2023.

Comments: 9 pages, 4 figures

arXiv:2305.00537 [pdf, other]

Interpretability of Machine Learning: Recent Advances and Future Prospects

Authors: Lei Gao, Ling Guan

Abstract: The proliferation of machine learning (ML) has drawn unprecedented interest in the study of various multimedia contents such as text, image, audio, and video, among others. Consequently, understanding and learning ML-based representations have taken center stage in knowledge discovery in intelligent multimedia research and applications. Nevertheless, the black-box nature of contemporary ML, especially in deep neural networks (DNNs), has posed a primary challenge for ML-based representation learning. To address this black-box problem, studies on the interpretability of ML have attracted tremendous interest in recent years. This paper presents a survey of recent advances and future prospects in the interpretability of ML, with several application examples pertinent to multimedia computing, including text-image cross-modal representation learning, face recognition, and object recognition. The survey shows that the interpretability of ML is an important research direction worthy of further investment.

Submitted 30 April, 2023; originally announced May 2023.

Comments: IEEE Multimedia (Accepted)

arXiv:2303.17782 [pdf, other]

A Slow-Shifting Concerned Machine Learning Method for Short-term Traffic Flow Forecasting

Authors: Zann Koh, Yan Qin, Yong Liang Guan, Chau Yuen

Abstract: The ability to predict traffic flow over time for crowded areas during rush hours is increasingly important, as it can help authorities make informed decisions for congestion mitigation or scheduling of infrastructure development in an area. However, a crucial challenge in traffic flow forecasting is the slow shifting of temporal peaks between daily and weekly cycles, which makes the traffic flow signal nonstationary and accurate forecasting difficult. To address this challenge, we propose a slow-shifting concerned machine learning method for traffic flow forecasting, which includes two parts. First, we use Empirical Mode Decomposition as feature engineering to alleviate the nonstationarity of traffic flow data, yielding a series of stationary components. Second, because Long Short-Term Memory (LSTM) networks excel at capturing temporal features, an advanced traffic flow forecasting model is developed that takes the stationary components as inputs. Finally, we apply this method to a benchmark of real-world data and compare it with other existing methods. Our proposed method outperforms the state-of-the-art results by 14.55% and 62.56% in root mean squared error and mean absolute percentage error, respectively.
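The gains above are stated in terms of RMSE and MAPE. The helpers below use the standard definitions of these two metrics; reading "outperforms by 14.55%" as a relative reduction with respect to the baseline error is our assumption, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no zero targets."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction of an error metric relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Usage with toy numbers (not data from the paper).
print(rmse([1, 2, 3], [1, 2, 5]))
print(mape([100, 200], [110, 180]))
print(relative_improvement(10.0, 8.0))
```

MAPE divides by the true value, so it is undefined when a target is zero (for example, an empty road segment at night); datasets with such values usually need filtering or a different metric.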

Submitted 30 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures. Accepted for IEEE International Conference on Smart Mobility 2023 (IEEE SM'23)

arXiv:2303.11784 [pdf, ps, other]

Energy Efficiency of Rate-Splitting Multiple Access for Multibeam Satellite System

Authors: **yuan Liu, Yong Liang Guan, Yao Ge, Longfei Yin, Bruno Clerckx

Abstract: Energy efficiency (EE) has become a major issue in satellite communications. In this paper, we study the beamforming design strategy to maximize the EE of rate-splitting multiple access (RSMA) for multibeam satellite communications under imperfect channel state information at the transmitter (CSIT). We propose an expectation-based beamforming algorithm that is robust to imperfect CSIT. By combining successive convex approximation (SCA) with a penalty function transformation, the nonconvex EE maximization problem can be solved iteratively. The simulation results demonstrate the effectiveness and superiority of RSMA over traditional space division multiple access (SDMA). Moreover, our proposed beamforming algorithm achieves better EE performance than the conventional beamforming algorithm.

Submitted 19 March, 2023; originally announced March 2023.

Comments: 5 pages, 1 figure, accepted by the 2023 IEEE Vehicular Technology Conference

arXiv:2303.08531 [pdf, ps, other]

Low-Complexity Memory AMP Detector for High-Mobility MIMO-OTFS SCMA Systems

Authors: Yao Ge, Lei Liu, Shunqi Huang, David González G., Yong Liang Guan, Zhi Ding

Abstract: Efficient signal detectors with satisfactory performance are important yet challenging to achieve for large-scale communication systems. This paper considers a non-orthogonal sparse code multiple access (SCMA) configuration for multiple-input multiple-output (MIMO) systems with the recently proposed orthogonal time frequency space (OTFS) modulation. We develop a novel low-complexity yet effective customized Memory approximate message passing (AMP) algorithm for channel equalization and multi-user detection. Specifically, the proposed Memory AMP detector exploits the sparsity of the channel matrix and applies only matrix-vector multiplications in each iteration to keep complexity low. To alleviate the performance degradation caused by the positive reinforcement problem in the iterative process, all the preceding messages are used to guarantee the orthogonality principle in the Memory AMP detector. Simulation results illustrate the superiority of our Memory AMP detector over existing solutions.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 6 pages, 4 figures, 1 table, accepted by IEEE ICC Workshop

arXiv:2303.02880 [pdf, other]

Spatiotemporal Capsule Neural Network for Vehicle Trajectory Prediction

Authors: Yan Qin, Yong Liang Guan, Chau Yuen

Abstract: Through the advancement of the Vehicle-to-Everything (V2X) network, road safety, energy consumption, and traffic efficiency can be significantly improved. Accurate vehicle trajectory prediction benefits communication traffic management and network resource allocation for real-time applications of the V2X network. Recurrent neural networks and their variants have been reported in recent research to predict vehicle mobility. However, the spatial attribute of vehicle movement behavior has been overlooked, resulting in incomplete information utilization. To bridge this gap, we put forward for the first time a hierarchical trajectory prediction structure using the capsule neural network (CapsNet) with three sequential components. First, the geographic information is transformed into a grid map representation, describing vehicle mobility distribution spatially and temporally. Second, CapsNet serves as the core model to embed local temporal and global spatial correlation through hierarchical capsules. Finally, extensive experiments conducted on actual taxi mobility data collected in Porto (Portugal) and Singapore show that the proposed method outperforms the state-of-the-art methods.

Submitted 5 March, 2023; originally announced March 2023.

Comments: IEEE TVT has accepted this paper

arXiv:2302.08869 [pdf, other]

OTFS Signaling for SCMA With Coordinated Multi-Point Vehicle Communications

Authors: Yao Ge, Qinwen Deng, David González G., Yong Liang Guan, Zhi Ding

Abstract: This paper investigates an uplink coordinated multi-point (CoMP) coverage scenario in which multiple mobile users are grouped for sparse code multiple access (SCMA) and are served simultaneously by the remote radio head (RRH) in front of them and the RRH behind them. We apply orthogonal time frequency space (OTFS) modulation for each user to exploit the degrees of freedom arising from both the delay and Doppler domains. As the signals received by the RRHs in front of and behind the users experience positive and negative Doppler frequency shifts, respectively, our proposed OTFS-based SCMA (OBSCMA) with CoMP system can effectively harvest extra Doppler and spatial diversity for better performance. Based on the maximum likelihood (ML) detector, we analyze the single-user average bit error rate (ABER) bound as the benchmark of the ABER performance for our proposed OBSCMA with CoMP system. We also develop a customized Gaussian approximation with expectation propagation (GAEP) algorithm for multi-user detection and propose efficient algorithm structures for centralized and decentralized detectors. Our proposed OBSCMA with CoMP system outperforms the existing solutions. The proposed centralized and decentralized detectors exhibit effective reception and robustness under channel state information uncertainty.

Submitted 17 February, 2023; originally announced February 2023.

Comments: 15 pages, 12 figures, accepted by IEEE Transactions on Vehicular Technology

arXiv:2302.02352 [pdf, other]

TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

Authors: Jianxin Chang, Chenbin Zhang, Zhiyi Fu, Xiaoxue Zang, Lin Guan, **g Lu, Yiqun Hui, Dewei Leng, Yanan Niu, Yang Song, Kun Gai

Abstract: Life-long user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors spanning months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast and coarse search over tens of thousands of long-term behaviors and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from GSU. Although efficient, existing algorithms mostly suffer from a crucial limitation: the inconsistent target-behavior relevance metrics between GSU and ESU. As a result, their GSU usually misses highly relevant behaviors but retrieves ones considered irrelevant by ESU. In such cases, the TA in ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades the overall CTR prediction accuracy. To address this inconsistency, we propose the TWo-stage Interest Network (TWIN), where our Consistency-Preserved GSU (CP-GSU) adopts the same target-behavior relevance metric as the TA in ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from ESU to GSU, that is, from behavior length $10^2$ to length $10^4$ to $10^5$, we build a novel attention mechanism by behavior feature splitting. For the video inherent features of a behavior, we calculate their linear projection with efficient pre-computing & caching strategies.
For the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save computational cost. The consistency between the two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to a significant performance gain in CTR prediction.

Submitted 26 June, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Accepted by KDD 2023
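The TWIN feature-splitting idea can be sketched as follows, assuming a single attention head, plain Python lists in place of tensors, and hypothetical helper names (`precompute_item_keys`, `cross_bias`, `attention_scores`). This illustrates the mechanism described in the abstract and is not TWIN's implementation: the item-side projection depends only on a behavior's inherent features, so it can be cached, while each cross feature contributes only a scalar bias.

```python
import math

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def precompute_item_keys(W_k, item_features):
    # The projection depends only on each behavior's inherent (video) features,
    # so it can be computed once and cached for every target item.
    return [matvec(W_k, f) for f in item_features]

def cross_bias(beta, cross_features):
    # Hypothetical linear map compressing each behavior's user-item cross features to one scalar.
    return [dot(beta, c) for c in cross_features]

def attention_scores(query, cached_keys, biases):
    # Scaled dot-product relevance plus the scalar cross-feature bias.
    scale = math.sqrt(len(query))
    return [dot(query, k) / scale + b for k, b in zip(cached_keys, biases)]

def top_k(scores, k):
    # GSU-style retrieval: indices of the k highest-scoring behaviors.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage (assumed values): three past behaviors, 2-d features, one target query.
W_k = [[1.0, 0.0], [0.0, 1.0]]
keys = precompute_item_keys(W_k, [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
biases = cross_bias([0.5], [[0.0], [0.0], [1.0]])
scores = attention_scores([1.0, 0.0], keys, biases)
finalists = top_k(scores, 2)                       # the same scores drive retrieval...
weights = softmax([scores[i] for i in finalists])  # ...and the ESU-style attention
```

Reusing one scoring function for both the retrieval step and the attention step is what keeps the two stages consistent in this sketch; in a real model the query and `beta` would be learned.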

arXiv:2302.02245 [pdf, other]

GAN-based Vertical Federated Learning for Label Protection in Binary Classification

Authors: Yujin Han, Leying Guan

Abstract: Split learning (splitNN) has emerged as a popular strategy for addressing the high computational costs and low modeling efficiency in Vertical Federated Learning (VFL). However, despite its popularity, vanilla splitNN lacks encryption protection, leaving it vulnerable to privacy leakage issues, especially Label Leakage from Gradients (LLG). Motivated by the LLG issue resulting from the use of labels during training, we propose the Generative Adversarial Federated Model (GAFM), a novel method designed specifically to enhance label privacy protection by integrating splitNN with Generative Adversarial Networks (GANs). GAFM leverages GANs to indirectly utilize label information by learning the label distribution rather than relying on explicit labels, thereby mitigating LLG. GAFM also employs an additional cross-entropy loss based on the noisy labels to further improve the prediction accuracy. Our ablation experiment demonstrates that the combination of GAN and the cross-entropy loss component is necessary to enable GAFM to mitigate LLG without significantly compromising the model utility. Empirical results on various datasets show that GAFM achieves a better and more robust trade-off between model utility and privacy compared to all baselines across multiple random runs. In addition, we provide experimental justification to substantiate GAFM's superiority over splitNN, demonstrating that it offers enhanced label protection through gradient perturbation relative to splitNN.

Submitted 16 May, 2023; v1 submitted 4 February, 2023; originally announced February 2023.
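Why vanilla splitNN leaks labels through gradients (the LLG problem GAFM targets) can be shown with a toy binary case. Assuming a sigmoid output and binary cross-entropy loss, the gradient returned for the logit is sigmoid(z) - y, whose sign reveals y. The attacker rule `infer_label` is a hypothetical illustration of the problem, not GAFM's defense.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_gradient(z, y):
    # d/dz of binary cross-entropy with a sigmoid output (assumed loss setup).
    return sigmoid(z) - y

def infer_label(grad):
    # Hypothetical passive-party attack: negative gradient implies y = 1, positive implies y = 0.
    return 1 if grad < 0 else 0

samples = [(0.3, 1), (-1.2, 0), (2.5, 1), (0.8, 0)]  # (logit, true label), toy values
guesses = [infer_label(logit_gradient(z, y)) for z, y in samples]
labels = [y for _, y in samples]
```

Because sigmoid(z) lies strictly between 0 and 1, the sign test recovers every label here, which is the leakage the abstract says GAFM mitigates by learning the label distribution instead of training directly on explicit labels.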

arXiv:2302.02237 [pdf, other]

Conformalized Semi-supervised Random Forest for Classification and Abnormality Detection

Authors: Yujin Han, Mingwenchan Xu, Leying Guan

Abstract: The Random Forests classifier, a widely utilized off-the-shelf classification tool, assumes, like other standard classifiers, that training and test samples come from the same distribution. However, in safety-critical scenarios like medical diagnosis and network attack detection, discrepancies between the training and test sets, including the potential presence of novel outlier samples not appearing during training, can pose significant challenges. To address this problem, we introduce the Conformalized Semi-Supervised Random Forest (CSForest), which couples the conformalization technique Jackknife+aB with semi-supervised tree ensembles to construct a set-valued prediction $C(x)$. Instead of optimizing over the training distribution, CSForest employs unlabeled test samples to enhance accuracy and flags unseen outliers by generating an empty set. Theoretically, we establish that CSForest covers true labels for previously observed inlier classes under arbitrary label shift in the test data. We compare CSForest with state-of-the-art methods using synthetic examples and various real-world datasets, under different types of distribution changes in the test domain. Our results highlight CSForest's effective prediction of inliers and its ability to detect outlier samples unique to the test data. In addition, CSForest shows persistently good performance as the sizes of the training and test sets vary. Code for CSForest is available at https://github.com/yu**han98/CSForest.

Submitted 29 February, 2024; v1 submitted 4 February, 2023; originally announced February 2023.
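How a set-valued prediction $C(x)$ can flag outliers with an empty set is sketched below using ordinary split conformal prediction, a simpler technique than the Jackknife+aB construction CSForest actually uses. Assumptions: each class has held-out calibration scores (hypothetical "typicality" scores, higher meaning more typical), and a class enters $C(x)$ when its conformal p-value exceeds $\alpha$.

```python
def conformal_p_value(calib_scores, test_score):
    # p = (1 + #calibration scores <= test score) / (n + 1); a low test score is atypical.
    n = len(calib_scores)
    count = sum(1 for s in calib_scores if s <= test_score)
    return (1 + count) / (n + 1)

def prediction_set(calib_by_class, test_scores, alpha):
    # Keep every class whose p-value exceeds alpha; an empty set flags a possible outlier.
    return {
        label
        for label, calib in calib_by_class.items()
        if conformal_p_value(calib, test_scores[label]) > alpha
    }

# Toy calibration scores for two inlier classes (assumed values).
calib = {"a": [0.6, 0.7, 0.8, 0.9, 0.95], "b": [0.5, 0.65, 0.75, 0.85, 0.9]}
inlier = prediction_set(calib, {"a": 0.85, "b": 0.1}, alpha=0.2)   # typical for class "a" only
outlier = prediction_set(calib, {"a": 0.05, "b": 0.02}, alpha=0.2)  # atypical for every class
```

The test point scored as typical for class "a" yields {"a"}, while the point atypical for every known class yields an empty set, mirroring the outlier-flagging behavior the abstract describes.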

arXiv:2302.00195 [pdf, other]

Weight Prediction Boosts the Convergence of AdamW

Authors: Lei Guan

Abstract: In this paper, we introduce weight prediction into the AdamW optimizer to boost its convergence when training deep neural network (DNN) models. In particular, before each mini-batch training step, we predict the future weights according to the update rule of AdamW and then use the predicted future weights for both the forward pass and backpropagation. In this way, the AdamW optimizer always uses the gradients w.r.t. the future weights instead of the current weights to update the DNN parameters, which gives AdamW better convergence. Our proposal is simple and straightforward to implement, yet effective in boosting the convergence of DNN training. We performed extensive experimental evaluations on image classification and language modeling tasks to verify the effectiveness of our proposal. The experimental results validate that our proposal can boost the convergence of AdamW and achieve better accuracy than AdamW when training DNN models.

Submitted 7 August, 2023; v1 submitted 31 January, 2023; originally announced February 2023.
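A minimal sketch of the weight-prediction step follows, assuming the standard AdamW update (bias-corrected moments, decoupled weight decay) and a hypothetical look-ahead of `s` steps that reuses the current moment estimates; the paper's exact formulation may differ. Gradients are evaluated at the predicted weights, then an ordinary AdamW update is applied to the current weights.

```python
import math

class AdamWWithPrediction:
    """Scalar-parameter AdamW with a look-ahead (weight prediction) step; illustrative only."""

    def __init__(self, params, lr=0.1, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01, s=1):
        self.params = list(params)
        self.lr, self.eps, self.wd, self.s = lr, eps, weight_decay, s
        self.b1, self.b2 = betas
        self.m = [0.0] * len(self.params)
        self.v = [0.0] * len(self.params)
        self.t = 0

    def _direction(self, i, theta):
        # AdamW update direction from the current bias-corrected moments.
        if self.t == 0:
            return self.wd * theta  # no moment history yet: weight decay only
        m_hat = self.m[i] / (1 - self.b1 ** self.t)
        v_hat = self.v[i] / (1 - self.b2 ** self.t)
        return m_hat / (math.sqrt(v_hat) + self.eps) + self.wd * theta

    def predicted_params(self):
        # Assumed prediction rule: extrapolate s AdamW steps ahead with the current moments.
        return [p - self.s * self.lr * self._direction(i, p) for i, p in enumerate(self.params)]

    def step(self, grads):
        # Standard AdamW update of the current weights, using gradients taken at the predicted weights.
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            self.params[i] -= self.lr * self._direction(i, self.params[i])

# Toy usage: minimize (w - 3)^2 starting from w = 0.
opt = AdamWWithPrediction([0.0], lr=0.1, weight_decay=0.0, s=1)
for _ in range(500):
    predicted = opt.predicted_params()
    grads = [2.0 * (w - 3.0) for w in predicted]  # gradient evaluated at the predicted weights
    opt.step(grads)
```

The only change from plain AdamW is where the gradient is computed: at `predicted_params()` rather than at the current weights.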

arXiv:2301.13361 [pdf, other]

Iterative Loop Method Combining Active and Semi-Supervised Learning for Domain Adaptive Semantic Segmentation

Authors: Licong Guan, Xue Yuan

Abstract: Semantic segmentation is an important technique for environment perception in intelligent transportation systems. With the rapid development of convolutional neural networks (CNNs), road scene analysis can usually achieve satisfactory results in the source domain. However, guaranteeing good generalization to different target domain scenarios remains a significant challenge. Recently, semi-supervised learning and active learning have been proposed to alleviate this problem. Semi-supervised learning can improve model accuracy using massive unlabeled data, but it may generate noisy pseudo-labels when training data are limited or imbalanced, and the resulting models may be suboptimal without human guidance. Active learning can select more informative data for human intervention, but it does not exploit the massive unlabeled data, which limits improvements in model accuracy. Moreover, when the domain gap is too large, the probability of querying suboptimal samples increases, raising annotation costs. This paper proposes an iterative loop method combining active and semi-supervised learning for domain adaptive semantic segmentation. The method first applies semi-supervised learning to massive unlabeled data to improve model accuracy and provide a more accurate selection model for active learning. Second, combined with the predictive-uncertainty sample selection strategy of active learning, manual intervention is used to correct the pseudo-labels. Finally, flexible iterative loops achieve the best performance with minimal labeling cost. Extensive experiments show that our method establishes state-of-the-art performance on the GTAV-to-Cityscapes and SYNTHIA-to-Cityscapes tasks, improving mIoU by 4.9% and 5.2%, respectively, compared to the previous best method.

Submitted 13 March, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: 10 pages, 5 figures
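The predictive-uncertainty selection step in the abstract above can be sketched as entropy-based uncertainty sampling: the most uncertain samples go to an annotator, and confident predictions become pseudo-labels. The entropy criterion, the `budget`, and the `confidence` threshold are assumptions for illustration; the paper's actual selection strategy may differ.

```python
import math

def entropy(probs):
    # Shannon entropy (nats) of a predicted class distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def split_for_annotation(predictions, budget, confidence=0.9):
    # Returns (indices for manual annotation, {index: pseudo-label} for confident predictions).
    ranked = sorted(range(len(predictions)), key=lambda i: entropy(predictions[i]), reverse=True)
    to_annotate = ranked[:budget]
    selected = set(to_annotate)
    pseudo = {}
    for i, probs in enumerate(predictions):
        if i in selected:
            continue
        best = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best] >= confidence:  # assumed confidence threshold for pseudo-labeling
            pseudo[i] = best
    return to_annotate, pseudo

# Toy softmax outputs for four samples (or pixels), two classes.
preds = [[0.98, 0.02], [0.5, 0.5], [0.6, 0.4], [0.05, 0.95]]
to_annotate, pseudo = split_for_annotation(preds, budget=1)
```

Sample 1 (maximally uncertain) is routed to the annotator, samples 0 and 3 become pseudo-labeled, and sample 2 is left for a later loop iteration.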

arXiv:2212.06485 [pdf, other]

The Jamming Donut: A Free-Space Gripper based on Granular Jamming

Authors: Therese Joseph, Sarah Baldwin, Lillian Guan, James Brett, David Howard

Abstract: Fruit harvesting has recently experienced a shift towards soft grippers that possess compliance, adaptability, and delicacy. In this context, pneumatic grippers are popular because they provide high deformability and compliance; however, they typically have limited grip strength. Jamming grippers offer strong grip capability but have limited deformability and often require the object to be pushed onto a surface to attain a grip. This paper describes a hybrid gripper combining pneumatics (for deformation) and jamming (for grip strength). Our gripper utilises a torus (donut) structure with two chambers, controlled by pneumatic and vacuum pressure respectively, to conform around a target object. The gripper displays good adaptability, exploiting pneumatics to mould to the shape of the target object so that jamming can then be harnessed to grip. The main contribution of the paper is the design, fabrication, and characterisation of the first hybrid gripper that can use granular jamming in free space, achieving significantly larger retention forces compared to pure pneumatics. We test our gripper on objects of a range of sizes and shapes, as well as on picking a broad range of real fruit.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.04575 [pdf, other]

DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization

Authors: Xiangyu Xu, Li Guan, Enrique Dunn, Haoxiang Li, Gang Hua

Abstract: In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation and cross-frame matching for the task of image-based 3D localization. Prior art has tackled each of these components individually, purportedly to alleviate the difficulty of effectively training a holistic network. We design a self-supervised image warping correspondence loss for both feature detection and matching, a weakly-supervised epipolar constraints loss on relative camera pose learning, and a directional matching scheme that detects keypoint features in a source image and performs coarse-to-fine correspondence search on the target image. We leverage this framework to enforce cycle consistency in our matching module. In addition, we propose a new loss to robustly handle both definite inlier/outlier matches and less-certain matches. The integration of these learning mechanisms enables end-to-end training of a single network performing all three localization components. Benchmarking our approach on public datasets shows that such an end-to-end framework yields more accurate localization, outperforming both traditional methods and state-of-the-art weakly supervised methods.

Submitted 1 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.
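Cycle consistency in matching, as mentioned in the DDM-NET abstract, is commonly checked with mutual nearest neighbours: a source descriptor's best target match must map back to the same source descriptor. The sketch below assumes plain Euclidean descriptors and a hypothetical `mutual_matches` helper; DDM-NET enforces consistency through a learned loss, so this only shows the underlying check.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query, candidates):
    # Index of the candidate descriptor closest to the query.
    return min(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))

def mutual_matches(src, tgt):
    # Keep (i, j) only if j is i's nearest target AND i is j's nearest source (cycle-consistent).
    matches = []
    for i, d in enumerate(src):
        j = nearest(d, tgt)
        if nearest(tgt[j], src) == i:
            matches.append((i, j))
    return matches

# Toy 2-d descriptors (assumed values).
src = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
tgt = [[0.1, 0.0], [4.0, 4.0]]
matches = mutual_matches(src, tgt)
```

Source descriptor 1 matches target 0 in the forward direction, but target 0 maps back to source 0, so the cycle breaks and that match is discarded.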

arXiv:2211.02541 [pdf]

Generation of Chinese classical poetry based on pre-trained model

Authors: Ziyao Wang, Lu** Guan, Guanyu Liu

Abstract: To test whether artificial intelligence can create qualified classical poetry as humans do, the authors present a study of Chinese classical poetry generation based on pre-trained models. The paper mainly uses BART and other pre-trained models and proposes FS2TEXT and RR2TEXT to generate metrical poetry and even poetry in a specific style, addressing the problem that the generated poetry gradually loses relevance to the user's writing intention. To evaluate the model, the authors combined works by selected ancient poets with poems produced by the BART-based poetry model into a set of AI poetry Turing tests, which were reviewed by a group of poets and poetry-writing researchers. There were more than 600 participants, and the final results showed that even advanced poetry enthusiasts could not distinguish AI-generated poems from human-written ones, indicating that the output of the authors' method is not significantly different from human work. The poetry generation model studied by the authors produces works that cannot be distinguished from those of advanced scholars. The number of modern Chinese poets has reached 5 million. However, many modern Chinese poets lack language ability and skills as a result of their early education, and many lack creative inspiration; the authors' model can help them. They can consult the model when choosing words and phrases, build on poems they have already written, and write their own poems. The importance of poetry lies in the poet's thoughts and reflections; how good AI poetry is does not matter. What matters is that people read it and are inspired by it.

Submitted 4 November, 2022; originally announced November 2022.

Comments: 8 pages, 2 figures

ACM Class: J.5; I.2.7

arXiv:2210.16007 [pdf, ps, other]

Design of Protograph LDPC-Coded MIMO-VLC Systems with Generalized Spatial Modulation

Authors: Lin Dai, Yi Fang, Yong Liang Guan, Mohsen Guizani

Abstract: This paper investigates bit-interleaved coded generalized spatial modulation (BICGSM) with iterative decoding (BICGSM-ID) for multiple-input multiple-output (MIMO) visible light communications (VLC). In the BICGSM-ID scheme, the information bits conveyed by the signal-domain (SiD) symbols and the spatial-domain (SpD) light emitting diode (LED)-index patterns are coded by a protograph low-density parity-check (P-LDPC) code. Specifically, we propose a signal-domain symbol expanding and re-allocating (SSER) method for constructing a novel type of generalized spatial modulation (GSM) constellation, referred to as SSERGSM constellations, so as to boost the performance of BICGSM-ID MIMO-VLC systems. Moreover, by applying a modified PEXIT (MPEXIT) algorithm, we further design a family of rate-compatible P-LDPC codes, referred to as enhanced accumulate-repeat-accumulate (EARA) codes, which possess both excellent decoding thresholds and the linear-minimum-distance-growth property. Both analysis and simulation results show that the proposed SSERGSM constellations and P-LDPC codes can remarkably improve the convergence and decoding performance of MIMO-VLC systems. Therefore, the proposed P-LDPC-coded SSERGSM-mapped BICGSM-ID configuration is envisioned as a promising transmission solution to satisfy the high-throughput requirements of MIMO-VLC applications.

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2210.15906 [pdf, other]

Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences

Authors: Lin Guan, Karthik Valmeekam, Subbarao Kambhampati

Abstract: Generating complex behaviors that satisfy the preferences of non-expert users is a crucial requirement for AI agents. Interactive reward learning from trajectory comparisons (a.k.a. RLHF) is one way to allow non-expert users to convey complex objectives by expressing preferences over short clips of agent behaviors. Even though this parametric method can encode complex tacit knowledge present in the underlying tasks, it implicitly assumes that the human is unable to provide richer feedback than binary preference labels, leading to intolerably high feedback complexity and poor user experience. While providing a detailed symbolic closed-form specification of the objectives might be tempting, it is not always feasible even for an expert user. However, in most cases, humans are aware of how the agent should change its behavior along meaningful axes to fulfill their underlying purpose, even if they are not able to fully specify task objectives symbolically. Using this as motivation, we introduce the notion of Relative Behavioral Attributes, which allows users to tweak the agent behavior through symbolic concepts (e.g., increasing the softness or speed of agents' movement). We propose two practical methods that can learn to model any kind of behavioral attribute from ordered behavior clips. We demonstrate the effectiveness of our methods on four tasks with nine different behavioral attributes, showing that once the attributes are learned, end users can produce desirable agent behaviors relatively effortlessly by providing feedback only around ten times. This is over an order of magnitude less than that required by the popular learning-from-human-preferences baselines. The supplementary video and source code are available at: https://guansuns.github.io/pages/rba.

Submitted 27 February, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: ICLR 2023 Camera Ready
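Learning a behavioral attribute from ordered clips, as in the Relative Behavioral Attributes abstract, can be sketched with a Bradley-Terry style ranking objective: the scorer should give a higher attribute value to the clip that shows more of the attribute. The linear scorer over hand-made clip features, the learning rate, and all names below are assumptions; the paper's two methods are not reproduced here.

```python
import math

def score(w, x):
    # Hypothetical linear attribute scorer over clip features.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_attribute(pairs, dim, lr=0.5, epochs=200):
    # pairs: list of (more, less) feature vectors, where 'more' shows more of the attribute.
    # Gradient ascent on log sigmoid(score(more) - score(less)).
    w = [0.0] * dim
    for _ in range(epochs):
        for more, less in pairs:
            diff = score(w, more) - score(w, less)
            p = 1.0 / (1.0 + math.exp(-diff))  # modeled P(more ranked above less)
            coeff = 1.0 - p                    # d log p / d diff
            for k in range(dim):
                w[k] += lr * coeff * (more[k] - less[k])
    return w

# Toy clips described by assumed features [speed, height]; the attribute is "speed".
pairs = [([0.9, 0.2], [0.3, 0.8]), ([0.6, 0.5], [0.1, 0.5]), ([0.8, 0.9], [0.4, 0.1])]
w = train_attribute(pairs, dim=2)
```

Once trained, the scorer can rank new clips along the attribute, which is the kind of signal a user could then push up or down ("faster", "slower") instead of labeling many binary preferences.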

arXiv:2210.15096 [pdf, other]

Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion

Authors: Utkarsh Soni, Nupur Thakur, Sarath Sreedharan, Lin Guan, Mudit Verma, Matthew Marquez, Subbarao Kambhampati

Abstract: There is a growing interest in developing automated agents that can work alongside humans. In addition to completing the assigned task, such an agent will undoubtedly be expected to behave in a manner that is preferred by the human. This requires the human to communicate their preferences to the agent. To achieve this, current approaches either require users to specify the reward function or learn the preference interactively from queries that ask the user to compare behaviors. The former approach can be challenging if the internal representation used by the agent is inscrutable to the human, while the latter is unnecessarily cumbersome for the user if their preference can be specified more easily in symbolic terms. In this work, we propose PRESCA (PREference Specification through Concept Acquisition), a system that allows users to specify their preferences in terms of concepts that they understand. PRESCA maintains a set of such concepts in a shared vocabulary. If the relevant concept is not in the shared vocabulary, then it is learned. To make learning a new concept more feedback-efficient, PRESCA leverages causal associations between the target concept and concepts that are already known. In addition, we use a novel data augmentation approach to further reduce the required feedback. We evaluate PRESCA in a Minecraft environment and show that it can effectively align the agent with the user's preference.

Submitted 31 January, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2208.07833 [pdf, other]

What Your Firmware Tells You Is Not How You Should Emulate It: A Specification-Guided Approach for Firmware Emulation (Extended Version)

Authors: Wei Zhou, Lan Zhang, Le Guan, Peng Liu, Yuqing Zhang

Abstract: Emulating firmware of microcontrollers is challenging due to the lack of peripheral models. Existing work determines how to respond to peripheral read operations by analyzing the target firmware. This is problematic because the firmware sometimes does not contain enough clues to support the emulation or even contains misleading information (e.g., buggy firmware). In this work, we propose a new approach that builds peripheral models from the peripheral specification. Using NLP, we translate peripheral behaviors described in human language (documented in chip manuals) into a set of structured condition-action rules. By checking, executing, and chaining them at runtime, we can dynamically synthesize a peripheral model for each firmware execution. The extracted condition-action rules may be incomplete or even wrong. We therefore propose incorporating symbolic execution to quickly pinpoint the root cause, which assists us in manually correcting the problematic rules. We have implemented our idea for five popular MCU boards spanning three different chip vendors. Using a new edit-distance-based algorithm to calculate trace differences, our evaluation against a large firmware corpus confirmed that our prototype achieves much higher fidelity compared with state-of-the-art solutions. Benefiting from the accurate emulation, our emulator effectively avoids false positives observed in existing fuzzing work. We also designed a new dynamic analysis method to perform driver code compliance checks against the specification. We found several instances of non-compliance, which we later confirmed to be bugs caused by race conditions.

Submitted 11 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Wei Zhou and Lan Zhang contributed equally to this work
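Comparing an emulated execution trace with a reference trace from real hardware can be sketched with plain Levenshtein distance over sequences of trace events (for example, basic-block addresses). The paper uses its own edit-distance-based algorithm; this standard dynamic-programming version, the `trace_similarity` normalization, and the address values are assumptions for illustration.

```python
def edit_distance(a, b):
    # Minimum number of insertions, deletions, and substitutions turning sequence a into b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free when equal)
        prev = cur
    return prev[-1]

def trace_similarity(emulated, reference):
    # Hypothetical normalization: 1.0 for identical traces, lower as they diverge.
    longest = max(len(emulated), len(reference)) or 1
    return 1.0 - edit_distance(emulated, reference) / longest

# Toy basic-block address traces (assumed values): the emulator takes one wrong branch.
hardware = [0x0800_0100, 0x0800_0140, 0x0800_0180, 0x0800_01C0]
emulated = [0x0800_0100, 0x0800_0140, 0x0800_0200, 0x0800_01C0]
```

A single divergent block costs one substitution, so `edit_distance(hardware, emulated)` is 1 and the normalized similarity is 0.75.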

arXiv:2206.12134 [pdf, other]

Capacity Optimal Coded Generalized MU-MIMO

Authors: Yuhao Chi, Lei Liu, Guanghui Song, Ying Li, Yong Liang Guan, Chau Yuen

Abstract: As future communication scenarios become more complicated, most conventional signal processing technologies for multi-user multiple-input multiple-output (MU-MIMO), which are designed based on ideal assumptions such as Gaussian signaling and independent identically distributed (IID) channel matrices, become unreliable. As a result, this paper considers a generalized MU-MIMO (GMU-MIMO) system with more general assumptions, i.e., arbitrary fixed input distributions and general unitarily-invariant channel matrices. However, there is still no accurate capacity analysis and no capacity-optimal transceiver with practical complexity for GMU-MIMO under the constraint of coding. To address these issues, inspired by the replica method, the constrained sum capacity of coded GMU-MIMO with fixed input distribution is calculated by using the celebrated mutual information and minimum mean-square error (MMSE) lemma and the MMSE optimality of orthogonal/vector approximate message passing (OAMP/VAMP). Then, a capacity-optimal multi-user OAMP/VAMP receiver is proposed, whose achievable rate is proved to be equal to the constrained sum capacity. Moreover, a design principle of multi-user codes is presented for the multi-user OAMP/VAMP, based on which a kind of practical multi-user low-density parity-check (MU-LDPC) code is designed. Numerical results show that the finite-length performance of the proposed MU-LDPC codes with multi-user OAMP/VAMP is about 2 dB away from the constrained sum capacity and outperforms that of existing state-of-the-art methods.

Submitted 24 June, 2022; originally announced June 2022.

Comments: Accepted by the 2022 IEEE International Symposium on Information Theory (ISIT). arXiv admin note: substantial text overlap with arXiv:2111.11061

arXiv:2205.10791 [pdf, other]

Federated Spectrum Learning for Reconfigurable Intelligent Surfaces-Aided Wireless Edge Networks

Authors: Bo Yang, Xuelin Cao, Chongwen Huang, Chau Yuen, Marco Di Renzo, Yong Liang Guan, Dusit Niyato, Lijun Qian, Merouane Debbah

Abstract: Growing attention to intelligent spectrum sensing calls for efficient training and inference technologies. In this paper, we propose a novel federated learning (FL) framework, dubbed federated spectrum learning (FSL), which exploits the benefits of reconfigurable intelligent surfaces (RISs) and overcomes the unfavorable impact of deep fading channels. Distinctively, we endow conventional RISs with spectrum learning capabilities by leveraging a fully-trained convolutional neural network (CNN) model at each RIS controller, thereby helping the base station to cooperatively infer the users who request to participate in FL at the beginning of each training iteration. To fully exploit the potential of FL and RISs, we address three technical challenges: RIS phase-shift configuration, user-RIS association, and wireless bandwidth allocation. The resulting joint learning, wireless resource allocation, and user-RIS association design is formulated as an optimization problem whose objective is to maximize the system utility while considering the impact of FL prediction accuracy. In this context, the accuracy of FL prediction interplays with the performance of resource optimization. In particular, if the accuracy of the trained CNN model deteriorates, the performance of resource allocation worsens. The proposed FSL framework is tested using real radio frequency (RF) traces, and numerical results demonstrate its advantages in terms of spectrum prediction accuracy and system utility: better CNN prediction accuracy and FL system utility can be achieved with a larger number of RISs and reflecting elements.

Submitted 22 May, 2022; originally announced May 2022.

arXiv:2202.03013 [pdf, other]

doi 10.1145/3510003.3510208

$μ$AFL: Non-intrusive Feedback-driven Fuzzing for Microcontroller Firmware

Authors: Wenqiang Li, Jiameng Shi, Fengjun Li, Jingqiang Lin, Wei Wang, Le Guan

Abstract: Fuzzing is one of the most effective approaches to finding software flaws. However, applying it to microcontroller firmware poses many challenges. For example, rehosting-based solutions cannot accurately model peripheral behaviors and thus cannot be used to fuzz the corresponding driver code. In this work, we present $μ$AFL, a hardware-in-the-loop approach to fuzzing microcontroller firmware. It leverages debugging tools in existing embedded system development to construct an AFL-compatible fuzzing framework. Specifically, we use the debug dongle to bridge the fuzzing environment on the PC and the target firmware on the microcontroller device. To collect code coverage information without costly code instrumentation, $μ$AFL relies on the ARM ETM hardware debugging feature, which transparently collects the instruction trace and streams the results to the PC. However, the raw ETM data is obscure and needs enormous computing resources to recover the actual instruction flow. We therefore propose an alternative representation of code coverage, which retains the same path sensitivity as the original AFL algorithm, but can directly work on the raw ETM data without matching them with disassembled instructions. To further reduce the workload, we use the DWT hardware feature to selectively collect runtime information of interest. We evaluated $μ$AFL on two real evaluation boards from two major vendors: NXP and STMicroelectronics. With our prototype, we discovered ten zero-day bugs in the driver code shipped with the SDK of STMicroelectronics and three zero-day bugs in the SDK of NXP. Eight CVEs have been allocated for them. Considering the wide adoption of vendor SDKs in real products, our results are alarming.

Submitted 19 April, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: 44th International Conference on Software Engineering (ICSE 2022)
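For reference, classic AFL records edge coverage by hashing each (previous block, current block) pair into a shared bitmap: `bitmap[cur ^ prev]++`, then `prev = cur >> 1`. The sketch below applies that well-known scheme to a hypothetical list of decoded branch-target addresses; $μ$AFL instead derives an equally path-sensitive representation directly from raw ETM packets, which is not reproduced here. The address-mixing hash and the saturating counter are simplifying assumptions.

```python
MAP_SIZE = 1 << 16  # AFL's default 64 KiB coverage bitmap

def block_id(addr):
    # Assumed mixing hash from a basic-block address to a bitmap index.
    return ((addr >> 4) ^ (addr << 8)) & (MAP_SIZE - 1)

def edge_coverage(branch_targets):
    # AFL-style edge hit counts over a sequence of executed basic blocks.
    bitmap = bytearray(MAP_SIZE)
    prev = 0
    for addr in branch_targets:
        cur = block_id(addr)
        idx = cur ^ prev
        bitmap[idx] = min(bitmap[idx] + 1, 255)  # saturate (simplification of AFL's 8-bit counters)
        prev = cur >> 1                          # shift so A->B and B->A hash differently
    return bitmap

def has_new_coverage(virgin, bitmap):
    # True if this run hit an edge never seen before; records the new edges in `virgin`.
    new = False
    for i, hits in enumerate(bitmap):
        if hits and not virgin[i]:
            virgin[i] = 1
            new = True
    return new

# Toy usage with hypothetical block addresses.
virgin = bytearray(MAP_SIZE)
run1 = edge_coverage([0x08000100, 0x08000140, 0x08000180])
run2 = edge_coverage([0x08000100, 0x08000140, 0x08000180])  # same path again
run3 = edge_coverage([0x08000100, 0x08000180, 0x08000140])  # same blocks, different order
```

Re-running the same path adds nothing new, while visiting the same blocks in a different order produces new edges; this order sensitivity is what "path sensitivity" refers to in the abstract.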

arXiv:2202.02886 [pdf, other]

Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity

Authors: Lin Guan, Sarath Sreedharan, Subbarao Kambhampati

Abstract: Creating reinforcement learning (RL) agents that are capable of accepting and leveraging task-specific knowledge from humans has long been identified as a possible strategy for developing scalable approaches for solving long-horizon problems. While previous works have looked at the possibility of using symbolic models along with RL approaches, they tend to assume that the high-level action models are executable at the low level and that the fluents can exclusively characterize all desirable MDP states. However, symbolic models of real-world tasks are often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, in which we formalize the relationship between the symbolic model and the underlying MDP in a way that allows us to characterize the incompleteness of the symbolic model. We use these models to extract high-level landmarks that are used to decompose the task. At the low level, we learn a set of diverse policies for each possible task subgoal identified by the landmark, which are then stitched together. We evaluate our system on three different benchmark domains and show how, even with incomplete symbolic model information, our approach is able to discover the task structure and efficiently guide the RL agent towards the goal.

Submitted 17 June, 2022; v1 submitted 6 February, 2022; originally announced February 2022.
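The low-level loop described in the abstract (a set of diverse policies for each landmark-derived subgoal, stitched together in landmark order) can be illustrated with a toy sketch. Everything here is hypothetical: states are integers, hand-written policies stand in for learned ones, landmarks are plain predicates, and `reach`/`stitch` are illustrative names rather than the authors' code. Picking the first policy that reaches each subgoal is a simplifying assumption.

```python
from typing import Callable, List, Optional, Sequence

State = int
Policy = Callable[[State], State]      # hypothetical: maps a state to the next state
Subgoal = Callable[[State], bool]      # hypothetical: landmark predicate over states


def reach(policy: Policy, start: State, subgoal: Subgoal, budget: int) -> Optional[List[State]]:
    """Roll out `policy` from `start`; return the trajectory if `subgoal` holds within `budget` steps."""
    trajectory = [start]
    state = start
    for _ in range(budget):
        if subgoal(state):
            return trajectory
        state = policy(state)
        trajectory.append(state)
    return trajectory if subgoal(state) else None


def stitch(start: State, landmarks: Sequence[Subgoal],
           policy_sets: Sequence[Sequence[Policy]], budget: int = 20) -> Optional[List[State]]:
    """Chain one successful policy per landmark; None if some landmark is unreachable."""
    trajectory = [start]
    state = start
    for subgoal, candidates in zip(landmarks, policy_sets):
        for policy in candidates:            # the diverse candidates for this subgoal
            segment = reach(policy, state, subgoal, budget)
            if segment is not None:
                trajectory.extend(segment[1:])   # drop the duplicated start state
                state = segment[-1]
                break
        else:
            return None                      # no candidate policy reached this landmark
    return trajectory


if __name__ == "__main__":
    # Landmarks 3 -> 7 -> 10; the "+2" policy overshoots 3 and 10, so the "+1" policy is needed there.
    landmarks = [lambda s, g=g: s == g for g in (3, 7, 10)]
    policies = [[lambda s: s + 2, lambda s: s + 1]] * 3
    print(stitch(0, landmarks, policies))    # [0, 1, 2, 3, 5, 7, 8, 9, 10]
```

The example shows why a diverse policy set helps: a single policy that overshoots a landmark would stall the whole chain, while an alternative candidate lets the stitched plan continue.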

arXiv:2201.07021

MuSCLe: A Multi-Strategy Contrastive Learning Framework for Weakly Supervised Semantic Segmentation

Authors: Kunhao Yuan, Gerald Schaefer, Yu-Kun Lai, Yifan Wang, Xiyao Liu, Lin Guan, Hui Fang

Abstract: Weakly supervised semantic segmentation (WSSS) has gained significant popularity because it relies only on weak labels, such as image-level annotations, rather than the pixel-level annotations required by supervised semantic segmentation (SSS) methods. Despite drastically reduced annotation costs, typical feature representations learned in WSSS represent only some salient parts of objects and are less reliable than those learned in SSS, owing to the weak guidance during training. In this paper, we propose a novel Multi-Strategy Contrastive Learning (MuSCLe) framework to obtain enhanced feature representations and improve WSSS performance by exploiting the similarity and dissimilarity of contrastive sample pairs at the image, region, pixel, and object-boundary levels. Extensive experiments demonstrate the effectiveness of our method and show that MuSCLe outperforms the current state of the art on the widely used PASCAL VOC 2012 dataset.

Submitted 18 January, 2022; originally announced January 2022.
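The abstract describes exploiting the similarity and dissimilarity of contrastive sample pairs. A common way to turn such pairs into a training signal is an InfoNCE-style loss; the sketch below shows a single generic InfoNCE term over plain feature vectors. It is an assumption for illustration only: the abstract does not specify MuSCLe's loss, and `info_nce`, `cosine`, and the temperature value are hypothetical choices, not the paper's multi-level objectives.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE: -log( exp(s_pos / t) / sum_j exp(s_j / t) ).

    The positive pair is pulled together (similarity) and the negatives
    are pushed apart (dissimilarity). Always >= 0.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # Anchor aligned with its positive: loss near 0.
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    # Anchor aligned with a negative instead: loss near 1 / temperature = 10.
    print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

In a multi-strategy setting, one would presumably compute terms like this for pairs drawn at different granularities (image, region, pixel, boundary) and combine them; how MuSCLe forms and weights those pairs is not stated in the abstract.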

arXiv:2201.00785

Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning

Authors: Siming Yan, Zhenpei Yang, Haoxiang Li, Chen Song, Li Guan, Hao Kang, Gang Hua, Qixing Huang

Abstract: This paper advocates the use of implicit surface representation in autoencoder-based self-supervised 3D representation learning. The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the underlying continuous 3D surface. This discretization process introduces sampling variations on the 3D shape, making it challenging to develop transferable knowledge of the true 3D geometry. In the standard autoencoding paradigm, the encoder is compelled to encode not only the 3D geometry but also information on the specific discrete sampling of the 3D shape into the latent code. This is because the point cloud reconstructed by the decoder is considered unacceptable unless there is a perfect mapping between the original and the reconstructed point clouds. This paper introduces the Implicit AutoEncoder (IAE), a simple yet effective method that addresses the sampling variation issue by replacing the commonly used point-cloud decoder with an implicit decoder. The implicit decoder reconstructs a continuous representation of the 3D shape, independent of the imperfections in the discrete samples. Extensive experiments demonstrate that the proposed IAE achieves state-of-the-art performance across various self-supervised learning benchmarks.

Submitted 27 August, 2023; v1 submitted 3 January, 2022; originally announced January 2022.

Comments: Published in ICCV 2023. The code is available at https://github.com/SimingYan/IAE
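The sampling-variation argument in the abstract can be made concrete with a 2D toy. Two different samplings of the same unit circle give a nonzero Chamfer distance (a typical point-cloud reconstruction loss), so a point-cloud decoder is penalized for not reproducing the exact samples. An implicit target, here assumed to be an unsigned distance field evaluated at query points, depends only on the continuous shape. The circle, `chamfer`, and `unit_circle_udf` are hypothetical illustrations; the abstract does not name the specific losses or fields IAE uses.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def circle_samples(n: int, phase: float) -> List[Point]:
    """n evenly spaced samples of the unit circle, rotated by `phase` radians."""
    return [(math.cos(phase + 2 * math.pi * k / n), math.sin(phase + 2 * math.pi * k / n))
            for k in range(n)]


def chamfer(P: Sequence[Point], Q: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (mean squared nearest-neighbour distance, both directions)."""
    def one_way(A: Sequence[Point], B: Sequence[Point]) -> float:
        return sum(min(math.dist(a, b) ** 2 for b in B) for a in A) / len(A)
    return one_way(P, Q) + one_way(Q, P)


def unit_circle_udf(q: Point) -> float:
    """Unsigned distance from query point q to the continuous unit circle."""
    return abs(math.hypot(*q) - 1.0)


if __name__ == "__main__":
    P = circle_samples(16, 0.0)
    Q = circle_samples(16, math.pi / 16)   # same shape, different discrete sampling
    print(chamfer(P, Q) > 0.0)             # True: the point-cloud loss sees a "difference"
    # The implicit target at any query point is identical for both samplings,
    # because it is defined by the underlying surface, not by the samples.
    print(unit_circle_udf((2.0, 0.0)))     # 1.0
```

This mirrors the design choice the abstract describes: supervising a continuous field at query points removes the incentive for the latent code to memorize which particular points were sampled.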

Showing 1–50 of 175 results for author: Guan, L