-
Fairpriori: Improving Biased Subgroup Discovery for Deep Neural Network Fairness
Authors:
Kacy Zhou,
Jiawen Wen,
Nan Yang,
Dong Yuan,
Qinghua Lu,
Huaming Chen
Abstract:
While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-skinned women, while not showing bias against individuals with darker skin or women. This problem calls for effective fairness testing before the deployment of such deep learning models in real-world scenarios. However, research into detecting such bias is currently limited compared to research on individual and group fairness. Existing tools to investigate intersectional bias lack important features such as support for multiple fairness metrics, fast and efficient computation, and user-friendly interpretation. This paper introduces Fairpriori, a novel biased subgroup discovery method, which aims to address these limitations. Fairpriori incorporates the frequent itemset generation algorithm to facilitate effective and efficient investigation of intersectional bias by producing fast fairness metric calculations on subgroups of a dataset. Through comparison with the state-of-the-art methods (e.g., Themis, FairFictPlay, and TestSGD) under similar conditions, Fairpriori demonstrates superior effectiveness and efficiency when identifying intersectional bias. Specifically, Fairpriori is easier to use and interpret, supports a wider range of use cases by accommodating multiple fairness metrics, and exhibits higher efficiency in computing fairness metrics. These findings showcase Fairpriori's potential for effectively uncovering subgroups affected by intersectional bias, supported by its open-source tooling at https://anonymous.4open.science/r/Fairpriori-0320.
Submitted 24 June, 2024;
originally announced July 2024.
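To make the idea of frequent-itemset-based subgroup discovery concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy dataset, a hypothetical helper `subgroup_rates`, and the positive-prediction rate as the only fairness statistic. It enumerates attribute-value combinations level by level (Apriori style), pruning any combination whose support falls below a threshold.

```python
from itertools import combinations

def subgroup_rates(rows, preds, attrs, min_support=2, max_size=2):
    """Hypothetical helper: level-wise (Apriori-style) subgroup enumeration.

    rows  : list of dicts mapping attribute name -> value
    preds : list of 0/1 model predictions aligned with rows
    Returns {subgroup: (support, positive_rate)}, where a subgroup is a
    sorted tuple of (attribute, value) pairs.
    """
    results = {}
    # Level 1: single attribute-value items that meet the support threshold.
    frequent = {}
    for a in attrs:
        for v in sorted({r[a] for r in rows}):
            idx = frozenset(i for i, r in enumerate(rows) if r[a] == v)
            if len(idx) >= min_support:
                frequent[((a, v),)] = idx
    size = 1
    while frequent and size <= max_size:
        for items, idx in frequent.items():
            results[items] = (len(idx), sum(preds[i] for i in idx) / len(idx))
        # Join frequent subgroups; support is anti-monotone, so infrequent
        # subgroups never need to be extended.
        candidates = {}
        for (s1, i1), (s2, i2) in combinations(frequent.items(), 2):
            merged = tuple(sorted(set(s1) | set(s2)))
            # Keep only joins that add exactly one new attribute.
            if len(merged) != size + 1 or len({a for a, _ in merged}) != size + 1:
                continue
            idx = i1 & i2  # rows matching every item of the merged subgroup
            if len(idx) >= min_support:
                candidates[merged] = idx
        frequent = candidates
        size += 1
    return results

# Toy data (an assumption for illustration): neither "dark" nor "F" alone is
# disadvantaged, but their intersection is.
rows = [{"skin": s, "gender": g}
        for s, g in [("dark", "F"), ("dark", "F"), ("dark", "M"), ("dark", "M"),
                     ("light", "F"), ("light", "F"), ("light", "M"), ("light", "M")]]
preds = [0, 0, 1, 1, 1, 1, 0, 0]
rates = subgroup_rates(rows, preds, ["skin", "gender"])
```

In this toy data the positive rate is 0.5 for darker-skinned individuals and 0.5 for women, but 0.0 for darker-skinned women, which is the kind of intersectional gap the abstract describes.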
-
Generating grid maps via the snake model
Authors:
Zhiwei Wei,
Nai Yang,
Wenjia Xu,
Su Ding
Abstract:
The grid map, often referred to as the tile map, stands as a vital tool in geospatial visualization, possessing unique attributes that differentiate it from more commonly known techniques such as choropleths and cartograms. It transforms geographic regions into grids, which requires the displacement of both region centroids and boundary nodes to establish a coherent grid arrangement. However, existing approaches typically displace region centroids and boundary nodes separately, potentially resulting in self-intersected boundaries and compromised relative orientation relations between regions. In this paper, we introduce a novel approach that leverages the Snake displacement algorithm from cartographic generalization to concurrently displace region centroids and boundary nodes. The revised Constrained Delaunay triangulation (CDT) is employed to represent the relations between regions and serves as a structural foundation for the Snake algorithm. Forces for displacing the region centroids into a grid-like pattern are then computed. These forces are iteratively applied within the Snake model until a satisfactory new boundary is achieved. Subsequently, the grid map is created by aligning the grids with the newly generated boundary, utilizing a one-to-one match algorithm to assign each region to a specific grid. Experimental results demonstrate that the proposed approach excels in maintaining the relative orientation and global shape of regions, albeit with a potential increase in local location deviations. We also present two strategies aligned with existing approaches to generate diverse grid maps for user preferences. Further details and resources are available on our project website: https://github.com/TrentonWei/DorlingMap.git.
Submitted 3 June, 2024;
originally announced June 2024.
-
A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation
Authors:
Muwei Jian,
Hongyu Chen,
Zaiyong Zhang,
Nan Yang,
Haorang Zhang,
Lifu Ma,
Wenjing Xu,
Huixiang Zhi
Abstract:
Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and the promising results demonstrate that the dataset is feasible for practical application and can further facilitate intelligent auxiliary diagnosis.
Submitted 26 June, 2024;
originally announced June 2024.
-
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models
Authors:
Julian Straub,
Daniel DeTone,
Tianwei Shen,
Nan Yang,
Chris Sweeney,
Richard Newcombe
Abstract:
The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D, a benchmark with two core 3D egocentric perception tasks. EFM3D is the first benchmark for 3D object detection and surface regression on high quality annotated egocentric data of Project Aria. We propose Egocentric Voxel Lifting (EVL), a baseline for 3D EFMs. EVL leverages all available egocentric modalities and inherits foundational capabilities from 2D foundation models. This model, trained on a large simulated dataset, outperforms existing methods on the EFM3D benchmark.
Submitted 14 June, 2024;
originally announced June 2024.
-
Interior-Point-based H2 Controller Synthesis for Compartmental Systems
Authors:
Zhaohua Yang,
Nachuan Yang,
Pengyu Wang,
Haishan Zhang,
Xiayan Xu,
Ling Shi
Abstract:
This paper addresses the problem of the optimal $H_2$ controller design for compartmental systems. In other words, we aim to enhance system robustness while maintaining the law of mass conservation. We perform a novel problem transformation and establish that the original problem is equivalent to a new optimization problem with a closed polyhedron constraint. Existing works have developed various first-order methods to tackle inequality constraints. However, the performance of the first-order method is limited in terms of convergence speed and precision, restricting its potential in practical applications. Therefore, developing a novel algorithm with fast speed and high precision is critical. In this paper, we reformulate the problem using log-barrier functions and introduce two separate approaches to address the problem: the first-order interior point method (FIPM) and the second-order interior point method (SIPM). We show they converge to a stationary point of the new problem. In addition, we propose an initialization method to guarantee the interior property of initial values. Finally, we compare FIPM and SIPM through a room temperature control example and show their pros and cons.
Submitted 14 June, 2024;
originally announced June 2024.
-
QUADFormer: Learning-based Detection of Cyber Attacks in Quadrotor UAVs
Authors:
Pengyu Wang,
Zhaohua Yang,
Nachuan Yang,
Zikai Wang,
Jialu Li,
Fan Zhang,
Chaoqun Wang,
Jiankun Wang,
Max Q. -H. Meng,
Ling Shi
Abstract:
Safety-critical intelligent cyber-physical systems, such as quadrotor unmanned aerial vehicles (UAVs), are vulnerable to different types of cyber attacks, and the absence of timely and accurate attack detection can lead to severe consequences. When UAVs are engaged in large outdoor maneuvering flights, their system constitutes highly nonlinear dynamics that include non-Gaussian noises. Therefore, the commonly employed traditional statistics-based and emerging learning-based attack detection methods do not yield satisfactory results. In response to the above challenges, we propose QUADFormer, a novel Quadrotor UAV Attack Detection framework with transFormer-based architecture. This framework includes a residue generator designed to generate a residue sequence sensitive to anomalies. Subsequently, this sequence is fed into a transformer structure with disparity in correlation to specifically learn its statistical characteristics for the purpose of classification and attack detection. Finally, we design an alert module to ensure the safe execution of tasks by UAVs under attack conditions. We conduct extensive simulations and real-world experiments, and the results show that our method has achieved superior detection performance compared with many state-of-the-art methods.
Submitted 14 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Sharing tea on a graph
Authors:
J. Pascal Gollin,
Kevin Hendrey,
Hao Huang,
Tony Huynh,
Bojan Mohar,
Sang-il Oum,
Ningyuan Yang,
Wei-Hsuan Yu,
Xuding Zhu
Abstract:
Motivated by the analysis of consensus formation in the Deffuant model for social interaction, we consider the following procedure on a graph $G$. Initially, there is one unit of tea at a fixed vertex $r \in V(G)$, and all other vertices have no tea. At any time in the procedure, we can choose a connected subset of vertices $T$ and equalize the amount of tea among vertices in $T$. We prove that if $x \in V(G)$ is at distance $d$ from $r$, then $x$ will have at most $\frac{1}{d+1}$ units of tea during any step of the procedure. This bound is best possible and answers a question of Gantert.
We also consider arbitrary initial weight distributions. For every finite graph $G$ and $w \in \mathbb{R}_{\geq 0}^{V(G)}$, we prove that the set of weight distributions reachable from $w$ is a compact subset of $\mathbb{R}_{\geq 0}^{V(G)}$.
Submitted 24 May, 2024;
originally announced May 2024.
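The tea-sharing procedure in this abstract can be explored numerically. The sketch below is an illustration only: it assumes a path graph with the root at vertex 0 (so vertex $v$ is at distance $v$), uses hypothetical helper names `share` and `simulate_path`, and picks random intervals as the connected sets. Exact rational arithmetic via `fractions.Fraction` avoids rounding issues when comparing against the bound $\frac{1}{d+1}$.

```python
from fractions import Fraction
import random

def share(tea, vertices):
    """Equalize the amount of tea among the given (connected) vertices."""
    vertices = list(vertices)
    avg = sum(tea[v] for v in vertices) / len(vertices)
    for v in vertices:
        tea[v] = avg

def simulate_path(n, steps, seed=0):
    """Random sharing moves on a path 0-1-...-(n-1) with all tea at vertex 0.

    Returns, for each vertex, the largest amount of tea it ever held.
    """
    rng = random.Random(seed)
    tea = [Fraction(0)] * n
    tea[0] = Fraction(1)
    best = list(tea)
    for _ in range(steps):
        i = rng.randrange(n)
        j = rng.randrange(i, n)
        share(tea, range(i, j + 1))  # intervals are the connected sets of a path
        best = [max(b, t) for b, t in zip(best, tea)]
    return best

# Tightness: sharing once over the vertices 0..d gives vertex d exactly 1/(d+1).
tea = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
share(tea, range(0, 3))

best = simulate_path(6, 200)
```

Here `tea[2]` equals `Fraction(1, 3)`, matching the bound for $d = 2$, and the maxima in `best` can be compared entry by entry against `Fraction(1, v + 1)`.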
-
Finding Product and Sum Patterns in non-commutative settings
Authors:
T. Y. Tao,
Neil N. Y. Yang
Abstract:
Hindman conjectured that any finite partition of $\mathbb{N}$ has a monochromatic $\{x,y,x+y,xy\}$. Recently, Bowen proved the result for all 2-partitions. In this paper, we extend Bowen's result to any semiring $(S,+,\cdot)$ such that $Ss$ is piecewise syndetic for all $s\in S$. As a method, we give a combinatorial proof of a piecewise syndetic version of Bergelson and Glasscock's IP$_r^*$ Szemerédi Theorem, and discuss the case when the operation is not commutative.
Submitted 30 April, 2024;
originally announced April 2024.
-
Age-minimal Multicast by Graph Attention Reinforcement Learning
Authors:
Yanning Zhang,
Guocheng Liao,
Shengbin Cao,
Ning Yang,
Meng Zhang
Abstract:
Age of Information (AoI) is an emerging metric used to assess the timeliness of information, gaining research interest in real-time multicast applications such as video streaming and metaverse platforms. In this paper, we consider a dynamic multicast network with energy constraints, where our objective is to minimize the expected time-average AoI through energy-constrained multicast routing and scheduling. The inherent complexity of the problem, given the NP-hardness and intertwined scheduling and routing decisions, makes existing approaches inapplicable. To address these challenges, we decompose the original problem into two subtasks, each amenable to reinforcement learning (RL) methods. Subsequently, we propose an innovative framework based on graph attention networks (GATs) to effectively capture graph information with superior generalization capabilities. To validate our framework, we conduct experiments on three datasets including a real-world dataset called AS-733, and show that our proposed scheme reduces the average weighted AoI by 62.9% and reduces the energy consumption by at most 72.5% compared to baselines.
Submitted 31 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
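For readers unfamiliar with the metric, the sketch below shows one common discrete-time definition of time-average AoI at a single receiver. It is an assumption for illustration and does not reproduce the paper's multicast model or its weighting: the age grows by one per slot and, when an update generated at slot `g` is delivered at slot `t`, resets to `t - g`. The function name and the `deliveries` format are hypothetical.

```python
def time_average_aoi(deliveries, horizon, initial_age=0):
    """Time-average Age of Information over slots 0..horizon-1.

    deliveries : dict mapping delivery slot -> generation slot of the update
    """
    age = initial_age
    total = 0
    for t in range(horizon):
        if t in deliveries:
            age = t - deliveries[t]  # age of the freshest delivered update
        total += age
        age += 1  # information ages by one slot until the next delivery
    return total / horizon

# An update generated at slot 1 arrives at slot 2; a fresh one arrives at slot 5.
avg = time_average_aoi({2: 1, 5: 5}, horizon=6)
```

With these deliveries the per-slot ages are 0, 1, 1, 2, 3, 0, so the average is 7/6.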
-
Cross-Domain Causal Preference Learning for Out-of-Distribution Recommendation
Authors:
Zhuhang Li,
Ning Yang
Abstract:
Recommender systems use users' historical interactions to learn their preferences and deliver personalized recommendations from a vast array of candidate items. Current recommender systems primarily rely on the assumption that the training and testing datasets have identical distributions, which may not hold true in reality. In fact, the distribution shift between training and testing datasets often occurs as a result of the evolution of user attributes, which degrades the performance of the conventional recommender systems because they fail in Out-of-Distribution (OOD) generalization, particularly in situations of data sparsity. This study delves deeply into the challenge of OOD generalization and proposes a novel model called Cross-Domain Causal Preference Learning for Out-of-Distribution Recommendation (CDCOR), which involves employing a domain adversarial network to uncover users' domain-shared preferences and utilizing a causal structure learner to capture causal invariance to deal with the OOD problem. Through extensive experiments on two real-world datasets, we validate the remarkable performance of our model in handling diverse scenarios of data sparsity and out-of-distribution environments. Furthermore, our approach surpasses the benchmark models, showcasing outstanding capabilities in out-of-distribution generalization.
Submitted 23 April, 2024;
originally announced April 2024.
-
Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories
Authors:
Ning Yang,
Shuo Chen,
Haijun Zhang,
Randall Berry
Abstract:
Mobile Edge Computing (MEC) broadens the scope of computation and storage beyond the central network, incorporating edge nodes close to end devices. This expansion facilitates the implementation of large-scale "connected things" within edge networks. The advent of applications necessitating real-time, high-quality service presents several challenges, such as low latency, high data rate, reliability, efficiency, and security, all of which demand resolution. The incorporation of reinforcement learning (RL) methodologies within MEC networks promotes a deeper understanding of mobile user behaviors and network dynamics, thereby optimizing resource use in computing and communication processes. This paper offers an exhaustive survey of RL applications in MEC networks, initially presenting an overview of RL from its fundamental principles to the latest advanced frameworks. Furthermore, it outlines various RL strategies employed in offloading, caching, and communication within MEC networks. Finally, it explores open issues linked with software and hardware platforms, representation, RL robustness, safe RL, large-scale scheduling, generalization, security, and privacy. The paper proposes specific RL techniques to mitigate these issues and provides insights into their practical applications.
Submitted 22 April, 2024;
originally announced April 2024.
-
Are We Ready for Planetary Exploration Robots? The TAIL-Plus Dataset for SLAM in Granular Environments
Authors:
Zirui Wang,
Chen Yao,
Yangtao Ge,
Guowei Shi,
Ningbo Yang,
Zheng Zhu,
Kewei Dong,
Hexiang Wei,
Zhenzhong Jia,
Jing Wu
Abstract:
So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots, which is an extension to our previous work, the TAIL (Terrain-Aware multI-modaL) dataset. We conducted field experiments on beaches that are considered planetary surface analog environments for diverse sandy terrains. In the TAIL-Plus dataset, we provide more sequences with multiple loops and expand the scene from day to night. Benefiting from our sensor suite with modular design, we use both wheeled and quadruped robots for data collection. The sensors include a 3D LiDAR, three downward RGB-D cameras, a pair of global-shutter color cameras that can be used as a forward-looking stereo camera, an RTK-GPS device and an extra IMU. Our datasets are intended to help researchers developing multi-sensor simultaneous localization and mapping (SLAM) algorithms for robots in unstructured, deformable granular terrains. Our datasets and supplementary materials will be available at https://tailrobot.github.io/.
Submitted 21 April, 2024;
originally announced April 2024.
-
On groups whose conjugacy class sizes are not divisible by each other
Authors:
Nanying Yang,
Ilya Gorshkov
Abstract:
Let $G$ be a finite group and $N(G)$ be the set of its conjugacy class sizes excluding $1$. We define a directed graph $\Gamma(G)$ whose vertex set is $N(G)$, with a directed edge from $x$ to $y$ if $x$ divides $y$ and $N(G)$ does not contain a number $z$ different from $x$ and $y$ such that $x$ divides $z$ and $z$ divides $y$. We call $\Gamma(G)$ the conjugate graph of the group $G$. In this work, we study finite groups whose conjugate graph is a set of points.
Submitted 19 April, 2024;
originally announced April 2024.
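The definition of the conjugate graph is easy to check on small groups. The sketch below is illustrative: the function name `conjugate_graph` is hypothetical, and the class sizes are entered by hand rather than computed from a group. It builds the covering relation of divisibility on $N(G)$.

```python
def conjugate_graph(class_sizes):
    """Return (vertices, edges) of the conjugate graph for given class sizes.

    Vertices are the distinct class sizes other than 1. There is an edge
    (x, y) when x divides y and no other vertex z satisfies x | z and z | y.
    """
    vertices = sorted(set(class_sizes) - {1})
    edges = []
    for x in vertices:
        for y in vertices:
            if x == y or y % x != 0:
                continue
            has_intermediate = any(
                z not in (x, y) and z % x == 0 and y % z == 0 for z in vertices
            )
            if not has_intermediate:
                edges.append((x, y))
    return vertices, edges

# S_3 has class sizes 1, 2, 3: no divisibility, so the graph is a set of points.
s3 = conjugate_graph([1, 2, 3])
# S_4 has class sizes 1, 3, 6, 6, 8: the only edge is 3 -> 6.
s4 = conjugate_graph([1, 3, 6, 6, 8])
```

In the abstract's terminology, $S_3$ is an example whose conjugate graph is a set of points, while $S_4$ is not, because $3$ divides $6$.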
-
LongEmbed: Extending Embedding Models for Long Context Retrieval
Authors:
Dawei Zhu,
Liang Wang,
Nan Yang,
Yifan Song,
Wenhao Wu,
Furu Wei,
Sujian Li
Abstract:
Embedding models play a pivotal role in modern NLP applications such as IR and RAG. While the context limit of LLMs has been pushed beyond 1 million tokens, embedding models are still confined to a narrow context window not exceeding 8k tokens, which excludes them from application scenarios requiring long inputs such as legal contracts. This paper explores context window extension of existing embedding models, pushing the limit to 32k without requiring additional training. First, we examine the performance of current embedding models for long context retrieval on our newly constructed LongEmbed benchmark. LongEmbed comprises two synthetic tasks and four carefully chosen real-world tasks, featuring documents of varying length and dispersed target information. Benchmarking results underscore huge room for improvement in these models. Based on this, comprehensive experiments show that training-free context window extension strategies like position interpolation can effectively extend the context window of existing embedding models severalfold, regardless of whether their original context is 512 or beyond 4k. Furthermore, for models employing absolute position encoding (APE), we show the possibility of further fine-tuning to harvest notable performance gains while strictly preserving original behavior for short inputs. For models using rotary position embedding (RoPE), significant enhancements are observed when employing RoPE-specific methods, such as NTK and SelfExtend, indicating RoPE's superiority over APE for context window extension. To facilitate future research, we release E5-Base-4k and E5-RoPE-Base, along with the LongEmbed benchmark.
Submitted 24 April, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
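As a rough illustration of training-free position interpolation, the sketch below applies linear interpolation to rotary position angles. It is a simplified assumption and not the paper's code: positions of a longer input are rescaled by `train_len / target_len` so they fall inside the range seen during training. The function names, the default `base`, and the example lengths are illustrative.

```python
import math

def interpolation_scale(train_len, target_len):
    """Scale factor that maps positions in [0, target_len) into [0, train_len)."""
    return min(1.0, train_len / target_len)

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angles for one position under rotary position embedding.

    Frequency pair i uses theta_i = base ** (-2 * i / dim); linear position
    interpolation multiplies the position by `scale` before rotating.
    """
    return [position * scale * base ** (-2 * i / dim) for i in range(dim // 2)]

# Extending a 512-token model to 4096 tokens compresses positions by 8x, so
# position 4088 is rotated exactly like position 511 in the original model.
scale = interpolation_scale(512, 4096)
stretched = rope_angles(4088, dim=8, scale=scale)
original = rope_angles(511, dim=8)
```

The trade-off of this scheme is that neighbouring positions end up closer together than during training, which is one motivation for the alternative RoPE-specific methods mentioned in the abstract.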
-
Token-level Direct Preference Optimization
Authors:
Yongcheng Zeng,
Guoqing Liu,
Weiyu Ma,
Ning Yang,
Haifeng Zhang,
Jun Wang
Abstract:
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs at the token level, following a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity without the need for explicit reward modeling. Experimental results across various text tasks demonstrate TDPO's superior performance in balancing alignment with generation diversity. Notably, fine-tuning with TDPO strikes a better balance than DPO in the controlled sentiment generation and single-turn dialogue datasets, and significantly improves the quality of generated responses compared to both DPO and PPO-based RLHF methods. Our code is open-sourced at https://github.com/Vance0124/Token-level-Direct-Preference-Optimization.
Submitted 27 June, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up
Authors:
Nakyeong Yang,
Junseok Kim,
Jiwon Moon,
Yunah Jang,
Kyomin Jung
Abstract:
Prompt-tuning methods have shown performance comparable to parameter-efficient fine-tuning (PEFT) methods in various natural language understanding tasks. However, existing prompt tuning methods still utilize the entire model architecture; thus, they fail to accelerate inference speed in the application. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which is extremely efficient in inference time. Our method significantly enhances inference efficiency by investigating and utilizing a skill-localized subnetwork in a language model. Surprisingly, our method improves the inference speed up to 160% while pruning 52% of the parameters. Furthermore, we demonstrate that our method is applicable across various transformer-based architectures, thereby confirming its practicality and scalability.
Submitted 18 April, 2024;
originally announced April 2024.
-
Correlated Mean Field Imitation Learning
Authors:
Zhiyu Zhao,
Ning Yang,
Xue Yan,
Haifeng Zhang,
Jun Wang,
Yaodong Yang
Abstract:
We investigate multi-agent imitation learning (IL) within the framework of mean field games (MFGs), considering the presence of time-varying correlated signals. Existing MFG IL algorithms assume demonstrations are sampled from Mean Field Nash Equilibria (MFNE), limiting their adaptability to real-world scenarios. For example, in the traffic network equilibrium influenced by public routing recommendations, recommendations introduce time-varying correlated signals into the game, not captured by MFNE and other existing correlated equilibrium concepts. To address this gap, we propose Adaptive Mean Field Correlated Equilibrium (AMFCE), a general equilibrium incorporating time-varying correlated signals. We establish the existence of AMFCE under mild conditions and prove that MFNE is a subclass of AMFCE. We further propose Correlated Mean Field Imitation Learning (CMFIL), a novel IL framework designed to recover the AMFCE, accompanied by a theoretical guarantee on the quality of the recovered policy. Experimental results, including a real-world traffic flow prediction problem, demonstrate the superiority of CMFIL over state-of-the-art IL baselines, highlighting the potential of CMFIL in understanding large population behavior under correlated signals.
Submitted 14 April, 2024;
originally announced April 2024.
-
Adaptive Fair Representation Learning for Personalized Fairness in Recommendations via Information Alignment
Authors:
Xinyu Zhu,
Lilin Zhang,
Ning Yang
Abstract:
Personalized fairness in recommendations has been attracting increasing attention from researchers. The existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyper-parameter, and pursue extreme fairness by completely removing information of sensitive attributes from the learned fair embedding, which suffer from two challenges: huge training co…
▽ More
Personalized fairness in recommendations has been attracting increasing attention from researchers. The existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyper-parameter, and pursue extreme fairness by completely removing information of sensitive attributes from the learned fair embedding, which suffer from two challenges: huge training cost incurred by the explosion of attribute combinations, and the suboptimal trade-off between fairness and accuracy. In this paper, we propose a novel Adaptive Fair Representation Learning (AFRL) model, which achieves a real personalized fairness due to its advantage of training only one model to adaptively serve different fairness requirements during inference phase. Particularly, AFRL treats fairness requirements as inputs and can learn an attribute-specific embedding for each attribute from the unfair user embedding, which endows AFRL with the adaptability during inference phase to determine the non-sensitive attributes under the guidance of the user's unique fairness requirement. To achieve a better trade-off between fairness and accuracy in recommendations, AFRL conducts a novel Information Alignment to exactly preserve discriminative information of non-sensitive attributes and incorporate a debiased collaborative embedding into the fair embedding to capture attribute-independent collaborative signals, without loss of fairness. Finally, the extensive experiments conducted on real datasets together with the sound theoretical analysis demonstrate the superiority of AFRL.
Submitted 12 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
TAIL: A Terrain-Aware Multi-Modal SLAM Dataset for Robot Locomotion in Deformable Granular Environments
Authors:
Chen Yao,
Yangtao Ge,
Guowei Shi,
Zirui Wang,
Ningbo Yang,
Zheng Zhu,
Hexiang Wei,
Yuntian Zhao,
Jing Wu,
Zhenzhong Jia
Abstract:
Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wild, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders the solutions of Simultaneous Localization And Mapping (SLAM), especially when confronting non-geometric hazards in demanding landscapes. In this paper, we first propose a Terrain-Aware multI-modaL (TAIL) dataset tailored to deformable and sandy terrains. It incorporates various types of robotic proprioception and distinct ground interactions for the unique challenges and benchmark of multi-sensor fusion SLAM. The versatile sensor suite comprises stereo frame cameras, multiple ground-pointing RGB-D cameras, a rotating 3D LiDAR, an IMU, and an RTK device. This ensemble is hardware-synchronized, well-calibrated, and self-contained. Utilizing both wheeled and quadrupedal locomotion, we efficiently collect comprehensive sequences to capture rich unstructured scenarios. It spans the spectrum of scope, terrain interactions, scene changes, ground-level properties, and dynamic robot characteristics. We benchmark several state-of-the-art SLAM methods against ground truth and provide performance validations. Corresponding challenges and limitations are also reported. All associated resources are accessible upon request at https://tailrobot.github.io/.
Submitted 25 March, 2024;
originally announced March 2024.
-
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
Authors:
Kaiyan Chang,
Kun Wang,
Nan Yang,
Ying Wang,
Dantong Jin,
Wenlong Zhu,
Zhirong Chen,
Cangyuan Li,
Hao Yan,
Yunhao Zhou,
Zhuoliang Zhao,
Yuan Cheng,
Yudong Pan,
Yiqi Liu,
Mengdi Wang,
Shengwen Liang,
yinhe han,
Huawei Li,
Xiaowei Li
Abstract:
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of chip design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. For Verilog generation, it translates Verilog files to an abstract syntax tree and then maps nodes to natural language with a predefined template. For Verilog repair, it uses predefined rules to generate a wrong Verilog file and then pairs EDA tool feedback with the right and wrong Verilog files. For EDA script generation, it uses an existing LLM (GPT-3.5) to obtain the description of the script. To evaluate the effectiveness of our data augmentation method, we finetune Llama2-13B and Llama2-7B models using the dataset generated by our augmentation framework. The results demonstrate a significant improvement in the Verilog generation tasks with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark. Our 13B model (ChipGPT-FT) has a pass rate improvement compared with GPT-3.5 in Verilog generation and outperforms it in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script data.
Submitted 17 March, 2024;
originally announced March 2024.
-
Parsimonious Generative Machine Learning for Non-Gaussian Tail Modeling and Risk-Neutral Distribution Extraction
Authors:
Qi Wu,
Zhonghao Xian,
Xing Yan,
Nan Yang
Abstract:
In financial modeling problems, non-Gaussian tails exist widely in many circumstances. Among them, the accurate estimation of risk-neutral distribution (RND) from option prices is of great importance for researchers and practitioners. A precise RND can provide valuable information regarding the market's expectations, and can further help empirical asset pricing studies. This paper presents a parsimonious parametric approach to extract RNDs of underlying asset returns by using a generative machine learning model. The model incorporates the asymmetric heavy tails property of returns with a clever design. To calibrate the model, we design a Monte Carlo algorithm that has good capability with the assistance of modern machine learning computing tools. Numerically, the model fits Heston option prices well and captures the main shapes of implied volatility curves. Empirically, using S&P 500 index option prices, we demonstrate that the model outperforms some popular parametric density methods under mean absolute error. Furthermore, the skewness and kurtosis of RNDs extracted by our model are consistent with intuitive expectations. More generally, the proposed methodology is widely applicable in data fitting and probabilistic forecasting.
Submitted 4 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Entanglement Measure Based on Optimal Entanglement Witness
Authors:
Nan Yang,
Jiaji Wu,
Xianyun Dong,
Longyu Xiao,
Jing Wang,
Ming Li
Abstract:
We introduce a new entanglement measure based on optimal entanglement witness. First of all, we show that the entanglement measure satisfies some necessary properties, including zero entanglement for all separable states, convexity, continuity, invariance under local unitary operations and non-increase under local operations and classical communication (LOCC). More than that, we give a specific mathematical expression for the lower bound of this entanglement measure for any bipartite mixed states. We further improve the lower bound for $2 \otimes 2$ systems. Finally, we numerically simulate the lower bound of several types of specific quantum states.
Submitted 19 February, 2024;
originally announced February 2024.
-
Generative Representational Instruction Tuning
Authors:
Niklas Muennighoff,
Hongjin Su,
Liang Wang,
Nan Yang,
Furu Wei,
Tao Yu,
Amanpreet Singh,
Douwe Kiela
Abstract:
All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.
Submitted 17 April, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Multilingual E5 Text Embeddings: A Technical Report
Authors:
Liang Wang,
Nan Yang,
Xiaolong Huang,
Linjun Yang,
Rangan Majumder,
Furu Wei
Abstract:
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5 .
Submitted 8 February, 2024;
originally announced February 2024.
-
Video Semantic Communication with Major Object Extraction and Contextual Video Encoding
Authors:
Haopeng Li,
Haonan Tong,
Sihua Wang,
Nuocheng Yang,
Zhaohui Yang,
Changchuan Yin
Abstract:
This paper studies an end-to-end video semantic communication system for massive communication. In the considered system, the transmitter must continuously send the video to the receiver to facilitate character reconstruction in immersive applications, such as interactive video conference. However, transmitting the original video information with substantial amounts of data poses a challenge to the limited wireless resources. To address this issue, we reduce the amount of data transmitted by making the transmitter extract and send the semantic information from the video, which refines the major object and the correlation of time and space in the video. Specifically, we first develop a video semantic communication system based on major object extraction (MOE) and contextual video encoding (CVE) to achieve efficient video transmission. Then, we design the MOE and CVE modules with convolutional neural network based motion estimation, contextual extraction and entropy coding. Simulation results show that compared to the traditional coding schemes, the proposed method can reduce the amount of transmitted data by up to 25% while increasing the peak signal-to-noise ratio (PSNR) of the reconstructed video by up to 14%.
Submitted 2 February, 2024;
originally announced February 2024.
-
On the Information Leakage Performance of Secure Finite Blocklength Transmissions over Rayleigh Fading Channels
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst,
H. Vincent Poor
Abstract:
This paper presents a secrecy performance study of a wiretap communication system with finite blocklength (FBL) transmissions over Rayleigh fading channels, based on the definition of an average information leakage (AIL) metric. We evaluate the exact and closed-form approximate AIL performance, assuming that only statistical channel state information (CSI) of the eavesdropping link is available. Then, we reveal an inherent statistical relationship between the AIL metric in the FBL regime and the commonly-used secrecy outage probability in conventional infinite blocklength communications. Aiming to improve the secure communication performance of the considered system, we formulate a blocklength optimization problem and solve it via a low-complexity approach. Next, we present numerical results to verify our analytical findings and provide various important insights into the impacts of system parameters on the AIL. Specifically, our results indicate that i) compromising a small amount of AIL can lead to significant reliability improvements, and ii) the AIL experiences a secrecy floor in the high signal-to-noise ratio regime.
Submitted 20 January, 2024;
originally announced January 2024.
-
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
Authors:
Nianzu Yang,
Kaipeng Zeng,
Haotian Lu,
Yexin Wu,
Zexin Yuan,
Danni Chen,
Shengdian Jiang,
Jiaxiang Wu,
Yimin Wang,
Junchi Yan
Abstract:
Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimics the natural growth mechanism of neurons for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.
Submitted 27 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Attention-based UNet enabled Lightweight Image Semantic Communication System over Internet of Things
Authors:
Guoxin Ma,
Haonan Tong,
Nuocheng Yang,
Changchuan Yin
Abstract:
This paper studies the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices. In the considered system model, devices must use semantic communication techniques to support user behavior recognition in ultimate video service with high data transmission efficiency. However, it is computationally expensive for IoT devices to deploy semantic codecs due to the complex calculation processes of deep learning (DL) based codec training and inference. To make it affordable for IoT devices to deploy semantic communication systems, we propose an attention-based UNet enabled lightweight image semantic communication (LSSC) system, which achieves low computational complexity and small model size. In particular, we first let the LSSC system train the codec at the edge server to reduce the training computation load on IoT devices. Then, we introduce the convolutional block attention module (CBAM) to extract the image semantic features and decrease the number of downsampling layers thus reducing the floating-point operations (FLOPs). Finally, we experimentally adjust the structure of the codec and find out the optimal number of downsampling layers. Simulation results show that the proposed LSSC system can reduce the semantic codec FLOPs by 14%, and reduce the model size by 55%, with a sacrifice of 3% accuracy, compared to the baseline. Moreover, the proposed scheme can achieve a higher transmission accuracy than the traditional communication scheme in the low channel signal-to-noise (SNR) region.
Submitted 14 January, 2024;
originally announced January 2024.
-
On combinatorial properties of Gruenberg--Kegel graphs of finite groups
Authors:
Mingzhu Chen,
Ilya B. Gorshkov,
Natalia V. Maslova,
Nanying Yang
Abstract:
If $G$ is a finite group, then the spectrum $ω(G)$ is the set of all element orders of $G$. The prime spectrum $π(G)$ is the set of all primes belonging to $ω(G)$. A simple graph $Γ(G)$ whose vertex set is $π(G)$ and in which two distinct vertices $r$ and $s$ are adjacent if and only if $rs \in ω(G)$ is called the Gruenberg-Kegel graph or the prime graph of $G$.
In this paper, we prove that if $G$ is a group of even order, then the set of vertices which are non-adjacent to $2$ in $Γ(G)$ form a union of cliques. Moreover, we decide when a strongly regular graph is isomorphic to the Gruenberg-Kegel graph of a finite group. Besides this, we prove that a complete bipartite graph with each part of size at least $3$ can not be isomorphic to the Gruenberg-Kegel graph of a finite group.
Submitted 9 January, 2024;
originally announced January 2024.
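The definition of the Gruenberg-Kegel graph in the abstract above can be made concrete with a small sketch. The function names `is_prime` and `gruenberg_kegel_graph` are hypothetical, introduced here only for illustration, and the spectrum $ω(G)$ is assumed to be supplied directly as a set of integers rather than computed from a group.

```python
from itertools import combinations


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small element orders."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def gruenberg_kegel_graph(spectrum):
    """Build the prime graph from a spectrum (set of element orders).

    Vertices are the primes in the spectrum; two distinct primes r and s
    are adjacent exactly when r * s is also in the spectrum.
    """
    spectrum = set(spectrum)
    vertices = sorted(n for n in spectrum if is_prime(n))
    edges = {(r, s) for r, s in combinations(vertices, 2) if r * s in spectrum}
    return vertices, edges


# Cyclic group of order 6: element orders 1, 2, 3, 6.
print(gruenberg_kegel_graph({1, 2, 3, 6}))
# Alternating group A_5: element orders 1, 2, 3, 5 (no element of order 6, 10, or 15).
print(gruenberg_kegel_graph({1, 2, 3, 5}))
```

For the cyclic group of order 6 the graph is a single edge between 2 and 3, while for $A_5$ the three vertices 2, 3, and 5 are pairwise non-adjacent.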
-
Reflected Schrödinger Bridge for Constrained Generative Modeling
Authors:
Wei Deng,
Yu Chen,
Nicole Tianjiao Yang,
Hengrong Du,
Qi Feng,
Ricky T. Q. Chen
Abstract:
Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic mappings and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrödinger Bridge algorithm: an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence - a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.
Submitted 6 January, 2024;
originally announced January 2024.
-
Improving Text Embeddings with Large Language Models
Authors:
Liang Wang,
Nan Yang,
Xiaolong Huang,
Linjun Yang,
Rangan Majumder,
Furu Wei
Abstract:
In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets that are often constrained by task diversity and language coverage. We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages. We then fine-tune open-source decoder-only LLMs on the synthetic data using standard contrastive loss. Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, our model sets new state-of-the-art results on the BEIR and MTEB benchmarks.
Submitted 31 May, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Authors:
Pengxiang Ding,
Han Zhao,
Wenjie Zhang,
Wenxuan Song,
Ningxi Yang,
Donglin Wang
Abstract:
The important manifestation of robot intelligence is the ability to naturally interact and autonomously make decisions. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying system design but limiting the synergy between different information streams. This compartmentalization poses challenges in achieving seamless autonomous reasoning, decision-making, and action execution. To address these limitations, a novel paradigm, named Vision-Language-Action tasks for QUAdruped Robots (QUAR-VLA), has been introduced in this paper. This approach tightly integrates visual information and instructions to generate executable actions, effectively merging perception, planning, and decision-making. The central idea is to elevate the overall intelligence of the robot. Within this framework, a notable challenge lies in aligning fine-grained instructions with visual perception information. This emphasizes the complexity involved in ensuring that the robot accurately interprets and acts upon detailed instructions in harmony with its visual observations. Consequently, we propose QUAdruped Robotic Transformer (QUART), a family of VLA models to integrate visual information and instructions from diverse modalities as input and generates executable actions for real-world robots and present QUAdruped Robot Dataset (QUARD), a large-scale multi-task dataset including navigation, complex terrain locomotion, and whole-body manipulation tasks for training QUART models. Our extensive evaluation (4000 evaluation trials) shows that our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.
Submitted 16 June, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Event-driven Real-time Retrieval in Web Search
Authors:
Nan Yang,
Shusen Zhang,
Yannan Zhang,
Xiaoling Bai,
Hualong Deng,
Tianhua Zhou,
Jin Ma
Abstract:
Information retrieval in real-time search presents unique challenges distinct from those encountered in classical web search. These challenges are particularly pronounced due to the rapid change of user search intent, which is influenced by the occurrence and evolution of breaking news events, such as earthquakes, elections, and wars. Previous dense retrieval methods, which primarily focused on static semantic representation, lack the capacity to capture immediate search intent, leading to inferior performance in retrieving the most recent event-related documents in time-sensitive scenarios. To address this issue, this paper expands the query with event information that represents real-time search intent. The Event information is then integrated with the query through a cross-attention mechanism, resulting in a time-context query representation. We further enhance the model's capacity for event representation through multi-task training. Since publicly available datasets such as MS-MARCO do not contain any event information on the query side and have few time-sensitive queries, we design an automatic data collection and annotation pipeline to address this issue, which includes ModelZoo-based Coarse Annotation and LLM-driven Fine Annotation processes. In addition, we share the training tricks such as two-stage training and hard negative sampling. Finally, we conduct a set of offline experiments on a million-scale production dataset to evaluate our approach and deploy an A/B testing in a real online system to verify the performance. Extensive experimental results demonstrate that our proposed approach significantly outperforms existing state-of-the-art baseline methods.
Submitted 4 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
-
LongStory: Coherent, Complete and Length Controlled Long story Generation
Authors:
Kyeongman Park,
Nakyeong Yang,
Kyomin Jung
Abstract:
A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model's ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.
Submitted 26 November, 2023;
originally announced November 2023.
-
Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
Authors:
Nakyeong Yang,
Taegwan Kang,
Jungkyu Choi,
Honglak Lee,
Kyomin Jung
Abstract:
Instruction-following language models often show undesirable biases. These undesirable biases may be accelerated in the real-world usage of language models, where a wide range of instructions is used through zero-shot example prompting. To solve this problem, we first define the bias neuron, which significantly affects biased outputs, and prove its existence empirically. Furthermore, we propose a novel and practical bias mitigation method, CRISPR, to eliminate bias neurons of language models in instruction-following settings. CRISPR automatically determines biased outputs and categorizes neurons that affect the biased outputs as bias neurons using an explainability method. Experimental results demonstrate the effectiveness of our method in mitigating biases under zero-shot instruction-following settings without losing the model's task performance and existing knowledge. The experimental results reveal the generalizability of our method as it shows robustness under various instructions and datasets. Surprisingly, our method can mitigate the bias in language models by eliminating only a few neurons (at least three).
Submitted 5 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Quantum synchronization via Active-Passive-Decomposition configuration: An open quantum system study
Authors:
Nan Yang,
Ting Yu
Abstract:
In this paper, we study the synchronization of dissipative quantum harmonic oscillators in the framework of open quantum systems via the Active-Passive Decomposition (APD) configuration. We show that two or more quantum systems may be synchronized when the quantum systems of interest are embedded in dissipative environments and influenced by a common classical system. Such a classical system is typically termed a controller, which (1) can drive quantum systems to cross different regimes (e.g., from periodic to chaotic motions) and (2) constructs the so-called Active-Passive Decomposition configuration such that all the quantum objects under consideration may be synchronized. The main finding of this paper is that complete synchronization, as measured by the standard quantum deviation, may be achieved for both stable regimes (quantum limit circles) and unstable regimes (quantum chaotic motions). As an example, we numerically show in an optomechanical setup that complete synchronization can be realized in quantum mechanical resonators.
Submitted 15 November, 2023;
originally announced November 2023.
-
Time-Frequency Localization Characteristics of the Delay-Doppler Plane Orthogonal Pulse
Authors:
Akram Shafie,
Jinhong Yuan,
Nan Yang,
Hai Lin
Abstract:
The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has recently been proposed as a promising solution for ensuring reliable communications in high mobility scenarios. In this work, we investigate the time-frequency (TF) localization characteristics of the DD plane orthogonal pulse (DDOP), which is the prototype pulse of ODDM modulation. The TF localization characteristics examine how concentrated or spread out the energy of a pulse is in the joint TF domain. We first derive the TF localization metric, TF area (TFA), for the DDOP. Based on this result, we provide insights into the energy spread of the DDOP in the joint TF domain. Then, we delve into the potential advantages of the DDOP due to its energy spread, particularly in terms of leveraging both time and frequency diversities, and enabling high-resolution sensing. Furthermore, we determine the TFA for the recently proposed generalized design of the DDOP. Finally, we validate our analysis based on numerical results and show that the energy spread for the generalized design of the DDOP in the joint TF domain exhibits a step-wise increase as the duration of sub-pulses increases.
Submitted 13 November, 2023;
originally announced November 2023.
-
Relative Arbitrage Opportunities in an Extended Mean Field System
Authors:
Nicole Tianjiao Yang,
Tomoyuki Ichiba
Abstract:
This paper studies relative arbitrage opportunities in a market with infinitely many interacting investors. We establish a conditional McKean-Vlasov system to study the market dynamics coupled with investors. We then provide a theoretical framework to study a mean-field system, where the mean-field terms consist of a joint distribution of wealth and strategies. The optimal relative arbitrage is characterized by the equilibrium of extended mean field games. We show the conditions on the existence and the uniqueness of the mean field equilibrium, then prove the propagation of chaos results for the finite-player game, and demonstrate that the Nash equilibrium converges to the mean field equilibrium when the population grows to infinity.
Submitted 5 November, 2023;
originally announced November 2023.
-
Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion
Authors:
Zhanwen Liu,
Nan Yang,
Yang Wang,
Yuke Li,
Xiangmo Zhao,
Fei-Yue Wang
Abstract:
Traffic object detection under variable illumination is challenging due to the information loss caused by the limited dynamic range of conventional frame-based cameras. To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost information in images through cross-modality fusion, enabling the network to obtain illumination-robust representations for traffic object detection. Specifically, to mitigate the sparsity or blurriness issues arising from diverse motion states of traffic objects in fixed-interval event sampling methods, we propose the Reliable Structure Generation Network (RSGNet) to generate Speed Invariant Frames (SIF), ensuring the integrity and sharpness of object structures. Next, we design a novel Adaptive Feature Complement Module (AFCM) which guides the adaptive fusion of two modality features to compensate for the information loss in the images by perceiving the global lightness distribution of the images, thereby generating illumination-robust representations. Finally, considering the lack of large-scale and high-quality annotations in the existing event-based object detection datasets, we build a DSEC-Det dataset, which consists of 53 sequences with 63,931 images and more than 208,000 labels for 8 classes. Extensive experimental results demonstrate that our proposed SFNet can overcome the perceptual boundaries of conventional cameras and outperform the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95. Our code and dataset will be available at https://github.com/YN-Yang/SFNet.
Submitted 1 November, 2023;
originally announced November 2023.
-
Large Search Model: Redefining Search Stack in the Era of LLMs
Authors:
Liang Wang,
Nan Yang,
Xiaolong Huang,
Linjun Yang,
Rangan Majumder,
Furu Wei
Abstract:
Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack. To substantiate the feasibility of this framework, we present a series of proof-of-concept experiments and discuss the potential challenges associated with implementing this approach within real-world search systems.
Submitted 2 January, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
Authors:
Xueguang Ma,
Liang Wang,
Nan Yang,
Furu Wei,
Jimmy Lin
Abstract:
The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model both as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) for both passage retrieval and document retrieval using the MS MARCO datasets. Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models. Additionally, since LLMs can inherently handle longer contexts, they can represent entire documents holistically, obviating the need for traditional segmenting and pooling strategies. Furthermore, evaluations on BEIR demonstrate that our RepLLaMA-RankLLaMA pipeline exhibits strong zero-shot effectiveness. Model checkpoints from this study are available on HuggingFace.
Submitted 12 October, 2023;
originally announced October 2023.
-
Secure Short-Packet Transmission with Aerial Relaying: Blocklength and Trajectory Co-Design
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst
Abstract:
In this paper, we propose a secure short-packet communication (SPC) system involving an unmanned aerial vehicle (UAV)-aided relay in the presence of a terrestrial passive eavesdropper. The considered system, which is applicable to various next-generation Internet-of-Things (IoT) networks, exploits a UAV as a mobile relay, facilitating the reliable and secure exchange of intermittent short packets between a pair of remote IoT devices with strict latency. Our objective is to improve the overall secrecy throughput performance of the system by carefully designing key parameters such as the coding blocklengths and the UAV trajectory. However, this inherently poses a challenging optimization problem that is difficult to solve optimally. To address the issue, we propose a low-complexity algorithm inspired by the block successive convex approximation approach, where we divide the original problem into two subproblems and solve them alternately until convergence. Numerical results demonstrate that the proposed design achieves significant performance improvements relative to other benchmarks, and offer valuable insights into determining appropriate coding blocklengths and UAV trajectory.
Submitted 8 October, 2023;
originally announced October 2023.
-
Is it Really Negative? Evaluating Natural Language Video Localization Performance on Multiple Reliable Videos Pool
Authors:
Nakyeong Yang,
Minsung Kim,
Seunghyun Yoon,
Joongbo Shin,
Kyomin Jung
Abstract:
With the explosion of multimedia content in recent years, Video Corpus Moment Retrieval (VCMR), which aims to detect a video moment that matches a given natural language query from multiple videos, has become a critical problem. However, existing VCMR studies have a significant limitation: they regard all videos not paired with a specific query as negative, neglecting the possibility of including false negatives when constructing the negative video set. In this paper, we propose an MVMR (Massive Videos Moment Retrieval) task that aims to localize video frames within a massive video set, mitigating the possibility of falsely distinguishing positive and negative videos. For this task, we suggest an automatic dataset construction framework that employs textual and visual semantic matching evaluation methods on existing video moment search datasets, and we introduce three MVMR datasets. To solve the MVMR task, we further propose a strong method, CroCs, which employs cross-directional contrastive learning that selectively identifies reliable and informative negatives, enhancing the robustness of a model on the MVMR task. Experimental results on the introduced datasets reveal that existing video moment search models are easily distracted by negative video frames, whereas our model shows significant performance.
Submitted 18 March, 2024; v1 submitted 15 August, 2023;
originally announced September 2023.
-
DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism
Authors:
Jin Wang,
Zishan Huang,
Hengli Liu,
Nianyi Yang,
Yinhao Xiao
Abstract:
One of the most pressing threats to computing systems is software vulnerabilities, which can compromise both hardware and software components. Existing methods for vulnerability detection remain suboptimal. Traditional techniques are both time-consuming and labor-intensive, while machine-learning-based approaches often underperform when applied to complex datasets, due to their inability to capture high-dimensional relationships. Previous deep-learning strategies also fall short in capturing sufficient feature information. Although self-attention mechanisms can process information over long distances, they fail to capture structural information. In this paper, we introduce DefectHunter, an innovative model for vulnerability identification that employs the Conformer mechanism. This mechanism fuses self-attention with convolutional networks to capture both local, position-wise features and global, content-based interactions. Furthermore, we optimize the self-attention mechanisms to mitigate the issue of excessive attention heads introducing extraneous noise by adjusting the denominator. We evaluated DefectHunter against ten baseline methods using six industrial and two highly complex datasets. On the QEMU dataset, DefectHunter exhibited a 20.62% improvement in accuracy over Pongo-70B, and for the CWE-754 dataset, its accuracy was 14.64% higher. To investigate how DefectHunter comprehends vulnerabilities, we conducted a case study, which revealed that our model effectively understands the mechanisms underlying vulnerabilities.
Submitted 26 September, 2023;
originally announced September 2023.
-
Privacy-Preserving Quantum Two-Party Geometric Intersection
Authors:
Wen-Jie Liu,
Yong Xu,
James C. N. Yang,
Wen-Bin Yu,
Lian-Hua Chi
Abstract:
Privacy-preserving computational geometry is the research area at the intersection of secure multi-party computation (SMC) and computational geometry. An important problem in this area is privacy-preserving geometric intersection (PGI), in which each of multiple parties holds a private geometric graph and seeks to determine whether their graphs intersect without revealing their private information. In this study, by representing Alice's (Bob's) private geometric graph G_A (G_B) as the set of numbered grids S_A (S_B), an efficient privacy-preserving quantum two-party geometric intersection (PQGI) protocol is proposed. In the protocol, the oracle operation O_A (O_B) is first used to encode the private elements of S_A=(a_0, a_1, ..., a_(M-1)) (S_B=(b_0, b_1, ..., b_(N-1))) into quantum states, and then the oracle operation O_f is applied to obtain a new quantum state that includes the XOR results between each element of S_A and S_B. Finally, quantum counting is introduced to get the number (t) of states |a_i+b_j> equal to |0>, and the intersection result can be obtained by checking whether t>0. Compared with classical PGI protocols, our proposed protocol not only has higher security, but also has lower communication complexity.
Submitted 21 September, 2023;
originally announced September 2023.
-
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Authors:
Dawei Zhu,
Nan Yang,
Liang Wang,
Yifan Song,
Wenhao Wu,
Furu Wei,
Sujian Li
Abstract:
Large Language Models (LLMs) are trained with a pre-defined context length, restricting their use in scenarios requiring long inputs. Previous efforts for adapting LLMs to a longer length usually require fine-tuning with this target length (Full-length fine-tuning), incurring intensive training cost. To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window. This is achieved by first dividing the original context window into several chunks, then designing distinct skipping bias terms to manipulate the position indices of each chunk. These bias terms and the lengths of each chunk are altered for every training example, allowing the model to adapt to all positions within target length. Experimental results show that PoSE greatly reduces memory and time overhead compared with Full-length fine-tuning, with minimal impact on performance. Leveraging this advantage, we have successfully extended the LLaMA model to 128k tokens using a 2k training context window. Furthermore, we empirically confirm that PoSE is compatible with all RoPE-based LLMs and position interpolation strategies. Notably, our method can potentially support infinite length, limited only by memory usage in inference. With ongoing progress for efficient inference, we believe PoSE can further scale the context window beyond 128k.
Submitted 21 February, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
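The position-index manipulation described in the PoSE abstract above can be illustrated with a minimal Python sketch. The function name `pose_position_ids`, its parameters, the uniform sampling of chunk boundaries and skip biases, and the choice to leave the first chunk unshifted are all assumptions made for illustration; the abstract does not confirm any of these details.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks, rng):
    """Hypothetical helper: position ids for one training example.

    Returns train_len position ids whose values lie in [0, target_len).
    The fixed training window is split into num_chunks chunks, and each
    chunk after the first is shifted by a randomly drawn skip bias.
    """
    if not (1 <= num_chunks <= train_len <= target_len):
        raise ValueError("need 1 <= num_chunks <= train_len <= target_len")

    # Assumption: chunk boundaries are drawn uniformly inside the window.
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]

    # Largest shift that keeps every index below target_len.
    max_skip = target_len - train_len

    position_ids = []
    bias = 0  # Assumption: the first chunk keeps its original indices.
    for i in range(num_chunks):
        if i > 0:
            # Biases never decrease, so indices stay strictly increasing.
            bias = rng.randint(bias, max_skip)
        position_ids.extend(range(bounds[i] + bias, bounds[i + 1] + bias))
    return position_ids
```

Calling `pose_position_ids(2048, 131072, 2, random.Random(0))` mirrors the 2k-to-128k setting mentioned in the abstract: the model still sees only 2048 tokens, but their position ids are spread over the larger target range, and a fresh call for each training example changes both the chunk lengths and the biases.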
-
How People Perceive The Dynamic Zero-COVID Policy: A Retrospective Analysis From The Perspective of Appraisal Theory
Authors:
Na Yang,
Kyrie Zhixuan Zhou,
Yunzhe Li
Abstract:
The Dynamic Zero-COVID Policy in China spanned three years and diverse emotional responses have been observed at different times. In this paper, we retrospectively analyzed public sentiments and perceptions of the policy, especially regarding how they evolved over time, and how they related to people's lived experiences. Through sentiment analysis of 2,358 collected Weibo posts, we identified four representative points, i.e., policy initialization, sharp sentiment change, lowest sentiment score, and policy termination, for an in-depth discourse analysis through the lens of appraisal theory. In the end, we reflected on the evolving public sentiments toward the Dynamic Zero-COVID Policy and proposed implications for effective epidemic prevention and control measures for future crises.
Submitted 17 September, 2023;
originally announced September 2023.
-
Project Aria: A New Tool for Egocentric Multi-Modal AI Research
Authors:
Jakob Engel,
Kiran Somasundaram,
Michael Goesele,
Albert Sun,
Alexander Gamino,
Andrew Turner,
Arjang Talattof,
Arnie Yuan,
Bilal Souti,
Brighid Meredith,
Cheng Peng,
Chris Sweeney,
Cole Wilson,
Dan Barnes,
Daniel DeTone,
David Caruso,
Derek Valleroy,
Dinesh Ginjupalli,
Duncan Frost,
Edward Miller,
Elias Mueggler,
Evgeniy Oleinik,
Fan Zhang,
Guruprasad Somasundaram,
Gustavo Solaira
, et al. (49 additional authors not shown)
Abstract:
Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data.
Submitted 1 October, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Performance Analysis of Finite Blocklength Transmissions Over Wiretap Fading Channels: An Average Information Leakage Perspective
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst,
H. Vincent Poor
Abstract:
Physical-layer security (PLS) is a promising technique to complement more traditional means of communication security in beyond-5G wireless networks. However, studies of PLS are often based on ideal assumptions such as infinite coding blocklengths or perfect knowledge of the wiretap link's channel state information (CSI). In this work, we study the performance of finite blocklength (FBL) transmissions using a new secrecy metric: the average information leakage (AIL). We evaluate the exact and approximate AIL with Gaussian signaling and arbitrary fading channels, assuming that the eavesdropper's instantaneous CSI is unknown. We then conduct case studies that use artificial noise (AN) beamforming to analyze the AIL in both Rayleigh and Rician fading channels. The accuracy of the analytical expressions is verified through extensive simulations, and various insights regarding the impact of key system parameters on the AIL are obtained. Particularly, our results reveal that allowing a small level of AIL can potentially lead to significant reliability enhancements. To improve the system performance, we formulate and solve an average secrecy throughput (AST) optimization problem via both non-adaptive and adaptive design strategies. Our findings highlight the significance of blocklength design and AN power allocation, as well as the impact of their trade-off on the AST.
Submitted 13 May, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
High-Order Topological Phase Diagram Revealed by Anomalous Nernst Effect in Janus ScClI Monolayer
Authors:
Ning-Jing Yang,
Jian-Min Zhang
Abstract:
Higher-order topological properties of two-dimensional (2D) magnetic materials have recently been proposed. In 2D ferromagnetic Janus materials, we find that ScClI is a second-order topological insulator (SOTI). By means of a multi-orbital tight-binding model, we analyze the orbital contributions of higher-order topologies. Further, we give the complete high-order topological phase diagram of ScClI, based on the external field modulation of the magneto-valley coupling and energy levels. 2D ScClI has a pronounced valley polarization, which causes different insulating phases to exhibit completely different anomalous Nernst conductance. As a result, we use the matched anomalous Nernst effect to reveal the topological phase transition process of ScClI. We utilize the characteristics of valley electronics to link higher-order topological materials with the anomalous Nernst effect, which has potential implications for high-order topological insulators and valley electronics.
Submitted 30 January, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.