-
Fairpriori: Improving Biased Subgroup Discovery for Deep Neural Network Fairness
Authors:
Kacy Zhou,
Jiawen Wen,
Nan Yang,
Dong Yuan,
Qinghua Lu,
Huaming Chen
Abstract:
While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-…
▽ More
While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-skinned women, while not showing bias against individuals with darker skin or women. This problem calls for effective fairness testing before the deployment of such deep learning models in real-world scenarios. However, research into detecting such bias is currently limited compared to research on individual and group fairness. Existing tools to investigate intersectional bias lack important features such as support for multiple fairness metrics, fast and efficient computation, and user-friendly interpretation. This paper introduces Fairpriori, a novel biased subgroup discovery method, which aims to address these limitations. Fairpriori incorporates the frequent itemset generation algorithm to facilitate effective and efficient investigation of intersectional bias by producing fast fairness metric calculations on subgroups of a dataset. Through comparison with the state-of-the-art methods (e.g., Themis, FairFictPlay, and TestSGD) under similar conditions, Fairpriori demonstrates superior effectiveness and efficiency when identifying intersectional bias. Specifically, Fairpriori is easier to use and interpret, supports a wider range of use cases by accommodating multiple fairness metrics, and exhibits higher efficiency in computing fairness metrics. These findings showcase Fairpriori's potential for effectively uncovering subgroups affected by intersectional bias, supported by its open-source tooling at https://anonymous.4open.science/r/Fairpriori-0320.
△ Less
Submitted 24 June, 2024;
originally announced July 2024.
-
Generating grid maps via the snake model
Authors:
Zhiwei Wei,
Nai Yang,
Wenjia Xu,
Su Ding
Abstract:
The grid map, often referred to as the tile map, stands as a vital tool in geospatial visualization, possessing unique attributes that differentiate it from more commonly known techniques such as choropleths and cartograms. It transforms geographic regions into grids, which requires the displacement of both region centroids and boundary nodes to establish a coherent grid arrangement. However, exis…
▽ More
The grid map, often referred to as the tile map, stands as a vital tool in geospatial visualization, possessing unique attributes that differentiate it from more commonly known techniques such as choropleths and cartograms. It transforms geographic regions into grids, which requires the displacement of both region centroids and boundary nodes to establish a coherent grid arrangement. However, existing approaches typically displace region centroids and boundary nodes separately, potentially resulting in self-intersected boundaries and compromised relative orientation relations between regions. In this paper, we introduce a novel approach that leverages the Snake displacement algorithm from cartographic generalization to concurrently displace region centroids and boundary nodes. The revised Constrained Delaunay triangulation (CDT) is employed to represent the relations between regions and serves as a structural foundation for the Snake algorithm. Forces for displacing the region centroids into a grid-like pattern are then computed. These forces are iteratively applied within the Snake model until a satisfactory new boundary is achieved. Subsequently, the grid map is created by aligning the grids with the newly generated boundary, utilizing a one-to-one match algorithm to assign each region to a specific grid. Experimental results demonstrate that the proposed approach excels in maintaining the relative orientation and global shape of regions, albeit with a potential increase in local location deviations. We also present two strategies aligned with existing approaches to generate diverse grid maps for user preferences. Further details and resources are available on our project website: https://github.com/TrentonWei/DorlingMap.git.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation
Authors:
Muwei Jian,
Hongyu Chen,
Zaiyong Zhang,
Nan Yang,
Haorang Zhang,
Lifu Ma,
Wen**g Xu,
Huixiang Zhi
Abstract:
Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel…
▽ More
Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and these promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models
Authors:
Julian Straub,
Daniel DeTone,
Tianwei Shen,
Nan Yang,
Chris Sweeney,
Richard Newcombe
Abstract:
The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D,…
▽ More
The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D, a benchmark with two core 3D egocentric perception tasks. EFM3D is the first benchmark for 3D object detection and surface regression on high quality annotated egocentric data of Project Aria. We propose Egocentric Voxel Lifting (EVL), a baseline for 3D EFMs. EVL leverages all available egocentric modalities and inherits foundational capabilities from 2D foundation models. This model, trained on a large simulated dataset, outperforms existing methods on the EFM3D benchmark.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
QUADFormer: Learning-based Detection of Cyber Attacks in Quadrotor UAVs
Authors:
Pengyu Wang,
Zhaohua Yang,
Nachuan Yang,
Zikai Wang,
Jialu Li,
Fan Zhang,
Chaoqun Wang,
Jiankun Wang,
Max Q. -H. Meng,
Ling Shi
Abstract:
Safety-critical intelligent cyber-physical systems, such as quadrotor unmanned aerial vehicles (UAVs), are vulnerable to different types of cyber attacks, and the absence of timely and accurate attack detection can lead to severe consequences. When UAVs are engaged in large outdoor maneuvering flights, their system constitutes highly nonlinear dynamics that include non-Gaussian noises. Therefore,…
▽ More
Safety-critical intelligent cyber-physical systems, such as quadrotor unmanned aerial vehicles (UAVs), are vulnerable to different types of cyber attacks, and the absence of timely and accurate attack detection can lead to severe consequences. When UAVs are engaged in large outdoor maneuvering flights, their system constitutes highly nonlinear dynamics that include non-Gaussian noises. Therefore, the commonly employed traditional statistics-based and emerging learning-based attack detection methods do not yield satisfactory results. In response to the above challenges, we propose QUADFormer, a novel Quadrotor UAV Attack Detection framework with transFormer-based architecture. This framework includes a residue generator designed to generate a residue sequence sensitive to anomalies. Subsequently, this sequence is fed into a transformer structure with disparity in correlation to specifically learn its statistical characteristics for the purpose of classification and attack detection. Finally, we design an alert module to ensure the safe execution of tasks by UAVs under attack conditions. We conduct extensive simulations and real-world experiments, and the results show that our method has achieved superior detection performance compared with many state-of-the-art methods.
△ Less
Submitted 14 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Age-minimal Multicast by Graph Attention Reinforcement Learning
Authors:
Yanning Zhang,
Guocheng Liao,
Shengbin Cao,
Ning Yang,
Meng Zhang
Abstract:
Age of Information (AoI) is an emerging metric used to assess the timeliness of information, gaining research interest in real-time multicast applications such as video streaming and metaverse platforms. In this paper, we consider a dynamic multicast network with energy constraints, where our objective is to minimize the expected time-average AoI through energy-constrained multicast routing and sc…
▽ More
Age of Information (AoI) is an emerging metric used to assess the timeliness of information, gaining research interest in real-time multicast applications such as video streaming and metaverse platforms. In this paper, we consider a dynamic multicast network with energy constraints, where our objective is to minimize the expected time-average AoI through energy-constrained multicast routing and scheduling. The inherent complexity of the problem, given the NP-hardness and intertwined scheduling and routing decisions, makes existing approaches inapplicable. To address these challenges, we decompose the original problem into two subtasks, each amenable to reinforcement learning (RL) methods. Subsequently, we propose an innovative framework based on graph attention networks (GATs) to effectively capture graph information with superior generalization capabilities. To validate our framework, we conduct experiments on three datasets including a real-world dataset called AS-733, and show that our proposed scheme reduces the average weighted AoI by 62.9% and reduces the energy consumption by at most 72.5% compared to baselines.
△ Less
Submitted 31 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Cross-Domain Causal Preference Learning for Out-of-Distribution Recommendation
Authors:
Zhuhang Li,
Ning Yang
Abstract:
Recommender systems use users' historical interactions to learn their preferences and deliver personalized recommendations from a vast array of candidate items. Current recommender systems primarily rely on the assumption that the training and testing datasets have identical distributions, which may not hold true in reality. In fact, the distribution shift between training and testing datasets oft…
▽ More
Recommender systems use users' historical interactions to learn their preferences and deliver personalized recommendations from a vast array of candidate items. Current recommender systems primarily rely on the assumption that the training and testing datasets have identical distributions, which may not hold true in reality. In fact, the distribution shift between training and testing datasets often occurs as a result of the evolution of user attributes, which degrades the performance of the conventional recommender systems because they fail in Out-of-Distribution (OOD) generalization, particularly in situations of data sparsity. This study delves deeply into the challenge of OOD generalization and proposes a novel model called Cross-Domain Causal Preference Learning for Out-of-Distribution Recommendation (CDCOR), which involves employing a domain adversarial network to uncover users' domain-shared preferences and utilizing a causal structure learner to capture causal invariance to deal with the OOD problem. Through extensive experiments on two real-world datasets, we validate the remarkable performance of our model in handling diverse scenarios of data sparsity and out-of-distribution environments. Furthermore, our approach surpasses the benchmark models, showcasing outstanding capabilities in out-of-distribution generalization.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories
Authors:
Ning Yang,
Shuo Chen,
Haijun Zhang,
Randall Berry
Abstract:
Mobile Edge Computing (MEC) broadens the scope of computation and storage beyond the central network, incorporating edge nodes close to end devices. This expansion facilitates the implementation of large-scale "connected things" within edge networks. The advent of applications necessitating real-time, high-quality service presents several challenges, such as low latency, high data rate, reliabilit…
▽ More
Mobile Edge Computing (MEC) broadens the scope of computation and storage beyond the central network, incorporating edge nodes close to end devices. This expansion facilitates the implementation of large-scale "connected things" within edge networks. The advent of applications necessitating real-time, high-quality service presents several challenges, such as low latency, high data rate, reliability, efficiency, and security, all of which demand resolution. The incorporation of reinforcement learning (RL) methodologies within MEC networks promotes a deeper understanding of mobile user behaviors and network dynamics, thereby optimizing resource use in computing and communication processes. This paper offers an exhaustive survey of RL applications in MEC networks, initially presenting an overview of RL from its fundamental principles to the latest advanced frameworks. Furthermore, it outlines various RL strategies employed in offloading, caching, and communication within MEC networks. Finally, it explores open issues linked with software and hardware platforms, representation, RL robustness, safe RL, large-scale scheduling, generalization, security, and privacy. The paper proposes specific RL techniques to mitigate these issues and provides insights into their practical applications.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Are We Ready for Planetary Exploration Robots? The TAIL-Plus Dataset for SLAM in Granular Environments
Authors:
Zirui Wang,
Chen Yao,
Yangtao Ge,
Guowei Shi,
Ningbo Yang,
Zheng Zhu,
Kewei Dong,
Hexiang Wei,
Zhenzhong Jia,
**g Wu
Abstract:
So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and map** capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots,…
▽ More
So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and map** capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots, which is an extension to our previous work, TAIL (Terrain-Aware multI-modaL) dataset. We conducted field experiments on beaches that are considered as planetary surface analog environments for diverse sandy terrains. In TAIL-Plus dataset, we provide more sequences with multiple loops and expand the scene from day to night. Benefit from our sensor suite with modular design, we use both wheeled and quadruped robots for data collection. The sensors include a 3D LiDAR, three downward RGB-D cameras, a pair of global-shutter color cameras that can be used as a forward-looking stereo camera, an RTK-GPS device and an extra IMU. Our datasets are intended to help researchers develo** multi-sensor simultaneous localization and map** (SLAM) algorithms for robots in unstructured, deformable granular terrains. Our datasets and supplementary materials will be available at \url{https://tailrobot.github.io/}.
△ Less
Submitted 21 April, 2024;
originally announced April 2024.
-
LongEmbed: Extending Embedding Models for Long Context Retrieval
Authors:
Dawei Zhu,
Liang Wang,
Nan Yang,
Yifan Song,
Wenhao Wu,
Furu Wei,
Sujian Li
Abstract:
Embedding models play a pivot role in modern NLP applications such as IR and RAG. While the context limit of LLMs has been pushed beyond 1 million tokens, embedding models are still confined to a narrow context window not exceeding 8k tokens, refrained from application scenarios requiring long inputs such as legal contracts. This paper explores context window extension of existing embedding models…
▽ More
Embedding models play a pivot role in modern NLP applications such as IR and RAG. While the context limit of LLMs has been pushed beyond 1 million tokens, embedding models are still confined to a narrow context window not exceeding 8k tokens, refrained from application scenarios requiring long inputs such as legal contracts. This paper explores context window extension of existing embedding models, pushing the limit to 32k without requiring additional training. First, we examine the performance of current embedding models for long context retrieval on our newly constructed LongEmbed benchmark. LongEmbed comprises two synthetic tasks and four carefully chosen real-world tasks, featuring documents of varying length and dispersed target information. Benchmarking results underscore huge room for improvement in these models. Based on this, comprehensive experiments show that training-free context window extension strategies like position interpolation can effectively extend the context window of existing embedding models by several folds, regardless of their original context being 512 or beyond 4k. Furthermore, for models employing absolute position encoding (APE), we show the possibility of further fine-tuning to harvest notable performance gains while strictly preserving original behavior for short inputs. For models using rotary position embedding (RoPE), significant enhancements are observed when employing RoPE-specific methods, such as NTK and SelfExtend, indicating RoPE's superiority over APE for context window extension. To facilitate future research, we release E5-Base-4k and E5-RoPE-Base, along with the LongEmbed benchmark.
△ Less
Submitted 24 April, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Token-level Direct Preference Optimization
Authors:
Yongcheng Zeng,
Guoqing Liu,
Weiyu Ma,
Ning Yang,
Haifeng Zhang,
Jun Wang
Abstract:
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs in a token level, following a sequential, auto-regressive fashi…
▽ More
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs in a token level, following a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity without the need for explicit reward modeling. Experimental results across various text tasks demonstrate TDPO's superior performance in balancing alignment with generation diversity. Notably, fine-tuning with TDPO strikes a better balance than DPO in the controlled sentiment generation and single-turn dialogue datasets, and significantly improves the quality of generated responses compared to both DPO and PPO-based RLHF methods. Our code is open-sourced at https://github.com/Vance0124/Token-level-Direct-Preference-Optimization.
△ Less
Submitted 27 June, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up
Authors:
Nakyeong Yang,
Junseok Kim,
Jiwon Moon,
Yunah Jang,
Kyomin Jung
Abstract:
Prompt-tuning methods have shown comparable performance as parameter-efficient fine-tuning (PEFT) methods in various natural language understanding tasks. However, existing prompt tuning methods still utilize the entire model architecture; thus, they fail to accelerate inference speed in the application. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which…
▽ More
Prompt-tuning methods have shown comparable performance as parameter-efficient fine-tuning (PEFT) methods in various natural language understanding tasks. However, existing prompt tuning methods still utilize the entire model architecture; thus, they fail to accelerate inference speed in the application. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which is extremely efficient in inference time. Our method significantly enhances inference efficiency by investigating and utilizing a skill-localized subnetwork in a language model. Surprisingly, our method improves the inference speed up to 160% while pruning 52% of the parameters. Furthermore, we demonstrate that our method is applicable across various transformer-based architectures, thereby confirming its practicality and scalability.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Correlated Mean Field Imitation Learning
Authors:
Zhiyu Zhao,
Ning Yang,
Xue Yan,
Haifeng Zhang,
Jun Wang,
Yaodong Yang
Abstract:
We investigate multi-agent imitation learning (IL) within the framework of mean field games (MFGs), considering the presence of time-varying correlated signals. Existing MFG IL algorithms assume demonstrations are sampled from Mean Field Nash Equilibria (MFNE), limiting their adaptability to real-world scenarios. For example, in the traffic network equilibrium influenced by public routing recommen…
▽ More
We investigate multi-agent imitation learning (IL) within the framework of mean field games (MFGs), considering the presence of time-varying correlated signals. Existing MFG IL algorithms assume demonstrations are sampled from Mean Field Nash Equilibria (MFNE), limiting their adaptability to real-world scenarios. For example, in the traffic network equilibrium influenced by public routing recommendations, recommendations introduce time-varying correlated signals into the game, not captured by MFNE and other existing correlated equilibrium concepts. To address this gap, we propose Adaptive Mean Field Correlated Equilibrium (AMFCE), a general equilibrium incorporating time-varying correlated signals. We establish the existence of AMFCE under mild conditions and prove that MFNE is a subclass of AMFCE. We further propose Correlated Mean Field Imitation Learning (CMFIL), a novel IL framework designed to recover the AMFCE, accompanied by a theoretical guarantee on the quality of the recovered policy. Experimental results, including a real-world traffic flow prediction problem, demonstrate the superiority of CMFIL over state-of-the-art IL baselines, highlighting the potential of CMFIL in understanding large population behavior under correlated signals.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Adaptive Fair Representation Learning for Personalized Fairness in Recommendations via Information Alignment
Authors:
Xinyu Zhu,
Lilin Zhang,
Ning Yang
Abstract:
Personalized fairness in recommendations has been attracting increasing attention from researchers. The existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyper-parameter, and pursue extreme fairness by completely removing information of sensitive attributes from the learned fair embedding, which suffer from two challenges: huge training co…
▽ More
Personalized fairness in recommendations has been attracting increasing attention from researchers. The existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyper-parameter, and pursue extreme fairness by completely removing information of sensitive attributes from the learned fair embedding, which suffer from two challenges: huge training cost incurred by the explosion of attribute combinations, and the suboptimal trade-off between fairness and accuracy. In this paper, we propose a novel Adaptive Fair Representation Learning (AFRL) model, which achieves a real personalized fairness due to its advantage of training only one model to adaptively serve different fairness requirements during inference phase. Particularly, AFRL treats fairness requirements as inputs and can learn an attribute-specific embedding for each attribute from the unfair user embedding, which endows AFRL with the adaptability during inference phase to determine the non-sensitive attributes under the guidance of the user's unique fairness requirement. To achieve a better trade-off between fairness and accuracy in recommendations, AFRL conducts a novel Information Alignment to exactly preserve discriminative information of non-sensitive attributes and incorporate a debiased collaborative embedding into the fair embedding to capture attribute-independent collaborative signals, without loss of fairness. Finally, the extensive experiments conducted on real datasets together with the sound theoretical analysis demonstrate the superiority of AFRL.
△ Less
Submitted 12 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
TAIL: A Terrain-Aware Multi-Modal SLAM Dataset for Robot Locomotion in Deformable Granular Environments
Authors:
Chen Yao,
Yangtao Ge,
Guowei Shi,
Zirui Wang,
Ningbo Yang,
Zheng Zhu,
Hexiang Wei,
Yuntian Zhao,
**g Wu,
Zhenzhong Jia
Abstract:
Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wilds, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders the solutions of Simultaneous Localization And Map** (SLAM), especially when confronting non-geometric hazards in demanding landscapes…
▽ More
Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wilds, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders the solutions of Simultaneous Localization And Map** (SLAM), especially when confronting non-geometric hazards in demanding landscapes. In this paper, we first propose a Terrain-Aware multI-modaL (TAIL) dataset tailored to deformable and sandy terrains. It incorporates various types of robotic proprioception and distinct ground interactions for the unique challenges and benchmark of multi-sensor fusion SLAM. The versatile sensor suite comprises stereo frame cameras, multiple ground-pointing RGB-D cameras, a rotating 3D LiDAR, an IMU, and an RTK device. This ensemble is hardware-synchronized, well-calibrated, and self-contained. Utilizing both wheeled and quadrupedal locomotion, we efficiently collect comprehensive sequences to capture rich unstructured scenarios. It spans the spectrum of scope, terrain interactions, scene changes, ground-level properties, and dynamic robot characteristics. We benchmark several state-of-the-art SLAM methods against ground truth and provide performance validations. Corresponding challenges and limitations are also reported. All associated resources are accessible upon request at \url{https://tailrobot.github.io/}.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
Authors:
Kaiyan Chang,
Kun Wang,
Nan Yang,
Ying Wang,
Dantong **,
Wenlong Zhu,
Zhirong Chen,
Cangyuan Li,
Hao Yan,
Yunhao Zhou,
Zhuoliang Zhao,
Yuan Cheng,
Yudong Pan,
Yiqi Liu,
Mengdi Wang,
Shengwen Liang,
yinhe han,
Huawei Li,
Xiaowei Li
Abstract:
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by L…
▽ More
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. For Verilog generation, it translates Verilog files to an abstract syntax tree and then maps nodes to natural language with a predefined template. For Verilog repair, it uses predefined rules to generate the wrong verilog file and then pairs EDA Tool feedback with the right and wrong verilog file. For EDA Script generation, it uses existing LLM(GPT-3.5) to obtain the description of the Script. To evaluate the effectiveness of our data augmentation method, we finetune Llama2-13B and Llama2-7B models using the dataset generated by our augmentation framework. The results demonstrate a significant improvement in the Verilog generation tasks with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark. Our 13B model (ChipGPT-FT) has a pass rate improvement compared with GPT-3.5 in Verilog generation and outperforms in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script data.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Generative Representational Instruction Tuning
Authors:
Niklas Muennighoff,
Hong** Su,
Liang Wang,
Nan Yang,
Furu Wei,
Tao Yu,
Amanpreet Singh,
Douwe Kiela
Abstract:
All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B…
▽ More
All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.
△ Less
Submitted 17 April, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Multilingual E5 Text Embeddings: A Technical Report
Authors:
Liang Wang,
Nan Yang,
Xiaolong Huang,
Linjun Yang,
Rangan Majumder,
Furu Wei
Abstract:
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pr…
▽ More
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5 .
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Video Semantic Communication with Major Object Extraction and Contextual Video Encoding
Authors:
Haopeng Li,
Haonan Tong,
Sihua Wang,
Nuocheng Yang,
Zhaohui Yang,
Changchuan Yin
Abstract:
This paper studies an end-to-end video semantic communication system for massive communication. In the considered system, the transmitter must continuously send the video to the receiver to facilitate character reconstruction in immersive applications, such as interactive video conference. However, transmitting the original video information with substantial amounts of data poses a challenge to th…
▽ More
This paper studies an end-to-end video semantic communication system for massive communication. In the considered system, the transmitter must continuously send the video to the receiver to facilitate character reconstruction in immersive applications, such as interactive video conference. However, transmitting the original video information with substantial amounts of data poses a challenge to the limited wireless resources. To address this issue, we reduce the amount of data transmitted by making the transmitter extract and send the semantic information from the video, which refines the major object and the correlation of time and space in the video. Specifically, we first develop a video semantic communication system based on major object extraction (MOE) and contextual video encoding (CVE) to achieve efficient video transmission. Then, we design the MOE and CVE modules with convolutional neural network based motion estimation, contextual extraction and entropy coding. Simulation results show that compared to the traditional coding schemes, the proposed method can reduce the amount of transmitted data by up to 25% while increasing the peak signal-to-noise ratio (PSNR) of the reconstructed video by up to 14%.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
On the Information Leakage Performance of Secure Finite Blocklength Transmissions over Rayleigh Fading Channels
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst,
H. Vincent Poor
Abstract:
This paper presents a secrecy performance study of a wiretap communication system with finite blocklength (FBL) transmissions over Rayleigh fading channels, based on the definition of an average information leakage (AIL) metric. We evaluate the exact and closed-form approximate AIL performance, assuming that only statistical channel state information (CSI) of the eavesdrop** link is available. T…
▽ More
This paper presents a secrecy performance study of a wiretap communication system with finite blocklength (FBL) transmissions over Rayleigh fading channels, based on the definition of an average information leakage (AIL) metric. We evaluate the exact and closed-form approximate AIL performance, assuming that only statistical channel state information (CSI) of the eavesdrop** link is available. Then, we reveal an inherent statistical relationship between the AIL metric in the FBL regime and the commonly-used secrecy outage probability in conventional infinite blocklength communications. Aiming to improve the secure communication performance of the considered system, we formulate a blocklength optimization problem and solve it via a low-complexity approach. Next, we present numerical results to verify our analytical findings and provide various important insights into the impacts of system parameters on the AIL. Specifically, our results indicate that i) compromising a small amount of AIL can lead to significant reliability improvements, and ii) the AIL experiences a secrecy floor in the high signal-to-noise ratio regime.
△ Less
Submitted 20 January, 2024;
originally announced January 2024.
-
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
Authors:
Nianzu Yang,
Kaipeng Zeng,
Haotian Lu,
Yexin Wu,
Zexin Yuan,
Danni Chen,
Shengdian Jiang,
Jiaxiang Wu,
Yimin Wang,
Junchi Yan
Abstract:
Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphV…
▽ More
Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimicks the neuron natural growth mechanism for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.
△ Less
Submitted 27 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Attention-based UNet enabled Lightweight Image Semantic Communication System over Internet of Things
Authors:
Guoxin Ma,
Haonan Tong,
Nuocheng Yang,
Changchuan Yin
Abstract:
This paper studies the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices. In the considered system model, devices must use semantic communication techniques to support user behavior recognition in ultimate video service with high data transmission efficiency. However, it is computationally expensive for IoT devices to deploy semanti…
▽ More
This paper studies the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices. In the considered system model, devices must use semantic communication techniques to support user behavior recognition in ultimate video service with high data transmission efficiency. However, it is computationally expensive for IoT devices to deploy semantic codecs due to the complex calculation processes of deep learning (DL) based codec training and inference. To make it affordable for IoT devices to deploy semantic communication systems, we propose an attention-based UNet enabled lightweight image semantic communication (LSSC) system, which achieves low computational complexity and small model size. In particular, we first let the LSSC system train the codec at the edge server to reduce the training computation load on IoT devices. Then, we introduce the convolutional block attention module (CBAM) to extract the image semantic features and decrease the number of downsampling layers thus reducing the floating-point operations (FLOPs). Finally, we experimentally adjust the structure of the codec and find out the optimal number of downsampling layers. Simulation results show that the proposed LSSC system can reduce the semantic codec FLOPs by 14%, and reduce the model size by 55%, with a sacrifice of 3% accuracy, compared to the baseline. Moreover, the proposed scheme can achieve a higher transmission accuracy than the traditional communication scheme in the low channel signal-to-noise (SNR) region.
△ Less
Submitted 14 January, 2024;
originally announced January 2024.
-
Reflected Schrödinger Bridge for Constrained Generative Modeling
Authors:
Wei Deng,
Yu Chen,
Nicole Tianjiao Yang,
Hengrong Du,
Qi Feng,
Ricky T. Q. Chen
Abstract:
Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process…
▽ More
Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic map**s and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrodinger Bridge algorithm: an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence - a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.
△ Less
Submitted 6 January, 2024;
originally announced January 2024.
-
Improving Text Embeddings with Large Language Models
Authors:
Liang Wang,
Nan Yang,
Xiaolong Huang,
Linjun Yang,
Rangan Majumder,
Furu Wei
Abstract:
In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelin…
▽ More
In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets that are often constrained by task diversity and language coverage. We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages. We then fine-tune open-source decoder-only LLMs on the synthetic data using standard contrastive loss. Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, our model sets new state-of-the-art results on the BEIR and MTEB benchmarks.
△ Less
Submitted 31 May, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Authors:
Pengxiang Ding,
Han Zhao,
Wenjie Zhang,
Wenxuan Song,
Ningxi Yang,
Donglin Wang
Abstract:
The important manifestation of robot intelligence is the ability to naturally interact and autonomously make decisions. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying system design but limiting the synergy between different information streams. This compartmentalization poses challenges in achieving seamless autonomous reasonin…
▽ More
The important manifestation of robot intelligence is the ability to naturally interact and autonomously make decisions. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying system design but limiting the synergy between different information streams. This compartmentalization poses challenges in achieving seamless autonomous reasoning, decision-making, and action execution. To address these limitations, a novel paradigm, named Vision-Language-Action tasks for QUAdruped Robots (QUAR-VLA), has been introduced in this paper. This approach tightly integrates visual information and instructions to generate executable actions, effectively merging perception, planning, and decision-making. The central idea is to elevate the overall intelligence of the robot. Within this framework, a notable challenge lies in aligning fine-grained instructions with visual perception information. This emphasizes the complexity involved in ensuring that the robot accurately interprets and acts upon detailed instructions in harmony with its visual observations. Consequently, we propose QUAdruped Robotic Transformer (QUART), a family of VLA models to integrate visual information and instructions from diverse modalities as input and generates executable actions for real-world robots and present QUAdruped Robot Dataset (QUARD), a large-scale multi-task dataset including navigation, complex terrain locomotion, and whole-body manipulation tasks for training QUART models. Our extensive evaluation (4000 evaluation trials) shows that our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.
△ Less
Submitted 16 June, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Event-driven Real-time Retrieval in Web Search
Authors:
Nan Yang,
Shusen Zhang,
Yannan Zhang,
Xiaoling Bai,
Hualong Deng,
Tianhua Zhou,
** Ma
Abstract:
Information retrieval in real-time search presents unique challenges distinct from those encountered in classical web search. These challenges are particularly pronounced due to the rapid change of user search intent, which is influenced by the occurrence and evolution of breaking news events, such as earthquakes, elections, and wars. Previous dense retrieval methods, which primarily focused on st…
▽ More
Information retrieval in real-time search presents unique challenges distinct from those encountered in classical web search. These challenges are particularly pronounced due to the rapid change of user search intent, which is influenced by the occurrence and evolution of breaking news events, such as earthquakes, elections, and wars. Previous dense retrieval methods, which primarily focused on static semantic representation, lack the capacity to capture immediate search intent, leading to inferior performance in retrieving the most recent event-related documents in time-sensitive scenarios. To address this issue, this paper expands the query with event information that represents real-time search intent. The Event information is then integrated with the query through a cross-attention mechanism, resulting in a time-context query representation. We further enhance the model's capacity for event representation through multi-task training. Since publicly available datasets such as MS-MARCO do not contain any event information on the query side and have few time-sensitive queries, we design an automatic data collection and annotation pipeline to address this issue, which includes ModelZoo-based Coarse Annotation and LLM-driven Fine Annotation processes. In addition, we share the training tricks such as two-stage training and hard negative sampling. Finally, we conduct a set of offline experiments on a million-scale production dataset to evaluate our approach and deploy an A/B testing in a real online system to verify the performance. Extensive experimental results demonstrate that our proposed approach significantly outperforms existing state-of-the-art baseline methods.
△ Less
Submitted 4 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
-
LongStory: Coherent, Complete and Length Controlled Long story Generation
Authors:
Kyeongman Park,
Nakyeong Yang,
Kyomin Jung
Abstract:
A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long s…
▽ More
A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model's ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
Authors:
Nakyeong Yang,
Taegwan Kang,
Jungkyu Choi,
Honglak Lee,
Kyomin Jung
Abstract:
Instruction-following language models often show undesirable biases. These undesirable biases may be accelerated in the real-world usage of language models, where a wide range of instructions is used through zero-shot example prompting. To solve this problem, we first define the bias neuron, which significantly affects biased outputs, and prove its existence empirically. Furthermore, we propose a…
▽ More
Instruction-following language models often show undesirable biases. These undesirable biases may be accelerated in the real-world usage of language models, where a wide range of instructions is used through zero-shot example prompting. To solve this problem, we first define the bias neuron, which significantly affects biased outputs, and prove its existence empirically. Furthermore, we propose a novel and practical bias mitigation method, CRISPR, to eliminate bias neurons of language models in instruction-following settings. CRISPR automatically determines biased outputs and categorizes neurons that affect the biased outputs as bias neurons using an explainability method. Experimental results demonstrate the effectiveness of our method in mitigating biases under zero-shot instruction-following settings without losing the model's task performance and existing knowledge. The experimental results reveal the generalizability of our method as it shows robustness under various instructions and datasets. Surprisingly, our method can mitigate the bias in language models by eliminating only a few neurons (at least three).
△ Less
Submitted 5 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion
Authors:
Zhanwen Liu,
Nan Yang,
Yang Wang,
Yuke Li,
Xiangmo Zhao,
Fei-Yue Wang
Abstract:
Traffic object detection under variable illumination is challenging due to the information loss caused by the limited dynamic range of conventional frame-based cameras. To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost informati…
▽ More
Traffic object detection under variable illumination is challenging due to the information loss caused by the limited dynamic range of conventional frame-based cameras. To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost information in images through cross-modality fusion, enabling the network to obtain illumination-robust representations for traffic object detection. Specifically, to mitigate the sparsity or blurriness issues arising from diverse motion states of traffic objects in fixed-interval event sampling methods, we propose the Reliable Structure Generation Network (RSGNet) to generate Speed Invariant Frames (SIF), ensuring the integrity and sharpness of object structures. Next, we design a novel Adaptive Feature Complement Module (AFCM) which guides the adaptive fusion of two modality features to compensate for the information loss in the images by perceiving the global lightness distribution of the images, thereby generating illumination-robust representations. Finally, considering the lack of large-scale and high-quality annotations in the existing event-based object detection datasets, we build a DSEC-Det dataset, which consists of 53 sequences with 63,931 images and more than 208,000 labels for 8 classes. Extensive experimental results demonstrate that our proposed SFNet can overcome the perceptual boundaries of conventional cameras and outperform the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95. Our code and dataset will be available at https://github.com/YN-Yang/SFNet.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Large Search Model: Redefining Search Stack in the Era of LLMs
Authors:
Liang Wang,
Nan Yang,
Xiaolong Huang,
Linjun Yang,
Rangan Majumder,
Furu Wei
Abstract:
Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one la…
▽ More
Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack. To substantiate the feasibility of this framework, we present a series of proof-of-concept experiments and discuss the potential challenges associated with implementing this approach within real-world search systems.
△ Less
Submitted 2 January, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
Authors:
Xueguang Ma,
Liang Wang,
Nan Yang,
Furu Wei,
Jimmy Lin
Abstract:
The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model bot…
▽ More
The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model both as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) for both passage retrieval and document retrieval using the MS MARCO datasets. Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models. Additionally, since LLMs can inherently handle longer contexts, they can represent entire documents holistically, obviating the need for traditional segmenting and pooling strategies. Furthermore, evaluations on BEIR demonstrate that our RepLLaMA-RankLLaMA pipeline exhibits strong zero-shot effectiveness. Model checkpoints from this study are available on HuggingFace.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
Secure Short-Packet Transmission with Aerial Relaying: Blocklength and Trajectory Co-Design
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst
Abstract:
In this paper, we propose a secure short-packet communication (SPC) system involving an unmanned aerial vehicle (UAV)-aided relay in the presence of a terrestrial passive eavesdropper. The considered system, which is applicable to various next-generation Internet-of-Things (IoT) networks, exploits a UAV as a mobile relay, facilitating the reliable and secure exchange of intermittent short packets…
▽ More
In this paper, we propose a secure short-packet communication (SPC) system involving an unmanned aerial vehicle (UAV)-aided relay in the presence of a terrestrial passive eavesdropper. The considered system, which is applicable to various next-generation Internet-of-Things (IoT) networks, exploits a UAV as a mobile relay, facilitating the reliable and secure exchange of intermittent short packets between a pair of remote IoT devices with strict latency. Our objective is to improve the overall secrecy throughput performance of the system by carefully designing key parameters such as the coding blocklengths and the UAV trajectory. However, this inherently poses a challenging optimization problem that is difficult to solve optimally. To address the issue, we propose a low-complexity algorithm inspired by the block successive convex approximation approach, where we divide the original problem into two subproblems and solve them alternately until convergence. Numerical results demonstrate that the proposed design achieves significant performance improvements relative to other benchmarks, and offer valuable insights into determining appropriate coding blocklengths and UAV trajectory.
△ Less
Submitted 8 October, 2023;
originally announced October 2023.
-
Is it Really Negative? Evaluating Natural Language Video Localization Performance on Multiple Reliable Videos Pool
Authors:
Nakyeong Yang,
Minsung Kim,
Seunghyun Yoon,
Joongbo Shin,
Kyomin Jung
Abstract:
With the explosion of multimedia content in recent years, Video Corpus Moment Retrieval (VCMR), which aims to detect a video moment that matches a given natural language query from multiple videos, has become a critical problem. However, existing VCMR studies have a significant limitation since they have regarded all videos not paired with a specific query as negative, neglecting the possibility o…
▽ More
With the explosion of multimedia content in recent years, Video Corpus Moment Retrieval (VCMR), which aims to detect a video moment that matches a given natural language query from multiple videos, has become a critical problem. However, existing VCMR studies have a significant limitation since they have regarded all videos not paired with a specific query as negative, neglecting the possibility of including false negatives when constructing the negative video set. In this paper, we propose an MVMR (Massive Videos Moment Retrieval) task that aims to localize video frames within a massive video set, mitigating the possibility of falsely distinguishing positive and negative videos. For this task, we suggest an automatic dataset construction framework by employing textual and visual semantic matching evaluation methods on the existing video moment search datasets and introduce three MVMR datasets. To solve MVMR task, we further propose a strong method, CroCs, which employs cross-directional contrastive learning that selectively identifies the reliable and informative negatives, enhancing the robustness of a model on MVMR task. Experimental results on the introduced datasets reveal that existing video moment search models are easily distracted by negative video frames, whereas our model shows significant performance.
△ Less
Submitted 18 March, 2024; v1 submitted 15 August, 2023;
originally announced September 2023.
-
DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism
Authors:
** Wang,
Zishan Huang,
Hengli Liu,
Nianyi Yang,
Yinhao Xiao
Abstract:
One of the most pressing threats to computing systems is software vulnerabilities, which can compromise both hardware and software components. Existing methods for vulnerability detection remain suboptimal. Traditional techniques are both time-consuming and labor-intensive, while machine-learning-based approaches often underperform when applied to complex datasets, due to their inability to captur…
▽ More
One of the most pressing threats to computing systems is software vulnerabilities, which can compromise both hardware and software components. Existing methods for vulnerability detection remain suboptimal. Traditional techniques are both time-consuming and labor-intensive, while machine-learning-based approaches often underperform when applied to complex datasets, due to their inability to capture high-dimensional relationships. Previous deep-learning strategies also fall short in capturing sufficient feature information. Although self-attention mechanisms can process information over long distances, they fail to capture structural information. In this paper, we introduce DefectHunter, an innovative model for vulnerability identification that employs the Conformer mechanism. This mechanism fuses self-attention with convolutional networks to capture both local, position-wise features and global, content-based interactions. Furthermore, we optimize the self-attention mechanisms to mitigate the issue of excessive attention heads introducing extraneous noise by adjusting the denominator. We evaluated DefectHunter against ten baseline methods using six industrial and two highly complex datasets. On the QEMU dataset, DefectHunter exhibited a 20.62\% improvement in accuracy over Pongo-70B, and for the CWE-754 dataset, its accuracy was 14.64\% higher. To investigate how DefectHunter comprehends vulnerabilities, we conducted a case study, which revealed that our model effectively understands the mechanisms underlying vulnerabilities.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
Privacy-Preserving Quantum Two-Party Geometric Intersection
Authors:
Wen-Jie Liu,
Yong Xu,
James C. N. Yang,
Wen-Bin Yu,
Lian-Hua Chi
Abstract:
Privacy-preserving computational geometry is the research area on the intersection of the domains of secure multi-party computation (SMC) and computational geometry. As an important field, the privacy-preserving geometric intersection (PGI) problem is when each of the multiple parties has a private geometric graph and seeks to determine whether their graphs intersect or not without revealing their…
▽ More
Privacy-preserving computational geometry is the research area on the intersection of the domains of secure multi-party computation (SMC) and computational geometry. As an important field, the privacy-preserving geometric intersection (PGI) problem is when each of the multiple parties has a private geometric graph and seeks to determine whether their graphs intersect or not without revealing their private information. In this study, through representing Alice's (Bob's) private geometric graph G_A (G_B) as the set of numbered grids S_A (S_B), an efficient privacy-preserving quantum two-party geometric intersection (PQGI) protocol is proposed. In the protocol, the oracle operation O_A (O_B) is firstly utilized to encode the private elements of S_A=(a_0, a_1, ..., a_(M-1)) (S_B=(b_0, b_1, ..., b_(N-1))) into the quantum states, and then the oracle operation O_f is applied to obtain a new quantum state which includes the XOR results between each element of S_A and S_B. Finally, the quantum counting is introduced to get the amount (t) of the states |a_i+b_j> equaling to |0>, and the intersection result can be obtained by judging t>0 or not. Compared with classical PGI protocols, our proposed protocol not only has higher security, but also holds lower communication complexity.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Authors:
Dawei Zhu,
Nan Yang,
Liang Wang,
Yifan Song,
Wenhao Wu,
Furu Wei,
Sujian Li
Abstract:
Large Language Models (LLMs) are trained with a pre-defined context length, restricting their use in scenarios requiring long inputs. Previous efforts for adapting LLMs to a longer length usually requires fine-tuning with this target length (Full-length fine-tuning), suffering intensive training cost. To decouple train length from target length for efficient context window extension, we propose Po…
▽ More
Large Language Models (LLMs) are trained with a pre-defined context length, restricting their use in scenarios requiring long inputs. Previous efforts for adapting LLMs to a longer length usually requires fine-tuning with this target length (Full-length fine-tuning), suffering intensive training cost. To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window. This is achieved by first dividing the original context window into several chunks, then designing distinct skip** bias terms to manipulate the position indices of each chunk. These bias terms and the lengths of each chunk are altered for every training example, allowing the model to adapt to all positions within target length. Experimental results show that PoSE greatly reduces memory and time overhead compared with Full-length fine-tuning, with minimal impact on performance. Leveraging this advantage, we have successfully extended the LLaMA model to 128k tokens using a 2k training context window. Furthermore, we empirically confirm that PoSE is compatible with all RoPE-based LLMs and position interpolation strategies. Notably, our method can potentially support infinite length, limited only by memory usage in inference. With ongoing progress for efficient inference, we believe PoSE can further scale the context window beyond 128k.
△ Less
Submitted 21 February, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
How People Perceive The Dynamic Zero-COVID Policy: A Retrospective Analysis From The Perspective of Appraisal Theory
Authors:
Na Yang,
Kyrie Zhixuan Zhou,
Yunzhe Li
Abstract:
The Dynamic Zero-COVID Policy in China spanned three years and diverse emotional responses have been observed at different times. In this paper, we retrospectively analyzed public sentiments and perceptions of the policy, especially regarding how they evolved over time, and how they related to people's lived experiences. Through sentiment analysis of 2,358 collected Weibo posts, we identified four…
▽ More
The Dynamic Zero-COVID Policy in China spanned three years and diverse emotional responses have been observed at different times. In this paper, we retrospectively analyzed public sentiments and perceptions of the policy, especially regarding how they evolved over time, and how they related to people's lived experiences. Through sentiment analysis of 2,358 collected Weibo posts, we identified four representative points, i.e., policy initialization, sharp sentiment change, lowest sentiment score, and policy termination, for an in-depth discourse analysis through the lens of appraisal theory. In the end, we reflected on the evolving public sentiments toward the Dynamic Zero-COVID Policy and proposed implications for effective epidemic prevention and control measures for future crises.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.
-
Project Aria: A New Tool for Egocentric Multi-Modal AI Research
Authors:
Jakob Engel,
Kiran Somasundaram,
Michael Goesele,
Albert Sun,
Alexander Gamino,
Andrew Turner,
Arjang Talattof,
Arnie Yuan,
Bilal Souti,
Brighid Meredith,
Cheng Peng,
Chris Sweeney,
Cole Wilson,
Dan Barnes,
Daniel DeTone,
David Caruso,
Derek Valleroy,
Dinesh Ginjupalli,
Duncan Frost,
Edward Miller,
Elias Mueggler,
Evgeniy Oleinik,
Fan Zhang,
Guruprasad Somasundaram,
Gustavo Solaira
, et al. (49 additional authors not shown)
Abstract:
Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul…
▽ More
Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data.
△ Less
Submitted 1 October, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Performance Analysis of Finite Blocklength Transmissions Over Wiretap Fading Channels: An Average Information Leakage Perspective
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst,
H. Vincent Poor
Abstract:
Physical-layer security (PLS) is a promising technique to complement more traditional means of communication security in beyond-5G wireless networks. However, studies of PLS are often based on ideal assumptions such as infinite coding blocklengths or perfect knowledge of the wiretap link's channel state information (CSI). In this work, we study the performance of finite blocklength (FBL) transmiss…
▽ More
Physical-layer security (PLS) is a promising technique to complement more traditional means of communication security in beyond-5G wireless networks. However, studies of PLS are often based on ideal assumptions such as infinite coding blocklengths or perfect knowledge of the wiretap link's channel state information (CSI). In this work, we study the performance of finite blocklength (FBL) transmissions using a new secrecy metric $\unicode{x2013}$ the average information leakage (AIL). We evaluate the exact and approximate AIL with Gaussian signaling and arbitrary fading channels, assuming that the eavesdropper's instantaneous CSI is unknown. We then conduct case studies that use artificial noise (AN) beamforming to analyze the AIL in both Rayleigh and Rician fading channels. The accuracy of the analytical expressions is verified through extensive simulations, and various insights regarding the impact of key system parameters on the AIL are obtained. Particularly, our results reveal that allowing a small level of AIL can potentially lead to significant reliability enhancements. To improve the system performance, we formulate and solve an average secrecy throughput (AST) optimization problem via both non-adaptive and adaptive design strategies. Our findings highlight the significance of blocklength design and AN power allocation, as well as the impact of their trade-off on the AST.
△ Less
Submitted 13 May, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Multi-objective Deep Reinforcement Learning for Mobile Edge Computing
Authors:
Ning Yang,
Junrui Wen,
Meng Zhang,
Ming Tang
Abstract:
Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or chall…
▽ More
Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at https://github.com/gracefulning/mec_morl_multipolicy.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
Heterogeneous Receptors - Based Molecule Harvesting in MC: Analysis for ISI Mitigation and Energy Efficiency
Authors:
Xinyu Huang,
Yu Huang,
Miaowen Wen,
Nan Yang,
Robert Schober
Abstract:
This paper investigates a spherical transmitter (TX) with a membrane covered by heterogeneous receptors of varying sizes and arbitrary locations for molecular communication (MC), where molecules are encapsulated within vesicles and released from the TX through membrane fusion. Assuming continuous vesicle generation at the TX and a transparent receiver (RX), we calculate the molecule release rate,…
▽ More
This paper investigates a spherical transmitter (TX) with a membrane covered by heterogeneous receptors of varying sizes and arbitrary locations for molecular communication (MC), where molecules are encapsulated within vesicles and released from the TX through membrane fusion. Assuming continuous vesicle generation at the TX and a transparent receiver (RX), we calculate the molecule release rate, the fraction of absorbed molecules at the TX, and the received signal at the RX. All obtained analytical expressions are functions of all receptors locations and sizes, and are validated by particle-based simulations. Our numerical results indicate that evenly distributed receptors on the TX membrane can absorb more molecules than randomly distributed receptors or a single receptor. Furthermore, inspired by the autoreceptor functionality in synaptic communication, we incorporate a negative feedback mechanism (NFM) at the TX, such that molecule release stops after a certain period. We then derive the fraction of molecules that can be reused for the subsequent emissions when considering both NFM and molecule harvesting. Our numerical results demonstrate that incorporating NFM can reduce inter-symbol interference (ISI) while maintaining the same peak received signal as the case without NFM. Additionally, our results show that TXs incorporating both molecule harvesting and NFM can achieve a higher energy efficiency and lower error probability than TXs employing only molecule harvesting or neither functionality.
△ Less
Submitted 26 July, 2023;
originally announced July 2023.
-
Optimized data collection and analysis process for studying solar-thermal desalination by machine learning
Authors:
Guilong Peng,
Senshan Sun,
Yangjun Qin,
Zhenwei Xu,
Juxin Du,
Swellam W. sharshir,
A. W. Kandel,
A. E. Kabeel,
Nuo Yang
Abstract:
An effective interdisciplinary study between machine learning and solar-thermal desalination requires a sufficiently large and well-analyzed experimental datasets. This study develops a modified dataset collection and analysis process for studying solar-thermal desalination by machine learning. Based on the optimized water condensation and collection process, the proposed experimental method colle…
▽ More
An effective interdisciplinary study between machine learning and solar-thermal desalination requires a sufficiently large and well-analyzed experimental datasets. This study develops a modified dataset collection and analysis process for studying solar-thermal desalination by machine learning. Based on the optimized water condensation and collection process, the proposed experimental method collects over one thousand datasets, which is ten times more than the average number of datasets in previous works, by accelerating data collection and reducing the time by 83.3%. On the other hand, the effects of dataset features are investigated by using three different algorithms, including artificial neural networks, multiple linear regressions, and random forests. The investigation focuses on the effects of dataset size and range on prediction accuracy, factor importance ranking, and the model's generalization ability. The results demonstrate that a larger dataset can significantly improve prediction accuracy when using artificial neural networks and random forests. Additionally, the study highlights the significant impact of dataset size and range on ranking the importance of influence factors. Furthermore, the study reveals that the extrapolation data range significantly affects the extrapolation accuracy of artificial neural networks. Based on the results, massive dataset collection and analysis of dataset feature effects are important steps in an effective and consistent machine learning process flow for solar-thermal desalination, which can promote machine learning as a more general tool in the field of solar-thermal desalination.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Secure Short-Packet Communications via UAV-Enabled Mobile Relaying: Joint Resource Optimization and 3D Trajectory Design
Authors:
Milad Tatar Mamaghani,
Xiangyun Zhou,
Nan Yang,
A. Lee Swindlehurst
Abstract:
Short-packet communication (SPC) and unmanned aerial vehicles (UAVs) are anticipated to play crucial roles in the development of 5G-and-beyond wireless networks and the Internet of Things (IoT). In this paper, we propose a secure SPC system, where a UAV serves as a mobile decode-and-forward (DF) relay, periodically receiving and relaying small data packets from a remote IoT device to its receiver…
▽ More
Short-packet communication (SPC) and unmanned aerial vehicles (UAVs) are anticipated to play crucial roles in the development of 5G-and-beyond wireless networks and the Internet of Things (IoT). In this paper, we propose a secure SPC system, where a UAV serves as a mobile decode-and-forward (DF) relay, periodically receiving and relaying small data packets from a remote IoT device to its receiver in two hops with strict latency requirements, in the presence of an eavesdropper. This system requires careful optimization of important design parameters, such as the coding blocklengths of both hops, transmit powers, and the UAV's trajectory. While the overall optimization problem is nonconvex, we tackle it by applying a block successive convex approximation (BSCA) approach to divide the original problem into three subproblems and solve them separately. Then, an overall iterative algorithm is proposed to obtain the final design with guaranteed convergence. Our proposed low-complexity algorithm incorporates robust trajectory design and resource management to optimize the effective average secrecy throughput of the communication system over the course of the UAV-relay's mission. Simulation results demonstrate significant performance improvements compared to various benchmark schemes and provide useful design insights on the coding blocklengths and transmit powers along the trajectory of the UAV.
△ Less
Submitted 29 December, 2023; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Learning to Retrieve In-Context Examples for Large Language Models
Authors:
Liang Wang,
Nan Yang,
Furu Wei
Abstract:
Large language models (LLMs) have demonstrated their ability to learn in-context, allowing them to perform various tasks based on a few input-output examples. However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples. In this paper, we propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context example…
▽ More
Large language models (LLMs) have demonstrated their ability to learn in-context, allowing them to perform various tasks based on a few input-output examples. However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples. In this paper, we propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. Our framework initially trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder based dense retriever. Our experiments on a suite of $30$ tasks demonstrate that our framework significantly enhances in-context learning performance. Furthermore, we show the generalization ability of our framework to unseen tasks during training. An in-depth analysis reveals that our model improves performance by retrieving examples with similar patterns, and the gains are consistent across LLMs of varying sizes. The code and data are available at https://github.com/microsoft/LMOps/tree/main/llm_retriever .
△ Less
Submitted 26 January, 2024; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Minimizing Age of Information for Mobile Edge Computing Systems: A Nested Index Approach
Authors:
Shuo Chen,
Ning Yang,
Meng Zhang,
Jun Wang
Abstract:
Exploiting the computational heterogeneity of mobile devices and edge nodes, mobile edge computation (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness, by offloading tasks from mobile devices to edge nodes. We use the metric Age-of-Information (AoI) to evaluate information freshness. An efficient solution to minimize the AoI for th…
▽ More
Exploiting the computational heterogeneity of mobile devices and edge nodes, mobile edge computation (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness, by offloading tasks from mobile devices to edge nodes. We use the metric Age-of-Information (AoI) to evaluate information freshness. An efficient solution to minimize the AoI for the MEC system with multiple users is non-trivial to obtain due to the random computing time. In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system. We first reformulate the problem as a Restless Multi-Arm-Bandit (RMAB) problem and establish a hierarchical Markov Decision Process (MDP) to characterize the updating of AoI for the MEC system. Based on the hierarchical MDP, we propose a nested index framework and design a nested index policy with provably asymptotic optimality. Finally, the closed form of the nested index is obtained, which enables the performance tradeoffs between computation complexity and accuracy. Our algorithm leads to an optimality gap reduction of up to 40%, compared to benchmarks. Our algorithm asymptotically approximates the lower bound as the system scalar gets large enough.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
Learning to Rank in Generative Retrieval
Authors:
Yongqi Li,
Nan Yang,
Liang Wang,
Furu Wei,
Wenjie Li
Abstract:
Generative retrieval stands out as a promising new paradigm in text retrieval that aims to generate identifier strings of relevant passages as the retrieval target. This generative paradigm taps into powerful generative language models, distinct from traditional sparse or dense retrieval methods. However, only learning to generate is insufficient for generative retrieval. Generative retrieval lear…
▽ More
Generative retrieval stands out as a promising new paradigm in text retrieval that aims to generate identifier strings of relevant passages as the retrieval target. This generative paradigm taps into powerful generative language models, distinct from traditional sparse or dense retrieval methods. However, only learning to generate is insufficient for generative retrieval. Generative retrieval learns to generate identifiers of relevant passages as an intermediate goal and then converts predicted identifiers into the final passage rank list. The disconnect between the learning objective of autoregressive models and the desired passage ranking target leads to a learning gap. To bridge this gap, we propose a learning-to-rank framework for generative retrieval, dubbed LTRGR. LTRGR enables generative retrieval to learn to rank passages directly, optimizing the autoregressive model toward the final passage ranking target via a rank loss. This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems and does not add any burden to the inference stage. We conducted experiments on three public benchmarks, and the results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods. The code and checkpoints are released at https://github.com/liyongqi67/LTRGR.
△ Less
Submitted 16 December, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Multiview Identifiers Enhanced Generative Retrieval
Authors:
Yongqi Li,
Nan Yang,
Liang Wang,
Furu Wei,
Wenjie Li
Abstract:
Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. At a cost, the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passage's content well.…
▽ More
Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. At a cost, the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passage's content well. As such, we are motivated to propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage and could integrate contextualized information that text pieces lack. Furthermore, we simultaneously consider multiview identifiers, including synthetic identifiers, titles, and substrings. These views of identifiers complement each other and facilitate the holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion
Authors:
Xingxing Zuo,
Nan Yang,
Nathaniel Merrill,
Binbin Xu,
Stefan Leutenegger
Abstract:
Incrementally recovering 3D dense structures from monocular videos is of paramount importance since it enables various robotics and AR applications. Feature volumes have recently been shown to enable efficient and accurate incremental dense reconstruction without the need to first estimate depth, but they are not able to achieve as high of a resolution as depth-based methods due to the large memor…
▽ More
Incrementally recovering 3D dense structures from monocular videos is of paramount importance since it enables various robotics and AR applications. Feature volumes have recently been shown to enable efficient and accurate incremental dense reconstruction without the need to first estimate depth, but they are not able to achieve as high of a resolution as depth-based methods due to the large memory consumption of high-resolution feature volumes. This letter proposes a real-time feature volume-based dense reconstruction method that predicts TSDF (Truncated Signed Distance Function) values from a novel sparsified deep feature volume, which is able to achieve higher resolutions than previous feature volume-based methods, and is favorable in large-scale outdoor scenarios where the majority of voxels are empty. An uncertainty-aware multi-view stereo (MVS) network is leveraged to infer initial voxel locations of the physical surface in a sparse feature volume. Then for refining the recovered 3D geometry, deep features are attentively aggregated from multiview images at potential surface locations, and temporally fused. Besides achieving higher resolutions than before, our method is shown to produce more complete reconstructions with finer detail in many cases. Extensive evaluations on both public and self-collected datasets demonstrate a very competitive real-time reconstruction result for our method compared to state-of-the-art reconstruction methods in both indoor and outdoor settings.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation
Authors:
Yu Chen,
Wei Deng,
Shikai Fang,
Fengpei Li,
Nicole Tianjiao Yang,
Yikai Zhang,
Kashif Rasul,
Shandian Zhe,
Anderson Schneider,
Yuriy Nevmyvaka
Abstract:
The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with the score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which conducts projections onto every other marginal alternatingly. However, in practice, only approximated projections are accessible…
▽ More
The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with the score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which conducts projections onto every other marginal alternatingly. However, in practice, only approximated projections are accessible and their convergence is not well understood. To fill this gap, we present a first convergence analysis of the Schrödinger bridge algorithm based on approximated projections. As for its practical applications, we apply SBP to probabilistic time series imputation by generating missing values conditioned on observed data. We show that optimizing the transport cost improves the performance and the proposed algorithm achieves the state-of-the-art result in healthcare and environmental data while exhibiting the advantage of exploring both temporal and feature patterns in probabilistic time series imputation.
△ Less
Submitted 10 September, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
Combining Adversaries with Anti-adversaries in Training
Authors:
Xiaoling Zhou,
Nan Yang,
Ou Wu
Abstract:
Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under more general perturbation scope that different samples can have different perturbation directions (the adversarial and anti-adv…
▽ More
Adversarial training is an effective learning technique to improve the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under more general perturbation scope that different samples can have different perturbation directions (the adversarial and anti-adversarial directions) and varied perturbation bounds. Our theoretical explorations suggest that the combination of adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can be more effective in achieving better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy label learning and imbalance learning) compared with standard adversarial training. On the basis of our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample is presented. Meta learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.
△ Less
Submitted 18 May, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.