-
Latent Distance Guided Alignment Training for Large Language Models
Authors:
Haotian Luo
Abstract:
Ensuring alignment with human preferences is a crucial characteristic of large language models (LLMs). Presently, the primary alignment methods, RLHF and DPO, require extensive human annotation, which is expensive despite their efficacy. The significant expenses associated with current alignment techniques motivate researchers to investigate the development of annotation-free alignment training methods. In pursuit of improved alignment without relying on external annotation, we introduce Latent Distance Guided Alignment Training (LD-Align). This approach seeks to align the model with a high-quality supervised fine-tune dataset using guidance from a latent space. The latent space is generated through sample reconstruction, akin to auto-encoding. Consequently, we utilize the distance between sample pairs in the latent space to guide DPO-based alignment training. Extensive experimentation and evaluation show the efficacy of our proposed method in achieving notable alignment.
Submitted 13 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Authors:
Shibo Hao,
Yi Gu,
Haotian Luo,
Tianyang Liu,
Xiyan Shao,
Xinyuan Wang,
Shuhua Xie,
Haodi Ma,
Adithya Samavedhi,
Qiyue Gao,
Zhen Wang,
Zhiting Hu
Abstract:
Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the lack of two key elements: (1) an automatic method for evaluating the generated reasoning chains on different tasks, and (2) a unified formalism and implementation of the diverse reasoning approaches for systematic comparison. This paper aims to close the gap: (1) We introduce AutoRace for fully automated reasoning chain evaluation. Existing metrics rely on expensive human annotations or pre-defined LLM prompts not adaptable to different tasks. In contrast, AutoRace automatically creates detailed evaluation criteria tailored for each task, and uses GPT-4 for accurate evaluation following the criteria. (2) We develop LLM Reasoners, a library for standardized modular implementation of existing and new reasoning algorithms, under a unified formulation of the search, reward, and world model components. With the new evaluation and library, (3) we conduct an extensive study of different reasoning approaches (e.g., CoT, ToT, RAP). The analysis reveals interesting findings about different factors contributing to reasoning, including reward guidance, breadth-vs-depth in search, the world model, and prompt formats.
Submitted 8 April, 2024;
originally announced April 2024.
-
POMDP-Guided Active Force-Based Search for Robotic Insertion
Authors:
Chen Wang,
Haoxiang Luo,
Kun Zhang,
Hua Chen,
Jia Pan,
Wei Zhang
Abstract:
In robotic insertion tasks where the uncertainty exceeds the allowable tolerance, a good search strategy is essential for successful insertion and significantly influences efficiency. The commonly used blind search method is time-consuming and does not exploit the rich contact information. In this paper, we propose a novel search strategy that actively utilizes the information contained in the contact configuration and shows high efficiency. In particular, we formulate this problem as a Partially Observable Markov Decision Process (POMDP) with carefully designed primitives based on an in-depth analysis of the contact configuration's static stability. From the formulated POMDP, we can derive a novel search strategy. Thanks to its simplicity, this search strategy can be incorporated into a Finite-State-Machine (FSM) controller. The behaviors of the FSM controller are realized through a low-level Cartesian Impedance Controller. Our method is based purely on the robot's proprioceptive sensing and does not need visual or tactile sensors. To evaluate the effectiveness of our proposed strategy and control framework, we conduct extensive comparison experiments in simulation, where we compare our method with the baseline approach. The results demonstrate that our proposed method achieves a higher success rate with a shorter search time and search trajectory length compared to the baseline method. Additionally, we show that our method is robust to various initial displacement errors.
Submitted 5 April, 2024;
originally announced April 2024.
-
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Authors:
Haozhe Luo,
Ziyu Zhou,
Corentin Royer,
Anjany Sekuboyina,
Bjoern Menze
Abstract:
Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports. However, existing approaches often face challenges in encoding medical knowledge effectively. While radiology reports provide insights into the current disease manifestation, medical definitions (as used by contemporary methods) tend to be overly abstract, creating a gap in knowledge. To address this, we propose DeViDe, a novel transformer-based method that leverages radiographic descriptions from the open web. These descriptions outline general visual characteristics of diseases in radiographs, and when combined with abstract definitions and radiology reports, provide a holistic snapshot of knowledge. DeViDe incorporates three key features for knowledge-augmented vision language alignment: First, a large-language model-based augmentation is employed to homogenise medical knowledge from diverse sources. Second, this knowledge is aligned with image information at various levels of granularity. Third, a novel projection layer is proposed to handle the complexity of aligning each image with multiple descriptions arising in a multi-label setting. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets. Additionally, fine-tuning DeViDe on four downstream tasks and six segmentation tasks showcases its superior performance across data from diverse distributions.
Submitted 4 April, 2024;
originally announced April 2024.
-
Model-agnostic Origin Attribution of Generated Images with Few-shot Examples
Authors:
Fengyuan Liu,
Haochen Luo,
Yiming Li,
Philip Torr,
Jindong Gu
Abstract:
Recent progress in visual generative models enables the generation of high-quality images. To prevent the misuse of generated images, it is important to identify the origin model that generates them. In this work, we study the origin attribution of generated images in a practical setting where only a few images generated by a source model are available and the source model cannot be accessed. The goal is to check if a given image is generated by the source model. We first formulate this problem as a few-shot one-class classification task. To solve the task, we propose OCC-CLIP, a CLIP-based framework for few-shot one-class classification, enabling the identification of an image's source model, even among multiple candidates. Extensive experiments corresponding to various generative models verify the effectiveness of our OCC-CLIP framework. Furthermore, an experiment based on the recently released DALL-E 3 API verifies the real-world applicability of our solution.
Submitted 3 April, 2024;
originally announced April 2024.
-
FastqZip: An Improved Reference-Based Genome Sequence Lossy Compression Framework
Authors:
Yuanjian Liu,
Huihao Luo,
Zhijun Han,
Yao Hu,
Yehui Yang,
Kyle Chard,
Sheng Di,
Ian Foster,
Jiesheng Wu
Abstract:
Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ format files with an improved reference-based compression algorithm to achieve a higher compression ratio than other state-of-the-art algorithms. We propose FastqZip, which uses a new method for mapping the sequence to the reference for compression, allows read reordering and lossy quality scores, and uses the BSC or ZPAQ algorithm to perform the final lossless compression for a higher compression ratio and relatively fast speed. Our method ensures the sequence can be losslessly reconstructed while allowing lossless or lossy compression for the quality scores. We reorder the reads to get a higher compression ratio. We evaluate our algorithms on five datasets and show that FastqZip can outperform the SOTA algorithm Genozip by around 10% in terms of compression ratio while having an acceptable slowdown.
Submitted 22 February, 2024;
originally announced April 2024.
-
Uncovering the Text Embedding in Text-to-Image Diffusion Models
Authors:
Hu Yu,
Hao Luo,
Fan Wang,
Feng Zhao
Abstract:
The correspondence between input text and the generated image exhibits opacity, wherein minor textual modifications can induce substantial deviations in the generated image. Meanwhile, text embedding, the pivotal intermediary between text and images, remains relatively underexplored. In this paper, we address this research gap by delving into the text embedding space, unleashing its capacity for controllable image editing and explicable semantic direction attributes within a learning-free framework. Specifically, we identify two critical insights regarding the importance of per-word embedding and their contextual correlations within text embedding, providing instructive principles for learning-free image editing. Additionally, we find that text embedding inherently possesses diverse semantic potentials, and further reveal this property through the lens of singular value decomposition (SVD). These uncovered properties offer practical utility for image editing and semantic discovery. More importantly, we expect the in-depth analyses and findings of the text embedding can enhance the understanding of text-to-image diffusion models.
Submitted 1 April, 2024;
originally announced April 2024.
-
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation
Authors:
Chi-Min Chan,
Chunpu Xu,
Ruibin Yuan,
Hongyin Luo,
Wei Xue,
Yike Guo,
Jie Fu
Abstract:
Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in unseen scenarios. To tackle these challenges, Retrieval-Augmented Generation (RAG) incorporates external, relevant documents into the response generation process, thus leveraging non-parametric knowledge alongside LLMs' in-context learning abilities. However, existing RAG implementations primarily focus on initial input for context retrieval, overlooking the nuances of ambiguous or complex queries that necessitate further clarification or decomposition for accurate responses. To this end, we propose learning to Refine Query for Retrieval Augmented Generation (RQ-RAG) in this paper, endeavoring to enhance the model by equipping it with capabilities for explicit rewriting, decomposition, and disambiguation. Our experimental results indicate that our method, when applied to a 7B Llama2 model, surpasses the previous state-of-the-art (SOTA) by an average of 1.9% across three single-hop QA datasets, and also demonstrates enhanced performance in handling complex, multi-hop QA datasets. Our code is available at https://github.com/chanchimin/RQ-RAG.
Submitted 31 March, 2024;
originally announced April 2024.
-
Blockchain for Energy Market: A Comprehensive Survey
Authors:
Tianqi Jiang,
Haoxiang Luo,
Kun Yang,
Gang Sun,
Hongfang Yu,
Qi Huang,
Athanasios V. Vasilakos
Abstract:
The energy market encompasses the behavior of energy supply and trading within a platform system. By utilizing centralized or distributed trading, energy can be effectively managed and distributed across different regions, thereby achieving market equilibrium and satisfying both producers and consumers. However, recent years have presented unprecedented challenges and difficulties for the development of the energy market. These challenges include regional energy imbalances, volatile energy pricing, high computing costs, and issues related to transaction information disclosure. Researchers widely acknowledge that the security features of blockchain technology can enhance the efficiency of energy transactions and establish the fundamental stability and robustness of the energy market. This type of blockchain-enabled energy market is commonly referred to as an energy blockchain. Currently, there is a burgeoning amount of research in this field, encompassing algorithm design, framework construction, and practical application. It is crucial to organize and compare these research efforts to facilitate the further advancement of energy blockchain. This survey aims to comprehensively review the fundamental characteristics of blockchain and energy markets, highlighting the significant advantages of combining the two. Moreover, based on existing research outcomes, we will categorize and compare the current energy market research supported by blockchain in terms of algorithm design, market framework construction, and the policies and practical applications adopted by different countries. Finally, we will address current issues and propose potential future directions for improvement, to provide guidance for the practical implementation of blockchain in the energy market.
Submitted 5 April, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
Text Data-Centric Image Captioning with Interactive Prompts
Authors:
Yiyu Wang,
Hao Luo,
Jungang Xu,
Yingfei Sun,
Fan Wang
Abstract:
Supervised image captioning approaches have made great progress, but it is challenging to collect high-quality human-annotated image-text data. Recently, large-scale vision and language models (e.g., CLIP) and large-scale generative language models (e.g., GPT-2) have shown strong performances in various tasks, which also provide some new solutions for image captioning with web paired data, unpaired data or even text-only data. Among them, the mainstream solution is to project image embeddings into the text embedding space with the assistance of consistent representations between image-text pairs from the CLIP model. However, the current methods still face several challenges in adapting to the diversity of data configurations in a unified solution, accurately estimating image-text embedding bias, and correcting unsatisfactory prediction results in the inference stage. This paper proposes a new Text data-centric approach with Interactive Prompts for image Captioning, named TIPCap. 1) We consider four different settings which gradually reduce the dependence on paired data. 2) We construct a mapping module driven by multivariate Gaussian distribution to mitigate the modality gap, which is applicable to the above four different settings. 3) We propose a prompt interaction module that can incorporate optional prompt information before generating captions. Extensive experiments show that our TIPCap outperforms other weakly or unsupervised image captioning methods and achieves a new state-of-the-art performance on two widely used datasets, i.e., MS-COCO and Flickr30K.
Submitted 28 March, 2024;
originally announced March 2024.
-
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Authors:
Zhenyu Pan,
Haozheng Luo,
Manling Li,
Han Liu
Abstract:
We present a Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared to the literature, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination that is inconsistent with real-time or domain facts and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable 'Plug-and-Play' actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts in the answers. Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods.
Submitted 25 March, 2024;
originally announced March 2024.
-
Facile synthesis of CoSi alloy with rich vacancy for base- and solvent-free aerobic oxidation of aromatic alcohols
Authors:
Zhiyue Zhao,
Zhiwei Jiang,
Yizhe Huang,
Mebrouka Boubeche,
Valentina G. Matveeva,
Hector F. Garces,
Huixia Luo,
Kai Yan
Abstract:
Rational design and green synthesis of low-cost and robust catalysts efficient for the selective oxidation of various alcohols remain challenging. Herein, we report a fast and solvent-free arc-melting (AM) method to controllably synthesize semimetal CoSi alloy (abbreviated as AM-CoSi) that is efficient for the base- and solvent-free oxidation of six types of aromatic alcohols. X-ray absorption fine structure (XAFS), electron paramagnetic resonance (EPR), and aberration-corrected high-angle annular dark-field scanning transmission electron microscopy (AC HAADF-STEM) confirmed the successful synthesis of AM-CoSi with rich Si vacancy (Siv). The as-prepared CoSi alloy catalysts exhibit an order-of-magnitude activity enhancement in the oxidation of the model reactant benzyl alcohol (BAL) to benzyl benzoate (BBE) compared with their mono counterparts, with a 70% yield of BBE, which is the highest yield to date. Experimental results and DFT calculations verify that the CoSi alloy structure improves the BAL conversion and that the Si vacancy mainly contributes to the generation of BBE. In addition, the CoSi alloy maintains high stability, and a potential reaction pathway is proposed. The CoSi alloy also works efficiently for the selective oxidation of various alcohols with different groups. This work demonstrates for the first time that semimetal CoSi alloy is robust for the green oxidation of various alcohols and provides a vast opportunity for the reasonable design and application of other semimetal alloy catalysts.
Submitted 25 March, 2024;
originally announced March 2024.
-
Green fabrication of nickel-iron layered double hydroxides nanosheets efficient for the enhanced capacitive performance
Authors:
Yuchen Wang,
Zuo Chen,
Man Zhang,
Yaoyu Liu,
Huixia Luo,
Kai Yan
Abstract:
Rational synthesis of robust layered double hydroxides (LDHs) nanosheets for high-energy supercapacitors remains challenging. Herein, we report an eco-friendly ultrasonication-assisted strategy to fabricate NiFe-LDHs nanosheets with enhanced capacitive behavior. The experimental results, combined with different advanced characterization tools, document that the utilization of ultrasonication has a profound effect on the morphology and thickness of the as-obtained NiFe-LDHs, in turn affecting the capacitive behavior. NiFe-LDHs nanosheets prepared with 2-h ultrasonic treatments display exceptional capacitive performance because of the synergetic effect of ultrathin thickness, large specific surface area, and high mesoporous volume. The maximum specific capacitance of Ni3Fe1-LDHs nanosheets with a thickness of 7.39 nm and a specific surface area of 77.16 m² g⁻¹ reached 1923 F g⁻¹, which is competitive with most previously reported values. In addition, the maximum specific energy of the assembled NiFe-LDHs//AC asymmetric supercapacitor reached 49.13 Wh kg⁻¹ at 400 W kg⁻¹. This work provides a green technology to fabricate LDHs nanosheets, and offers deep insights for understanding the relationship between the morphology/structure and capacitive behavior of LDHs nanosheets, which is helpful for achieving high-performance LDHs-based electrode materials.
Submitted 25 March, 2024;
originally announced March 2024.
-
Joint Power Allocation and Beamforming for In-band Full-duplex Multi-cell Multi-user Networks
Authors:
Haifeng Luo,
Navneet Garg,
Mark Holm,
Tharmalingam Ratnarajah
Abstract:
This paper investigates a robust joint power allocation and beamforming scheme for in-band full-duplex multi-cell multi-user (IBFD-MCMU) networks. A mean-squared error (MSE) minimization problem is formulated with constraints on the power budgets and residual self-interference (RSI) power. The problem is not convex, so we decompose it into two sub-problems, interference management beamforming and power allocation, and give closed-form solutions to the sub-problems. Then we propose an iterative algorithm to yield an overall solution. The computational complexity and convergence behavior of the algorithm are analyzed. Our method can enhance the analog self-interference cancellation (ASIC) depth provided by the precoder with less effect on the downlink communication than the existing null-space projection method, inspiring a low-cost but efficient IBFD transceiver design. It can achieve 42.9% of IBFD gain in terms of spectral efficiency with only antenna isolation, while this value increases to 60.9% with further digital self-interference cancellation (DSIC). Numerical results illustrate that our algorithm is robust to hardware impairments and channel uncertainty. With sufficient ASIC depth, our method reduces the computation time by at least 20% compared with the existing scheme, owing to its faster convergence, at the cost of < 12.5% sum rate loss. The benefit is more significant with single-antenna users, where our algorithm saves at least 40% of the computation time at the cost of < 10% sum rate reduction.
Submitted 16 March, 2024;
originally announced March 2024.
-
An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models
Authors:
Haochen Luo,
Jindong Gu,
Fengyuan Liu,
Philip Torr
Abstract:
Different from traditional task-specific vision models, recent large VLMs can readily adapt to different vision tasks by simply using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations. Furthermore, the concern is exacerbated by the phenomenon that the same adversarial perturbations can fool different task-specific models. Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: Can a single adversarial image mislead all predictions of VLMs when a thousand different prompts are given? This question essentially introduces a novel perspective on adversarial transferability: cross-prompt adversarial transferability. In this work, we propose the Cross-Prompt Attack (CroPA). This proposed method updates the visual adversarial perturbation with learnable prompts, which are designed to counteract the misleading effects of the adversarial image. By doing this, CroPA significantly improves the transferability of adversarial examples across prompts. Extensive experiments are conducted to verify the strong cross-prompt adversarial transferability of CroPA with prevalent VLMs including Flamingo, BLIP-2, and InstructBLIP across various tasks. Our source code is available at https://github.com/Haochen-Luo/CroPA.
Submitted 14 March, 2024;
originally announced March 2024.
-
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Authors:
Zhixuan Shen,
Haonan Luo,
Sijia Li,
Tianrui Li
Abstract:
Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content. Most existing methods heavily rely on the accuracy of Optical Character Recognition (OCR) systems, and aggressive fine-tuning based on limited spatial location information and erroneous OCR text information often leads to inevitable overfitting. In this paper, we propose a multimodal adversarial training architecture with spatial awareness capabilities. Specifically, we introduce an Adversarial OCR Enhancement (AOE) module, which leverages adversarial training in the embedding space of the OCR modality to enhance fault-tolerant representation of OCR texts, thereby reducing noise caused by OCR errors. Simultaneously, we add a Spatial-Aware Self-Attention (SASA) mechanism to help the model better capture the spatial relationships among OCR tokens. Various experiments demonstrate that our method achieves significant performance improvements on both the ST-VQA and TextVQA datasets and provides a novel paradigm for multimodal adversarial training.
Submitted 14 March, 2024;
originally announced March 2024.
-
Intention-driven Ego-to-Exo Video Generation
Authors:
Hongchen Luo,
Kai Zhu,
Wei Zhai,
Yang Cao
Abstract:
Ego-to-exo video generation refers to generating the corresponding exocentric video according to the egocentric video, providing valuable applications in AR/VR and embodied AI. Benefiting from advancements in diffusion model techniques, notable progress has been achieved in video generation. However, existing methods build upon the spatiotemporal consistency assumptions between adjacent frames, which cannot be satisfied in the ego-to-exo scenarios due to drastic changes in views. To this end, this paper proposes an Intention-Driven Ego-to-exo video generation framework (IDE) that leverages action intention consisting of human movement and action description as view-independent representation to guide video generation, preserving the consistency of content and motion. Specifically, the egocentric head trajectory is first estimated through multi-view stereo matching. Then, a cross-view feature perception module is introduced to establish correspondences between exo- and ego- views, guiding the trajectory transformation module to infer human full-body movement from the head trajectory. Meanwhile, we present an action description unit that maps the action semantics into the feature space consistent with the exocentric image. Finally, the inferred human movement and high-level action descriptions jointly guide the generation of exocentric motion and interaction content (i.e., corresponding optical flow and occlusion maps) in the backward process of the diffusion model, ultimately warping them into the corresponding exocentric video. We conduct extensive experiments on the relevant dataset with diverse exo-ego video pairs, and our IDE outperforms state-of-the-art models in both subjective and objective assessments, demonstrating its efficacy in ego-to-exo video generation.
Submitted 17 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Plotinus: A Satellite Internet Digital Twin System
Authors:
Yue Gao,
Kun Qiu,
Zhe Chen,
Wenjun Zhu,
Qi Zhang,
Handong Luo,
Quanwei Lin,
Ziheng Yang,
Wenhao Liu
Abstract:
The development of an integrated space-air-ground network (SAGIN) requires sophisticated satellite Internet emulation tools that can handle complex, dynamic topologies and offer in-depth analysis. Existing emulation platforms struggle with challenges like the need for detailed implementation across all network layers, real-time response, and scalability. This paper proposes a digital twin system based on microservices for satellite Internet emulation, namely Plotinus, which aims to solve these problems. Plotinus features a modular design, allowing for easy replacement of the physical layer to emulate different aerial vehicles and analyze channel interference. It also enables replacing path computation methods to simplify testing and deploying algorithms. In particular, Plotinus allows for real-time emulation with live network traffic, enhancing practical network models. The evaluation result shows Plotinus's effective emulation of dynamic satellite networks with real-world devices. Its adaptability for various communication models and algorithm testing highlights Plotinus's role as a vital tool for developing and analyzing SAGIN systems, offering a cross-layer, real-time and scalable digital twin system.
Submitted 24 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
On Tractable $Φ$-Equilibria in Non-Concave Games
Authors:
Yang Cai,
Constantinos Daskalakis,
Haipeng Luo,
Chen-Yu Wei,
Weiqiang Zheng
Abstract:
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave -- a common scenario in machine learning applications involving strategies parameterized by deep neural networks, or when agents' utilities are computed by neural networks, or both. Non-concave games introduce significant game-theoretic and optimization challenges: (i) Nash equilibria may not exist; (ii) local Nash equilibria, though existing, are intractable; and (iii) mixed Nash, correlated, and coarse correlated equilibria generally have infinite support and are intractable. To sidestep these challenges, we revisit the classical solution concept of $Φ$-equilibria introduced by Greenwald and Jafari [2003], which is guaranteed to exist for an arbitrary set of strategy modifications $Φ$ even in non-concave games [Stoltz and Lugosi, 2007]. However, the tractability of $Φ$-equilibria in such games remains elusive. In this paper, we initiate the study of tractable $Φ$-equilibria in non-concave games and examine several natural families of strategy modifications. We show that when $Φ$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $Φ$-equilibria. Additionally, we explore cases where $Φ$ is infinite but consists of local modifications, showing that Online Gradient Descent can efficiently approximate $Φ$-equilibria in non-trivial regimes.
Submitted 2 July, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
On the Secrecy Rate of In-Band Full-duplex Two-way Wiretap Channel
Authors:
Navneet Garg,
Haifeng Luo,
Tharmalingam Ratnarajah
Abstract:
In this paper, we consider a two-way wiretap Multi-Input Multi-Output Multi-antenna Eve (MIMOME) channel, where both nodes (Alice and Bob) transmit and receive in an in-band full-duplex (IBFD) manner. For this system with keyless security, we provide a novel artificial noise (AN) based signal design, where the AN is injected in both signal and null spaces. We present an ergodic secrecy rate approximation to derive the power allocation algorithm. We consider scenarios where AN is known and unknown to legitimate users and include imperfect channel information effects. To maximize secrecy rates subject to the transmit power constraint, a two-step power allocation solution is proposed, where the first step is known at Eve, and the second step helps to improve the secrecy further. We also consider scenarios where partial information is known by Eve and the effects of non-ideal self-interference cancellation. The usefulness and limitations of the resulting power allocation solution are analyzed and verified via simulations. Results show that secrecy rates are lower when AN is unknown to receivers or Eve has more information about legitimate users. Since the ergodic approximation only considers Eve's distance, the resulting power allocation provides secrecy rates close to the actual ones.
Submitted 11 March, 2024;
originally announced March 2024.
-
van Hove Singularity-Driven Emergence of Multiple Flat Bands in Kagome Superconductors
Authors:
Hailan Luo,
Lin Zhao,
Zhen Zhao,
Haitao Yang,
Yun-Peng Huang,
Hongxiong Liu,
Yuhao Gu,
Feng Jin,
Hao Chen,
Taimin Miao,
Chaohui Yin,
Chengmin Shen,
Xiaolin Ren,
Bo Liang,
Yingjie Shu,
Yiwen Chen,
Fengfeng Zhang,
Feng Yang,
Shenjin Zhang,
Qinjun Peng,
Hanqing Mao,
Guodong Liu,
Jiangping Hu,
Youguo Shi,
Zuyan Xu
, et al. (5 additional authors not shown)
Abstract:
The newly discovered Kagome superconductors AV$_3$Sb$_5$ (A=K, Rb and Cs) continue to bring surprises in generating unusual phenomena and physical properties, including anomalous Hall effect, unconventional charge density wave, electronic nematicity and time-reversal symmetry breaking. Here we report an unexpected emergence of multiple flat bands in the AV$_3$Sb$_5$ superconductors. By performing high-resolution angle-resolved photoemission (ARPES) measurements, we observed four branches of flat bands that span the entire momentum space. The appearance of the flat bands is not anticipated from the band structure calculations and cannot be accounted for by the known mechanisms of flat band generation. It is intimately related to the evolution of van Hove singularities. This is the first observation of such an emergence of multiple flat bands in solid materials. Our findings provide new insights into the underlying mechanism that governs the unusual behaviors in the Kagome superconductors. They also provide a new pathway for producing flat bands and set a platform to study flat-band-related physics.
Submitted 9 March, 2024;
originally announced March 2024.
-
Discrete Boltzmann model with split collision for nonequilibrium reactive flows
Authors:
Chuandong Lin,
Kai H. Luo,
Huilin Lai
Abstract:
A multi-relaxation-time discrete Boltzmann model (DBM) with split collision is proposed for both subsonic and supersonic compressible reacting flows, where chemical reactions take place among various components. The physical model is based on a unified set of discrete Boltzmann equations that describes the evolution of each chemical species with adjustable acceleration, specific heat ratio, and Prandtl number. On the right-hand side of the discrete Boltzmann equations, the collision, force, and reaction terms denote the change rates of distribution functions due to self- and cross-collisions, external forces, and chemical reactions, respectively. The source terms can be calculated in three ways, among which the matrix inversion method possesses the highest physical accuracy and computational efficiency. Through Chapman-Enskog analysis, it is proved that the DBM is consistent with the reactive Navier-Stokes equations, Fick's law and Stefan-Maxwell diffusion equation in the hydrodynamic limit. Compared with the one-step-relaxation model, the split collision model offers a detailed and precise measurement of hydrodynamic, thermodynamic, and chemical nonequilibrium effects. Finally, the model is validated by six benchmarks, including multicomponent diffusion, mixture in the force field, Kelvin-Helmholtz instability, flame at constant pressure, opposing chemical reaction, and steady detonation.
Submitted 22 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Superconductivity of the New Medium-Entropy Alloy V4Ti2W with a Body-Centered Cubic Structure
Authors:
Kuan Li,
Weijie Lin,
Ruixin Guo,
Shu Guo,
Lingyong Zeng,
Longfu Li,
Peifeng Yu,
Kangwang Wang,
Chao Zhang,
Huixia Luo
Abstract:
Medium- and high-entropy alloy (MEA and HEA) superconductors have attracted considerable interest since their discovery. This paper reports the superconducting properties of ternary tungsten-containing MEA V4Ti2W for the first time. V4Ti2W is a type II superconductor with a body-centered cubic (BCC) structure. Experimental results of resistivity, magnetization, and heat capacity indicate that the superconducting transition temperature of the MEA V4Ti2W is roughly 5.0 K. The critical magnetic fields at the upper and lower ends are 9.93(2) T and 40.7(3) mT, respectively. Interestingly, few BCC MEA superconductors with VEC greater than 4.8 have been found. The addition of tungsten leads to a VEC of 4.83 e/a for V4Ti2W, placing it among the rare cases above 4.8. Adding tungsten expands the variety of MEAs, which may improve the microstructure and mechanical properties of materials and even their superconducting properties. This material could potentially offer a new platform for the investigation of innovative MEA and HEA superconductors.
Submitted 8 March, 2024;
originally announced March 2024.
-
Guiding Trojan light beams via Lagrange points
Authors:
Haokun Luo,
Yunxuan Wei,
Fan O. Wu,
Georgios G. Pyrialakos,
Demetrios N. Christodoulides,
Mercedeh Khajavikhan
Abstract:
The guided transmission of optical waves is critical for light-based applications in modern communication, information processing, and energy generation systems. Traditionally, guiding light waves in structures like optical fibres is predominantly achieved through the use of total internal reflection. In periodic platforms, a variety of other physical mechanisms can also be deployed to transport optical waves. However, transversely confining light in fully dielectric, non-periodic and passive configurations remains a challenge in situations where total internal reflection is not supported. Here, we present an approach to trapping light that utilizes the exotic features of Lagrange points, a special class of equilibrium positions akin to those responsible for capturing trojan asteroids in celestial mechanics. This is achieved in twisted arrangements in which optical Coriolis forces induce guiding channels even at locations where the refractive index landscape is defocusing or entirely unremarkable. These findings may have implications beyond standard optical waveguiding schemes and could also apply to other physical systems such as acoustics, electron beams and ultracold atoms.
Submitted 4 March, 2024;
originally announced March 2024.
-
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Authors:
Zhaorun Chen,
Zhuokai Zhao,
Hongyin Luo,
Huaxiu Yao,
Bo Li,
Jiawei Zhou
Abstract:
While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH while preserving text generation quality. Additionally, HALC can be integrated into any LVLM as a plug-and-play module without extra training. Extensive experimental studies demonstrate the effectiveness of HALC in reducing OH, outperforming state-of-the-art methods across four benchmarks.
Submitted 10 June, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Authors:
Weiyun Wang,
Yiming Ren,
Haowen Luo,
Tiantong Li,
Chenxiang Yan,
Zhe Chen,
Wenhai Wang,
Qingyun Li,
Lewei Lu,
Xizhou Zhu,
Yu Qiao,
Jifeng Dai
Abstract:
We present the All-Seeing Project V2: a new model and dataset designed for understanding object relations in images. Specifically, we propose the All-Seeing Model V2 (ASMv2) that integrates the formulation of text generation, object localization, and relation comprehension into a relation conversation (ReC) task. Leveraging this unified task, our model excels not only in perceiving and recognizing all objects within the image but also in grasping the intricate relation graph between them, diminishing the relation hallucination often encountered by Multi-modal Large Language Models (MLLMs). To facilitate training and evaluation of MLLMs in relation understanding, we created the first high-quality ReC dataset (AS-V2), which is aligned with the format of standard instruction tuning data. In addition, we design a new benchmark, termed Circular-based Relation Probing Evaluation (CRPE), for comprehensively evaluating the relation comprehension capabilities of MLLMs. Notably, our ASMv2 achieves an overall accuracy of 52.04 on this relation-aware benchmark, surpassing the 43.14 of LLaVA-1.5 by a large margin. We hope that our work can inspire more future research and contribute to the evolution towards artificial general intelligence. Our project is released at https://github.com/OpenGVLab/all-seeing.
Submitted 17 April, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis
Authors:
Ismail Emir Yuksel,
Yahya Can Tugrul,
Ataberk Olgun,
F. Nisa Bostanci,
A. Giray Yaglikci,
Geraldo F. Oliveira,
Haocong Luo,
Juan Gómez-Luna,
Mohammad Sadrosadati,
Onur Mutlu
Abstract:
Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog operational properties of DRAM circuitry to enable massively parallel in-DRAM computation. PuD has the potential to reduce or eliminate costly data movement between processing elements and main memory. Prior works experimentally demonstrate three-input MAJ (MAJ3) and two-input AND and OR operations in commercial off-the-shelf (COTS) DRAM chips. Yet, demonstrations on COTS DRAM chips do not provide a functionally complete set of operations.
We experimentally demonstrate that COTS DRAM chips are capable of performing 1) functionally-complete Boolean operations: NOT, NAND, and NOR and 2) many-input (i.e., more than two-input) AND and OR operations. We present an extensive characterization of new bulk bitwise operations in 256 off-the-shelf modern DDR4 DRAM chips. We evaluate the reliability of these operations using a metric called success rate: the fraction of correctly performed bitwise operations. Among our 19 new observations, we highlight four major results. First, we can perform the NOT operation on COTS DRAM chips with a 98.37% success rate on average. Second, we can perform up to 16-input NAND, NOR, AND, and OR operations on COTS DRAM chips with high reliability (e.g., 16-input NAND, NOR, AND, and OR with an average success rate of 94.94%, 95.87%, 94.94%, and 95.85%, respectively). Third, data pattern only slightly affects bitwise operations. Our results show that executing NAND, NOR, AND, and OR operations with random data patterns decreases the success rate compared to all logic-1/logic-0 patterns by 1.39%, 1.97%, 1.43%, and 1.98%, respectively. Fourth, bitwise operations are highly resilient to temperature changes, with small success rate fluctuations of at most 1.66% when the temperature is increased from 50C to 95C. We open-source our infrastructure at https://github.com/CMU-SAFARI/FCDRAM
Submitted 21 April, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions
Authors:
Abdullah Giray Yağlıkçı,
Yahya Can Tuğrul,
Geraldo F. Oliveira,
İsmail Emir Yüksel,
Ataberk Olgun,
Haocong Luo,
Onur Mutlu
Abstract:
Read disturbance in modern DRAM chips is a widespread phenomenon and is reliably used for breaking memory isolation, a fundamental building block for building robust systems. RowHammer and RowPress are two examples of read disturbance in DRAM where repeatedly accessing (hammering) or keeping active (pressing) a memory location induces bitflips in other memory locations. Unfortunately, shrinking technology node size exacerbates read disturbance in DRAM chips over generations. As a result, existing defense mechanisms suffer from significant performance and energy overheads, limited effectiveness, or prohibitively high hardware complexity.
In this paper, we tackle these shortcomings by leveraging the spatial variation in read disturbance across different memory locations in real DRAM chips. To do so, we 1) present the first rigorous real DRAM chip characterization study of spatial variation of read disturbance and 2) propose Svärd, a new mechanism that dynamically adapts the aggressiveness of existing solutions based on the row-level read disturbance profile. Our experimental characterization on 144 real DDR4 DRAM chips representing 10 chip designs demonstrates a large variation in read disturbance vulnerability across different memory locations: in the part of memory with the worst read disturbance vulnerability, 1) up to 2x the number of bitflips can occur and 2) bitflips can occur at an order of magnitude fewer accesses, compared to the memory locations with the least vulnerability to read disturbance. Svärd leverages this variation to reduce the overheads of five state-of-the-art read disturbance solutions, and thus significantly increases system performance.
Submitted 28 February, 2024;
originally announced February 2024.
-
Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network
Authors:
Zhaoyang Wang,
Dongyang Li,
Mingyang Zhang,
Hao Luo,
Maoguo Gong
Abstract:
Existing hyperspectral image (HSI) super-resolution (SR) methods struggle to effectively capture the complex spectral-spatial relationships and low-level details, while diffusion models represent a promising generative model known for their exceptional performance in modeling complex relations and learning high- and low-level visual features. The direct application of diffusion models to HSI SR is hampered by challenges such as difficulties in model convergence and protracted inference time. In this work, we introduce a novel Group-Autoencoder (GAE) framework that synergistically combines with the diffusion model to construct a highly effective HSI SR model (DMGASR). Our proposed GAE framework encodes high-dimensional HSI data into a low-dimensional latent space where the diffusion model works, thereby alleviating the difficulty of training the diffusion model while maintaining band correlation and considerably reducing inference time. Experimental results on both natural and remote sensing hyperspectral datasets demonstrate that the proposed method is superior to other state-of-the-art methods both visually and metrically.
Submitted 27 February, 2024;
originally announced February 2024.
-
Reinforcement Learning Based Robust Volt/Var Control in Active Distribution Networks With Imprecisely Known Delay
Authors:
Hong Cheng,
Huan Luo,
Zhi Liu,
Wei Sun,
Weitao Li,
Qiyue Li
Abstract:
Active distribution networks (ADNs) incorporating massive photovoltaic (PV) devices encounter challenges of rapid voltage fluctuations and potential violations. Due to the fluctuation and intermittency of PV generation, the state gap, arising from time-inconsistent states and exacerbated by imprecisely known system delays, significantly impacts the accuracy of voltage control. This paper addresses this challenge by introducing a framework for delay adaptive Volt/Var control (VVC) in the presence of imprecisely known system delays to regulate the reactive power of PV inverters. The proposed approach formulates the voltage control, based on predicted system operation states, as a robust VVC problem. It employs sample selection from the state prediction interval to promptly identify the worst-performing system operation state. Furthermore, we leverage the decentralized partially observable Markov decision process (Dec-POMDP) to reformulate the robust VVC problem. We design Multiple Policy Networks and employ the Multiple Policy Networks and Reward Shaping-based Multi-agent Twin Delayed Deep Deterministic Policy Gradient (MPNRS-MATD3) algorithm to efficiently solve the Dec-POMDP model-based problem. Simulation results show the delay adaption characteristic of our proposed framework, and the MPNRS-MATD3 outperforms other multi-agent reinforcement learning algorithms in robust voltage control.
Submitted 27 February, 2024;
originally announced February 2024.
-
On the maximum number of $r$-cliques in graphs free of complete $r$-partite subgraphs
Authors:
József Balogh,
Suyun Jiang,
Haoran Luo
Abstract:
We estimate the maximum possible number of cliques of size $r$ in an $n$-vertex graph free of a fixed complete $r$-partite graph $K_{s_1, s_2, \ldots, s_r}$. By viewing every $r$-clique as a hyperedge, the upper bound on the Turán number of the complete $r$-partite hypergraphs gives the upper bound $O\left(n^{r - {1}/{\prod_{i=1}^{r-1}s_i}}\right)$. We improve this to $o\left(n^{r - {1}/{\prod_{i=1}^{r-1}s_i}}\right)$. The main tool in our proof is the graph removal lemma. We also provide several lower bound constructions.
Submitted 26 February, 2024;
originally announced February 2024.
-
Enhancement of the critical current by surface irregularities in Fe-based superconductors
Authors:
I. F. Llovo,
J. Mosqueira,
Ding Hu,
Huiqian Luo,
Shiliang Li
Abstract:
The critical current $I_c$ of single crystals of the iron pnictide superconductor BaFe$_2$(As$_{1-x}$P$_x$)$_2$ has been studied through measurements of magnetic hysteresis cycles. We show that the introduction of surface irregularities in the $μ$m scale significantly increases $I_c$, primarily near the irreversibility magnetic field $H_{irr}$, where the surface currents are the main contribution to $I_c$. Such an increase is consistent with a theoretical estimate for the maximum non-dissipative current that a rough surface can sustain, based on Mathieu-Simon continuum theory for the vortex state.
Submitted 20 February, 2024;
originally announced February 2024.
-
Giant enhancement of higher-order harmonics of an optical-tweezer phonon laser
Authors:
Guangzong Xiao,
Tengfang Kuang,
Yutong He,
Xinlin Chen,
Wei Xiong,
Xiang Han,
Zhongqi Tan,
Hui Luo,
Hui Jing
Abstract:
Phonon lasers, as mechanical analogues of optical lasers, are unique tools for not only fundamental studies of phononics but also diverse applications such as acoustic imaging and force sensing. Very recently, by levitating a micro-size sphere in an optical tweezer, higher-order mechanical harmonics were observed in the phonon-lasing regime, as the first step towards nonlinear levitated optomechanics [Nat. Phys. 19, 414 (2023)]. However, both the lasing strengths and the quality factors of the observed harmonics are typically very low, thus severely hindering their applications. Here we show that, by applying a simple but powerful electronic control to such a levitated micro-sphere, three orders of magnitude enhancement are achievable in the brightness of the phonon lasers, including both the fundamental mode and all its higher-order harmonics. Also, giant improvements of their linewidth and frequency stability are realized in such an electro-optomechanical system, together with further improved higher-order phonon coherence. These results, as a significant step forward for enhancing and controlling micro-object phonon lasers, can be readily used for a wide range of applications involving nonlinear phonon lasers, such as acoustic frequency combs, ultrasound sensing, atmospheric monitoring, and even bio-medical diagnosis of levitated micro-size objects.
Submitted 20 February, 2024;
originally announced February 2024.
-
GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians
Authors:
Haimin Luo,
Min Ouyang,
Zijun Zhao,
Suyi Jiang,
Longwen Zhang,
Qixuan Zhang,
Wei Yang,
Lan Xu,
Jingyi Yu
Abstract:
Hairstyle reflects culture and ethnicity at first glance. In the digital era, various realistic human hairstyles are also critical to high-fidelity digital human assets for beauty and inclusivity. Yet, realistic hair modeling and real-time rendering for animation is a formidable challenge due to its sheer number of strands, complicated structures of geometry, and sophisticated interaction with light. This paper presents GaussianHair, a novel explicit hair representation. It enables comprehensive modeling of hair geometry and appearance from images, fostering innovative illumination effects and dynamic animation capabilities. At the heart of GaussianHair is the novel concept of representing each hair strand as a sequence of connected cylindrical 3D Gaussian primitives. This approach not only retains the hair's geometric structure and appearance but also allows for efficient rasterization onto a 2D image plane, facilitating differentiable volumetric rendering. We further enhance this model with the "GaussianHair Scattering Model", adept at recreating the slender structure of hair strands and accurately capturing their local diffuse color in uniform lighting. Through extensive experiments, we substantiate that GaussianHair achieves breakthroughs in both geometric and appearance fidelity, transcending the limitations encountered in state-of-the-art methods for hair reconstruction. Beyond representation, GaussianHair extends to support editing, relighting, and dynamic rendering of hair, offering seamless integration with conventional CG pipeline workflows. Complementing these advancements, we have compiled an extensive dataset of real human hair, each with meticulously detailed strand geometry, to propel further research in this field.
Submitted 16 February, 2024;
originally announced February 2024.
-
Accelerating Parallel Sampling of Diffusion Models
Authors:
Zhiwei Tang,
Jiasheng Tang,
Hao Luo,
Fan Wang,
Tsung-Hui Chang
Abstract:
Diffusion models have emerged as state-of-the-art generative models for image generation. However, sampling from diffusion models is usually time-consuming due to the inherent autoregressive nature of their sampling process. In this work, we propose a novel approach that accelerates the sampling of diffusion models by parallelizing the autoregressive process. Specifically, we reformulate the sampling process as solving a system of triangular nonlinear equations through fixed-point iteration. With this innovative formulation, we explore several systematic techniques to further reduce the iteration steps required by the solving process. Applying these techniques, we introduce ParaTAA, a universal and training-free parallel sampling algorithm that can leverage extra computational and memory resources to increase the sampling speed. Our experiments demonstrate that ParaTAA can decrease the inference steps required by common sequential sampling algorithms such as DDIM and DDPM by a factor of 4$\sim$14. Notably, when applying ParaTAA with 100-step DDIM for Stable Diffusion, a widely-used text-to-image diffusion model, it can produce the same images as the sequential sampling in only 7 inference steps. The code is available at https://github.com/TZW1998/ParaTAA-Diffusion.
Submitted 27 May, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
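The reformulation described in this abstract, sequential sampling viewed as a triangular system solved by fixed-point iteration, can be illustrated with a toy sketch. This is not ParaTAA itself: the update rule `step` is a hypothetical scalar stand-in for one deterministic sampler step, and the plain Jacobi-style sweep below omits the acceleration techniques the paper introduces. It only shows that updating every time step at once from the previous iterate converges to the sequential trajectory, in at most T sweeps.

```python
import math


def step(x, t):
    # Hypothetical deterministic update standing in for one sampler step.
    return 0.9 * x + 0.1 * math.cos(x + t)


def sequential(x0, T):
    # Ordinary autoregressive rollout: each state needs the previous one.
    xs = [x0]
    for t in range(T):
        xs.append(step(xs[-1], t))
    return xs


def parallel_fixed_point(x0, T, tol=1e-12):
    # Treat x_{t+1} = step(x_t, t) for all t as one triangular system and
    # iterate on the whole trajectory. Each sweep could run in parallel
    # because it only reads the previous iterate.
    xs = [x0] * (T + 1)
    for k in range(1, T + 1):
        new = [x0] + [step(xs[t], t) for t in range(T)]
        delta = max(abs(a - b) for a, b in zip(new, xs))
        xs = new
        if delta < tol:
            return xs, k
    # After T sweeps the trajectory is exact by the triangular structure.
    return xs, T
```

Calling `parallel_fixed_point(0.5, 50)` returns the trajectory together with the number of sweeps used, which can be compared against `sequential(0.5, 50)`.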
-
Efficient Contextual Bandits with Uninformed Feedback Graphs
Authors:
Mengxiao Zhang,
Yuheng Zhang,
Haipeng Luo,
Paul Mineiro
Abstract:
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual version of this problem and proposes an efficient and optimal algorithm via a reduction to online regression. However, their algorithm crucially relies on seeing the feedback graph before making each decision, while in many applications, the feedback graph is uninformed, meaning that it is either only revealed after the learner makes her decision or even never fully revealed at all. This work develops the first contextual algorithm for such uninformed settings, via an efficient reduction to online regression over both the losses and the graphs. Importantly, we show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees. We also demonstrate the empirical effectiveness of our algorithm on a bidding application using both synthetic and real-world data.
Submitted 12 February, 2024;
originally announced February 2024.
-
Contextual Multinomial Logit Bandits with General Value Functions
Authors:
Mengxiao Zhang,
Haipeng Luo
Abstract:
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this fact, in this work, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with different computation-regret trade-off. When applied to the linear case, our results not only are the first ones with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.
Submitted 18 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
BAFLineDP: Code Bilinear Attention Fusion Framework for Line-Level Defect Prediction
Authors:
Shaojian Qiu,
Huihao Huang,
Jianxiang Luo,
Yingjie Kuang,
Haoyu Luo
Abstract:
Software defect prediction aims to identify defect-prone code, aiding developers in optimizing testing resource allocation. Most defect prediction approaches primarily focus on coarse-grained, file-level defect prediction, which fails to provide developers with the precision required to locate defective code. Recently, some researchers have proposed fine-grained, line-level defect prediction methods. However, most of these approaches lack an in-depth consideration of the contextual semantics of code lines and neglect the local interaction information among code lines. To address the above issues, this paper presents a line-level defect prediction method grounded in a code bilinear attention fusion framework (BAFLineDP). This method discerns defective code files and lines by integrating source code line semantics, line-level context, and local interaction information between code lines and line-level context. Through an extensive analysis involving within- and cross-project defect prediction across 9 distinct projects encompassing 32 releases, our results demonstrate that BAFLineDP outperforms current advanced file-level and line-level defect prediction approaches.
Submitted 11 February, 2024;
originally announced February 2024.
-
Evaluating Large Language Models in Analysing Classroom Dialogue
Authors:
Yun Long,
Haifeng Luo,
Yu Zhang
Abstract:
This study explores the application of Large Language Models (LLMs), specifically GPT-4, in the analysis of classroom dialogue, a crucial research task for both teaching diagnosis and quality improvement. Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLM to streamline and enhance the analysis process. The study involves datasets from a middle school, encompassing classroom dialogues across mathematics and Chinese classes. These dialogues were manually coded by educational experts and then analyzed using a customised GPT-4 model. This study focuses on comparing manual annotations with the outputs of GPT-4 to evaluate its efficacy in analyzing educational dialogues. Time efficiency, inter-coder agreement, and inter-coder reliability between human coders and GPT-4 are evaluated. Results indicate substantial time savings with GPT-4, and a high degree of consistency in coding between the model and human coders, with some discrepancies in specific codes. These findings highlight the strong potential of LLM in teaching evaluation and facilitation.
Submitted 22 February, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
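The abstract above reports inter-coder agreement between human coders and GPT-4 without naming the statistic. As an illustration only, the sketch below computes Cohen's kappa, a common chance-corrected agreement measure for two coders; its use here is an assumption, and the label values in the usage note are hypothetical.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    # Chance-corrected agreement between two coders over the same items.
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Fraction of items on which the two coders agree.
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, with human codes `["Q", "Q", "E", "E"]` and model codes `["Q", "Q", "E", "Q"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.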
-
Transolver: A Fast Transformer Solver for PDEs on General Geometries
Authors:
Haixu Wu,
Huakun Luo,
Haowen Wang,
Jianmin Wang,
Mingsheng Long
Abstract:
Transformers have empowered many milestones across various fields and have recently been applied to solve partial differential equations (PDEs). However, since PDEs are typically discretized into large-scale meshes with complex geometries, it is challenging for Transformers to capture intricate physical correlations directly from massive individual points. Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. Specifically, we propose a new Physics-Attention to adaptively split the discretized domain into a series of learnable slices of flexible shapes, where mesh points under similar physical states will be ascribed to the same slice. By calculating attention to physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations under complex geometries, which also empowers the solver with endogenetic geometry-general modeling capacity and can be efficiently computed in linear complexity. Transolver achieves consistent state-of-the-art with 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations, including car and airfoil designs. Code is available at https://github.com/thuml/Transolver.
Submitted 1 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
SMUTF: Schema Matching Using Generative Tags and Hybrid Features
Authors:
Yu Zhang,
Mei Di,
Haozheng Luo,
Chenwei Xu,
Richard Tzong-Han Tsai
Abstract:
We introduce SMUTF, a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, we deploy 'generative tags' for each data column, enhancing the effectiveness of SM. SMUTF exhibits extensive versatility, working seamlessly with any pre-existing pre-trained embeddings, classification methods, and generative models.
Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data. We believe this to be the most exhaustive SM dataset currently available. In evaluations across various public datasets and the novel HDXSM dataset, SMUTF demonstrated exceptional performance, surpassing existing state-of-the-art models in terms of accuracy and efficiency, and improving the F1 score by 11.84% and the AUC of ROC by 5.08%.
Submitted 6 February, 2024; v1 submitted 22 January, 2024;
originally announced February 2024.
-
Phase diagram of the interacting Haldane model with spin-dependent sublattice potentials
Authors:
Can Shao,
Hong-Gang Luo
Abstract:
Using the exact-diagonalization (ED) and mean-field (MF) approaches, we investigate the ground-state phase diagram of the interacting Haldane model on the honeycomb lattice, incorporating spin-dependent sublattice potentials $Δ_{σ,α}$. Here $α=\text{A}$,$\text{B}$ and $σ=\uparrow$,$\downarrow$ denote the sublattice and spin components, respectively. Setting $Δ_{σ,\text{A}}=+Δ$ ($-Δ$) and $Δ_{σ,\text{B}}$$=-Δ$ ($+Δ$) for $σ=\uparrow$ ($\downarrow$) results in the system favoring a spin ordered state. Conversely, introducing the nearest-neighbor Coulomb interaction can induce charge ordering in the system. Due to the competition between these factors, we observe that in both ED and MF approaches, an exotic state with Chern number $C=1$ survives amidst two locally ordered phases and a topologically ordered phase with $C=2$. In the ED method, various properties, such as the fidelity metric, the excitation gap and the structure factors, are employed to identify critical points. In the MF method, using a sufficiently large lattice size, we define the local order parameters and band gaps to characterize the phase transitions. The interacting Haldane model and the spin-dependent lattice potential may be experimentally realized in an ultracold atom gas, providing a potential means to detect this intriguing state.
Submitted 3 June, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Majority of water hides deep in the interiors of exoplanets
Authors:
Haiyang Luo,
Caroline Dorn,
Jie Deng
Abstract:
Water is an important component of exoplanets, with its distribution, i.e., whether at the surface or deep inside, fundamentally influencing the planetary properties. The distribution of water in most exoplanets is determined by yet-unknown partitioning coefficients at extreme conditions. Our new first-principles molecular dynamics simulations reveal that water strongly partitions into iron over silicate at high pressures and thus would preferentially stay in a planet's core. Furthermore, we model planet interiors by considering the effect of water on density, melting temperature, and water partitioning. The results shatter the notion of water worlds as imagined before: the majority of the bulk water budget (even more than 95%) can be stored deep within the core and the mantle, and not at the surface. For planets more massive than ~6 Earth masses and Earth-size planets (of lower mass and small water budgets), the majority of water resides deep in the cores of planets. Whether water is assumed to be at the surface or at depth can affect the radius by up to 25% for a given mass. This has drastic consequences for the inferred water distribution in exoplanets from mass-radius data.
Submitted 29 January, 2024;
originally announced January 2024.
-
Rethinking the Producer-Consumer Relationship in Modern DRAM-Based Systems
Authors:
Minesh Patel,
Taha Shahroodi,
Aditya Manglik,
Abdullah Giray Yağlıkçı,
Ataberk Olgun,
Haocong Luo,
Onur Mutlu
Abstract:
Generational improvements to commodity DRAM throughout half a century have long solidified its prevalence as main memory across the computing industry. However, overcoming today's DRAM technology scaling challenges requires new solutions driven by both DRAM producers and consumers. In this paper, we observe that the separation of concerns between producers and consumers specified by industry-wide DRAM standards is becoming a liability to progress in addressing scaling-related concerns.
To understand the problem, we study four key directions for overcoming DRAM scaling challenges using system-memory cooperation: (i) improving memory access latencies; (ii) reducing DRAM refresh overheads; (iii) securely defending against the RowHammer vulnerability; and (iv) addressing worsening memory errors. We find that the single most important barrier to advancement in all four cases is the consumer's lack of insight into DRAM reliability. Based on an analysis of DRAM reliability testing, we recommend revising the separation of concerns to incorporate limited information transparency between producers and consumers. Finally, we propose adopting this revision in a two-step plan, starting with immediate information release through crowdsourcing and publication and culminating in widespread modifications to DRAM standards.
Submitted 29 January, 2024;
originally announced January 2024.
-
Integrated Imaging and Communication with Reconfigurable Intelligent Surfaces
Authors:
Hao Luo,
Ahmed Alkhateeb
Abstract:
Reconfigurable intelligent surfaces, with their large number of antennas, offer an interesting opportunity for high spatial-resolution imaging. In this paper, we propose a novel RIS-aided integrated imaging and communication system that can reduce the RIS beam training overhead for communication by leveraging the imaging of the surrounding environment. In particular, using the RIS as a wireless imaging device, our system constructs the scene depth map of the environment, including the mobile user. Then, we develop a user detection algorithm that subtracts the background and extracts the mobile user attributes from the depth map. These attributes are then utilized to design the RIS interaction vector and the beam selection strategy with low overhead. Simulation results show that the proposed approach can achieve comparable beamforming gain to the optimal/exhaustive beam selection solution while requiring 1000 times less beam training overhead.
Submitted 29 January, 2024;
originally announced January 2024.
-
Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
Authors:
Yang Cai,
Haipeng Luo,
Chen-Yu Wei,
Weiqiang Zheng
Abstract:
We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer.
Submitted 1 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Quantization of Charge Carriers in Conduction Channels of Si-Based Field-Effect Transistors for Multinary Computation
Authors:
P. Xu,
H. Luo
Abstract:
The latest field-effect transistors are entering the regime where quantum effects within the conduction channel can play a significant role because of the increasingly reduced dimensions. We investigate the effects of quantized states in conduction channels in transistors with dimensions close to those presently used. We use the standard configuration of Si-based metal-oxide-semiconductor field-effect transistors (MOSFETs), as a simplified model to provide an estimate of the effect of quantization with respect to the dimensions of the conduction channel. The study shows simulated results of drain currents for various combinations of dimensions, in which distinguishable current levels as a function of the applied gate bias can be obtained at room temperature. The same qualitative dependence on dimensions is expected to apply to the state-of-the-art transistor architectures with dimensions near this range, such as fin field-effect transistors (FinFETs) and gate-all-around field-effect transistors (GAAFETs). The results show that utilizing quantized states in the conduction channel for multinary computation has become a possibility with their present dimensions.
Submitted 2 February, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
From Understanding to Utilization: A Survey on Explainability for Large Language Models
Authors:
Haoyan Luo,
Lucia Specia
Abstract:
Explainability for Large Language Models (LLMs) is a critical yet challenging aspect of natural language processing. As LLMs are increasingly integral to diverse applications, their "black-box" nature sparks significant concerns regarding transparency and ethical use. This survey underscores the imperative for increased explainability in LLMs, delving into both the research on explainability and the various methodologies and tasks that utilize an understanding of these models. Our focus is primarily on pre-trained Transformer-based LLMs, such as the LLaMA family, which pose distinctive interpretability challenges due to their scale and complexity. In terms of existing methods, we classify them into local and global analyses, based on their explanatory objectives. When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement. Additionally, we examine representative evaluation metrics and datasets, elucidating their advantages and limitations. Our goal is to reconcile theoretical and empirical understanding with practical implementation, proposing exciting avenues for explanatory techniques and their applications in the LLM era.
Submitted 21 February, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Systematic Performance Evaluation Framework for LEO Mega-Constellation Satellite Networks
Authors:
Yu Wang,
Chuili Kong,
Xian Meng,
Hejia Luo,
Ke-Xin Li,
Jun Wang
Abstract:
Low Earth orbit (LEO) mega-constellation satellite networks have shown great potential to extend the coverage capability of conventional terrestrial networks. How to systematically define, quantify, and assess the technical performance of LEO mega-constellation satellite networks remains an open issue. In this paper, we propose a comprehensive key performance indicator (KPI) framework for mega-constellation based LEO satellite networks. An efficient LEO constellation oriented performance evaluation methodology is then carefully designed by resorting to the concept of interfering area and spherical geographic cell. We have carried out rigorous system-level simulations and provided numerical results to assess the KPI framework. It can be observed that the achieved area traffic capacity of the reference LEO constellation is around 4 Kbps/km², with service availability ranging from 0.36 to 0.39. Besides, the average access success probability and handover failure rate are approximately 96% and 10%, respectively, in the nearest satellite association scheme.
Submitted 22 January, 2024;
originally announced January 2024.
-
Two-dimensional quantum droplets in binary quadrupolar condensates
Authors:
Aowei Yang,
Jiahao Zhou,
Xiaoqing Liang,
Guilong Li,
Bin Liu,
Huan-Bo Luo,
Boris A Malomed,
Yongyao Li
Abstract:
We study the stability and characteristics of two-dimensional (2D) quasi-isotropic quantum droplets (QDs) of fundamental and vortex types, formed by a binary Bose-Einstein condensate with magnetic quadrupole-quadrupole interactions (MQQIs). The magnetic quadrupoles are built as pairs of dipoles and antidipoles polarized along the x-axis. The MQQIs are induced by applying an external magnetic field that varies along the x-axis. The system is modeled by the Gross-Pitaevskii equations including the MQQIs and Lee-Huang-Yang correction to the mean-field approximation. Stable 2D fundamental QDs and quasi-isotropic vortex QDs with topological charges S<4 are produced by means of the imaginary-time-integration method for configurations with the quadrupoles polarized parallel to the system's two-dimensional plane. Effects of the norm and MQQI strength on the QDs are studied in detail. Some results, including an accurate prediction of the effective area, chemical potential, and peak density of QDs, are obtained in an analytical form by means of the Thomas-Fermi approximation. Collisions between moving QDs are studied by means of systematic simulations.
Submitted 18 January, 2024;
originally announced January 2024.