-
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Authors:
Zihan Wang,
Deli Chen,
Damai Dai,
Runxin Xu,
Zhuoshu Li,
Y. Wu
Abstract:
Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study the PEFT method for LLMs with the Mixture-of-Experts (MoE) architecture and the contents of this work are mainly threefold: (1) We investigate the dispersion degree of the activated experts in customized tasks, and find that the routing distribution for a specific task tends to be highly concentrated, while the distribution of activated experts varies significantly across different tasks. (2) We propose Expert-Specialized Fine-Tuning, or ESFT, which tunes the experts most relevant to downstream tasks while freezing the other experts and modules; experimental results demonstrate that our method not only improves the tuning efficiency, but also matches or even surpasses the performance of full-parameter fine-tuning. (3) We further analyze the impact of the MoE architecture on expert-specialized fine-tuning. We find that MoE models with finer-grained experts are more advantageous in selecting the combination of experts that are most relevant to downstream tasks, thereby enhancing both the training efficiency and effectiveness.
Submitted 1 July, 2024;
originally announced July 2024.
-
Data-driven methods for flow and transport in porous media: a review
Authors:
Guang Yang,
Ran Xu,
Yusong Tian,
Songyuan Guo,
Jingyi Wu,
Xu Chu
Abstract:
This review examined the current advancements in data-driven methods for analyzing flow and transport in porous media, which has various applications in energy, chemical engineering, environmental science, and beyond. Although there has been progress in recent years, the challenges of current experimental and high-fidelity numerical simulations, such as high computational costs and difficulties in accurately representing complex, heterogeneous structures, can still potentially be addressed by state-of-the-art data-driven methods. We analyzed the synergistic potential of these methods, addressed their limitations, and suggested how they can be effectively integrated to improve both the fidelity and efficiency of current research. A discussion on future research directions in this field was conducted, emphasizing the need for collaborative efforts that combine domain expertise in physics and advanced computational and data-driven methodologies.
Submitted 28 June, 2024;
originally announced June 2024.
-
Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data
Authors:
Yiting Ran,
Xintao Wang,
Rui Xu,
Xinfeng Yuan,
Jiaqing Liang,
Yanghua Xiao,
Deqing Yang
Abstract:
Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia. While existing RPAs portray the characters' knowledge and tones well, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indicative data. Specifically, we leverage questions from psychological scales and distill advanced RPAs to generate dialogues that grasp the minds of characters. Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations. Code and data are available at https://github.com/alienet1109/RolePersonality.
Submitted 29 June, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction
Authors:
Yice Zhang,
Jie Zeng,
Weiming Hu,
Ziyi Wang,
Shiwei Chen,
Ruifeng Xu
Abstract:
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. We highlight two critical aspects to ensure the scorer's effectiveness and reliability: the quality of the training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets reveal that using our scorer can greatly and consistently improve the effectiveness of self-training. Moreover, we explore the possibility of replacing humans with large language models for comparison dataset annotation, and experiments demonstrate its feasibility. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA .
Submitted 26 June, 2024;
originally announced June 2024.
-
MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework
Authors:
Xusheng Xu,
Jiangyu Cui,
Zidong Cui,
Runhong He,
Qingyu Li,
Xiaowei Li,
Yanling Lin,
Jiale Liu,
Wuxin Liu,
Jiale Lu,
Maolin Luo,
Chufan Lyu,
Shijie Pan,
Mosharev Pavel,
Runqiu Shu,
Jialiang Tang,
Ruoqian Xu,
Shu Xu,
Kang Yang,
Fan Yu,
Qingguo Zeng,
Haiying Zhao,
Qiang Zheng,
Junyuan Zhou,
Xu Zhou
, et al. (14 additional authors not shown)
Abstract:
We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.
Submitted 27 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Constraining the Physical Parameters of Blazars Using the Seed Factor Approach
Authors:
Chang-Bin Deng,
Yong-You Shi,
Yu-Jie Song,
Rui Xue,
Lei-Ming Du,
Ze-Rui Wang,
Zhao-Hua Xie
Abstract:
The discovery that blazars dominate the extra-galactic γ-ray sky is a triumph in the Fermi era. However, the exact location of the γ-ray emission region remains under debate. Low-synchrotron-peaked blazars (LSPs) are estimated to produce high-energy radiation through the external Compton process, thus their emission regions are closely related to the external photon fields. We employed the seed factor approach proposed by Georganopoulos et al. It directly matches the observed seed factor of each LSP with the characteristic seed factors of external photon fields to locate the γ-ray emission region. A sample of 1138 LSPs with peak frequencies and peak luminosities was adopted to plot a histogram distribution of observed seed factors. We also collected some spectral energy distributions (SEDs) of historical flare states to investigate the variation of the γ-ray emission region. Those SEDs were fitted by both quadratic and cubic functions using the Markov-chain Monte Carlo method. Furthermore, we derived some physical parameters of blazars and compared them with the constraint of internal γγ-absorption. We find that the dusty torus dominates the soft photon fields of LSPs and most γ-ray emission regions of LSPs are located at 1-10 pc. The soft photon fields could also transition from the dusty torus to the broad line region and the cosmic microwave background in different flare states. Our results suggest that the cubic function is better than the quadratic function to fit the SEDs.
Submitted 24 June, 2024;
originally announced June 2024.
-
Brownian friction dynamics: fluctuations in sliding distance
Authors:
Ruibin Xu,
Feng Zhou,
B. N. J. Persson
Abstract:
We have studied the fluctuation (noise) in the position of sliding blocks under constant driving forces on different substrate surfaces. The experimental data are complemented by simulations using a simple spring-block model where the asperity contact regions are modeled by miniblocks connected to the big block by viscoelastic springs. The miniblocks experience forces that fluctuate randomly with the lateral position, simulating the interaction between asperities on the block and the substrate. The theoretical model provides displacement power spectra that agree well with the experimental results.
Submitted 23 June, 2024;
originally announced June 2024.
-
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Authors:
Lichao Zhang,
Jia Yu,
Shuai Zhang,
Long Li,
Yangyang Zhong,
Guanbao Liang,
Yuming Yan,
Qing Ma,
Fangsheng Weng,
Fayu Pan,
Jing Li,
Renjun Xu,
Zhenzhong Lan
Abstract:
Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.
Submitted 21 June, 2024;
originally announced June 2024.
-
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback
Authors:
Bofei Gao,
Zefan Cai,
Runxin Xu,
Peiyi Wang,
Ce Zheng,
Runji Lin,
Keming Lu,
Junyang Lin,
Chang Zhou,
Wen Xiao,
Junjie Hu,
Tianyu Liu,
Baobao Chang
Abstract:
Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate this insufficiency of binary labels, we introduce step-wise natural language feedback as rationale labels (i.e., the correctness of the current step and the explanations). In this paper, we propose Math-Minos, a natural language feedback enhanced verifier, by constructing automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that a small set (30k) of natural language feedback can significantly boost the accuracy of the verifier by 1.6% (86.6% → 88.2%) on GSM8K and 0.8% (37.8% → 38.6%) on MATH. We have released our code and data for further exploration.
Submitted 30 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models
Authors:
Zhongshen Zeng,
Yinhong Liu,
Yingjia Wan,
Jingyao Li,
Pengguang Chen,
Jianbo Dai,
Yuxuan Yao,
Rongwu Xu,
Zehan Qi,
Wanru Zhao,
Linling Shen,
Jianqiao Lu,
Haochen Tan,
Yukang Chen,
Hao Zhang,
Zhan Shi,
Bailin Wang,
Zhijiang Guo,
Jiaya Jia
Abstract:
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we present a process-based benchmark MR-BEN that demands a meta reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. MR-BEN is a comprehensive benchmark comprising 5,975 questions collected from human experts, covering various subjects such as physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models). For example, open-source models are seemingly comparable to GPT-4 on outcome-based benchmarks, but they lag far behind on our benchmark, revealing the underlying reasoning capability gap between them. Our dataset and codes are available on https://randolph-zeng.github.io/Mr-Ben.github.io/.
Submitted 19 June, 2024;
originally announced June 2024.
-
Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
Authors:
Nahema Marchal,
Rachel Xu,
Rasmi Elasmar,
Iason Gabriel,
Beth Goldberg,
William Isaac
Abstract:
Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.
Submitted 21 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Authors:
Zhen Huang,
Zengzhi Wang,
Shijie Xia,
Xuefeng Li,
Haoyang Zou,
Ruijie Xu,
Run-Ze Fan,
Lyumanshan Ye,
Ethan Chern,
Yixin Ye,
Yikai Zhang,
Yuqing Yang,
Ting Wu,
Binjie Wang,
Shichao Sun,
Yang Xiao,
Yiyuan Li,
Fan Zhou,
Steffi Chern,
Yiwei Qin,
Yan Ma,
Jiadi Su,
Yixiu Liu,
Yuxiang Zheng,
Shaoting Zhang
, et al. (3 additional authors not shown)
Abstract:
The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.
Submitted 18 June, 2024;
originally announced June 2024.
-
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Authors:
DeepSeek-AI,
Qihao Zhu,
Daya Guo,
Zhihong Shao,
Dejian Yang,
Peiyi Wang,
Runxin Xu,
Y. Wu,
Yukun Li,
Huazuo Gao,
Shirong Ma,
Wangding Zeng,
Xiao Bi,
Zihui Gu,
Hanwei Xu,
Damai Dai,
Kai Dong,
Liyue Zhang,
Yishi Piao,
Zhibin Gou,
Zhenda Xie,
Zhewen Hao,
Bingxuan Wang,
Junxiao Song,
Deli Chen
, et al. (15 additional authors not shown)
Abstract:
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.
Submitted 17 June, 2024;
originally announced June 2024.
-
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Authors:
Zilun Zhang,
Yutao Sun,
Tiancheng Zhao,
Leigang Sha,
Ruochen Xu,
Kyusong Lee,
Jianwei Yin
Abstract:
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
Submitted 19 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Authors:
Anas Awadalla,
Le Xue,
Oscar Lo,
Manli Shu,
Hannah Lee,
Etash Kumar Guha,
Matt Jordan,
Sheng Shen,
Mohamed Awadalla,
Silvio Savarese,
Caiming Xiong,
Ran Xu,
Yejin Choi,
Ludwig Schmidt
Abstract:
Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of large-scale, diverse open-source multimodal interleaved datasets. In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. MINT-1T comprises one trillion text tokens and three billion images, a 10x scale-up from existing open-source datasets. Additionally, we include previously untapped sources such as PDFs and ArXiv papers. As scaling multimodal interleaved datasets requires substantial engineering effort, sharing the data curation process and releasing the dataset greatly benefits the community. Our experiments show that LMMs trained on MINT-1T rival the performance of models trained on the previous leading dataset, OBELICS. Our data and code will be released at https://github.com/mlfoundations/MINT-1T.
Submitted 17 June, 2024;
originally announced June 2024.
-
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Authors:
Rithesh Murthy,
Liangwei Yang,
Juntao Tan,
Tulika Manoj Awalgaonkar,
Yilun Zhou,
Shelby Heinecke,
Sachin Desai,
Jason Wu,
Ran Xu,
Sarah Tan,
Jianguo Zhang,
Zhiwei Liu,
Shirley Kokane,
Zuxin Liu,
Ming Zhu,
Huan Wang,
Caiming Xiong,
Silvio Savarese
Abstract:
The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.
Submitted 12 June, 2024;
originally announced June 2024.
-
TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data
Authors:
Ziyang Zhang,
Hejie Cui,
Ran Xu,
Yuzhang Xie,
Joyce C. Ho,
Carl Yang
Abstract:
The growing availability of well-organized Electronic Health Records (EHR) data has enabled the development of various machine learning models towards disease risk prediction. However, existing risk prediction methods overlook the heterogeneity of complex diseases, failing to model the potential disease subtypes regarding their corresponding patient visits and clinical concept subgroups. In this work, we introduce TACCO, a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data. Specifically, we develop a novel self-supervised co-clustering framework that can be guided by the risk prediction task of specific diseases. Furthermore, we enhance the hypergraph model of EHR data with textual embeddings and enforce the alignment between the clusters of clinical concepts and patient visits through a contrastive objective. Comprehensive experiments conducted on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction demonstrate an average 31.25% performance improvement compared to traditional ML baselines and a 5.26% improvement on top of the vanilla hypergraph model without our co-clustering mechanism. In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO. Code is available at https://github.com/PericlesHat/TACCO.
Submitted 14 June, 2024;
originally announced June 2024.
-
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
Authors:
Ruiyuan Lyu,
Tai Wang,
Jingli Lin,
Shuai Yang,
Xiaohan Mao,
Yilun Chen,
Runsen Xu,
Haifeng Huang,
Chenming Zhu,
Dahua Lin,
Jiangmiao Pang
Abstract:
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the largest-ever multi-modal 3D scene dataset and benchmark with hierarchical grounded language annotations, MMScan. It is constructed based on a top-down logic, from region to object level, from a single target to inter-target relationships, covering holistic aspects of spatial and attribute understanding. The overall pipeline incorporates powerful VLMs via carefully designed prompts to initialize the annotations efficiently and further involves human correction in the loop to ensure the annotations are natural, correct, and comprehensive. Built upon existing 3D scanning data, the resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks. We evaluate representative baselines on our benchmarks, analyze their capabilities in different aspects, and showcase the key problems to be addressed in the future. Furthermore, we use this high-quality dataset to train state-of-the-art 3D visual grounding models and LLMs and obtain remarkable performance improvement both on existing benchmarks and in-the-wild evaluation. Code, datasets, and benchmarks will be available at https://github.com/OpenRobotLab/EmbodiedScan.
Submitted 13 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low fluxes of astrophysical $γ$-ray background and large amounts of dark matter. In an analysis of more than 700 days of observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Diffusion-Promoted HDR Video Reconstruction
Authors:
Yuanshen Guan,
Ruikang Xu,
Mingde Yao,
Ruisheng Gao,
Lizhi Wang,
Zhiwei Xiong
Abstract:
High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed HDR-V-Diff, which incorporates a diffusion model to capture the HDR distribution. As such, HDR-V-Diff can reconstruct HDR videos with realistic details while alleviating ghosting artifacts. However, the direct introduction of video diffusion models would impose a massive computational burden. Instead, to alleviate this burden, we first propose an HDR Latent Diffusion Model (HDR-LDM) to learn the distribution prior of single HDR frames. Specifically, HDR-LDM incorporates a tonemapping strategy to compress HDR frames into the latent space and a novel exposure embedding to aggregate the exposure information into the diffusion process. We then propose a Temporal-Consistent Alignment Module (TCAM) to learn the temporal information as a complement for HDR-LDM, which conducts coarse-to-fine feature alignment at different scales among video frames. Finally, we design a Zero-Init Cross-Attention (ZiCA) mechanism to effectively integrate the learned distribution prior and temporal information for generating HDR frames. Extensive experiments validate that HDR-V-Diff achieves state-of-the-art results on several representative datasets.
Submitted 12 June, 2024;
originally announced June 2024.
-
PretVM: Predictable, Efficient Virtual Machine for Real-Time Concurrency
Authors:
Shaokai Lin,
Erling Jellum,
Mirco Theile,
Tassilo Tanneberger,
Binqi Sun,
Chadlia Jerad,
Ruomu Xu,
Guangyu Feng,
Christian Menard,
Marten Lohstroh,
Jeronimo Castrillon,
Sanjit Seshia,
Edward Lee
Abstract:
This paper introduces the Precision-Timed Virtual Machine (PretVM), an intermediate platform facilitating the execution of quasi-static schedules compiled from a subset of programs written in the Lingua Franca (LF) coordination language. The subset consists of those programs that in principle should have statically verifiable and predictable timing behavior. The PretVM provides a schedule with well-defined worst-case timing bounds. The PretVM provides a clean separation between application logic and coordination logic, yielding more analyzable program executions. Experiments compare the PretVM against the default (more dynamic) LF scheduler and show that it delivers time-accurate deterministic execution.
Submitted 25 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Correlated electrons of the flat band in charge density wave state of 4Hb-TaSexS2-x
Authors:
Yanyan Geng,
Jianfeng Guo,
Fanyu Meng,
Manyu Wang,
Shuo Mi,
Li Huang,
Rui Xu,
Fei Pang,
Kai Liu,
Shancai Wang,
Hong-Jun Gao,
Weichang Zhou,
Wei Ji,
Hechang Lei,
Zhihai Cheng
Abstract:
Many intriguing quantum states of matter, such as unconventional superconductivity, magnetic phases and fractional quantum Hall physics, emerge from the spatially-correlated localized electrons in the flat band of solid materials. By using scanning tunneling microscopy and spectroscopy (STM/STS), we report the real-space investigation of correlated electrons in the flat band of superlattice 4Hb-TaSexS2-x. In contrast with the pristine 4Hb-TaS2, the selenium (Se) substitutions significantly affect the interfacial transfer of correlated electrons between the CDW states of 1T- and 1H-TaS2 layers, and contribute real-space fractional electron-filling configurations with the distributed electron-filled and -void SoD clusters of the 1T-layer. The site-specific STS spectra directly reveal their respective prominent spectral weight above EF and symmetric Mott-like spectra. In addition, the spatial distributions of these electron-filled SoDs in the 1T-layer of 4Hb-TaSe0.7S1.3 demonstrate different local short-range patterning, clearly indicating the complex neighboring interactions among the localized electrons in the flat band of the 1T-layer. Our results not only provide in-depth insight into correlated electrons in the flat CDW band, but also provide a simple platform to manipulate the electron-correlation-related quantum states.
Submitted 10 June, 2024;
originally announced June 2024.
-
The Limits of Interval-Regulated Price Discrimination
Authors:
Kamesh Munagala,
Yiheng Shen,
Renzhe Xu
Abstract:
In this paper, we study third-degree price discrimination in a model first presented in Bergemann, Brooks, and Morris [2015]. Since such price discrimination might create market segments with vastly different posted prices, we consider regulating these prices, specifically, via restricting them to lie within an interval. Given a price interval, we consider segmentations of the market where a seller, who is oblivious to the existence of such regulation, still posts prices within the price interval. We show the following surprising result: For any market and price interval where such segmentation is feasible, there is always a different segmentation that optimally transfers all excess surplus to the consumers. In addition, we characterize the entire space of buyer and seller surplus that are achievable by such segmentation, including maximizing seller surplus, and simultaneously minimizing buyer and seller surplus.
Submitted 10 June, 2024;
originally announced June 2024.
-
Async Learned User Embeddings for Ads Delivery Optimization
Authors:
Mingwei Tang,
Meng Liu,
Hong Li,
Junjie Yang,
Chenglin Wei,
Boyang Li,
Dai Li,
Rengan Xu,
Yifan Xu,
Zehua Zhang,
Xiangyu Wang,
Linfeng Liu,
Yuelei Xie,
Chengye Liu,
Labib Fawaz,
Li Li,
Hongnan Wang,
Bill Zhu,
Sri Reddy
Abstract:
In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embeddings. We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities through a Transformer-like large-scale feature learning module. The async learned user representation embeddings (ALURE) are further converted to user similarity graphs through graph learning and then combined with users' real-time activities to retrieve highly related ad candidates for the ads delivery system. Our method shows significant gains in both offline and online experiments.
Submitted 23 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
Authors:
Ziqiang Liu,
Feiteng Fang,
Xi Feng,
Xinrun Du,
Chenhao Zhang,
Zekun Wang,
Yuelin Bai,
Qixuan Zhao,
Liyang Fan,
Chengguang Gan,
Hongquan Lin,
Jiaming Li,
Yuansheng Ni,
Haihong Wu,
Yaswanth Narsupalli,
Zhigang Zheng,
Chengming Li,
Xiping Hu,
Ruifeng Xu,
Xiaojun Chen,
Min Yang,
Jiaheng Liu,
Ruibo Liu,
Wenhao Huang,
Ge Zhang
, et al. (1 additional authors not shown)
Abstract:
The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.
Submitted 11 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR
Authors:
Ran Xu,
Yiwen Lu,
Chang Liu,
Yong Chen,
Yan Sun,
Xiao Hu,
Joyce C Ho,
Carl Yang
Abstract:
Electronic Health Records (EHRs) contain rich patient information and are crucial for clinical research and practice. In recent years, deep learning models have been applied to EHRs, but they often rely on massive features, which may not be readily available for all patients. We propose HTP-Star, which leverages hypergraph structures with a pretrain-then-finetune framework for modeling EHR data, enabling seamless integration of additional features. Additionally, we design two techniques, namely (1) Smoothness-inducing Regularization and (2) Group-balanced Reweighting, to enhance the model's robustness during fine-tuning. Through experiments conducted on two real EHR datasets, we demonstrate that HTP-Star consistently outperforms various baselines while striking a balance between patients with basic and extra features.
Submitted 9 June, 2024;
originally announced June 2024.
-
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Authors:
Zhenhong Zhou,
Haiyang Yu,
Xinghua Zhang,
Rongwu Xu,
Fei Huang,
Yongbin Li
Abstract:
Large language models (LLMs) rely on safety alignment to avoid responding to malicious user inputs. Unfortunately, jailbreaks can circumvent safety guardrails, resulting in LLMs generating harmful content and raising concerns about LLM safety. Because language models with massive numbers of parameters are often regarded as black boxes, the mechanisms of alignment and jailbreak are challenging to elucidate. In this paper, we employ weak classifiers to explain LLM safety through the intermediate hidden states. We first confirm that LLMs learn ethical concepts during pre-training rather than alignment and can identify malicious and normal inputs in the early layers. Alignment actually associates the early concepts with emotion guesses in the middle layers and then refines them to the specific reject tokens for safe generations. Jailbreak disturbs the transformation of early unethical classification into negative emotions. We conduct experiments on models from 7B to 70B across various model families to support our conclusion. Overall, our paper indicates the intrinsic mechanism of LLM safety and how jailbreaks circumvent safety guardrails, offering a new perspective on LLM safety and reducing concerns. Our code is available at https://github.com/ydyjya/LLM-IHS-Explanation.
Submitted 13 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Optical biomarker of metabolism for breast tumor diagnosis: Insights from subcellular dynamics
Authors:
Zichen Yin,
Shuwei Zhang,
Bin He,
Houpu Yang,
Zhengyu Chen,
Zhangwei Hu,
Yejiong Shi,
Ruizhi Xue,
Panqi Yang,
Yuzhe Ying,
Chengming Wang,
Shu Wang,
Ping Xue
Abstract:
Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontrollable imaging factors. Here, we demonstrate active phase modulation-assisted dynamic full-field optical coherence tomography (APMD-FFOCT) that decouples and quantifies the metabolic dynamics by adding a reference movement for all interferential scatterers. This novel technique enables imaging and dynamic analysis of subcellular structures along with their changes during the apoptotic process in tumor tissues. Furthermore, the nucleus-to-cytoplasm dynamic intensity ratio could serve as an optical biomarker for breast tumor grading, enhancing intraoperative diagnosis.
Submitted 6 June, 2024;
originally announced June 2024.
-
GWnext 2024: Meeting Summary
Authors:
Alejandro Torres-Orjuela,
Veronica Vazquez-Aceves,
Rui Xu,
Jin-Hong Chen,
Andrea Derdzinski,
Matthias U. Kruckow,
Stefano Rinaldi,
Lorenzo Speri,
Ziming Wang,
Garvin Yim,
Xue-Ting Zhang,
Qian Hu,
Miaoxin Liu,
Xiangyu Lyu,
Zheng Wu,
Cong Zhou,
Manuel Arca Sedda,
Yan-Chen Bi,
Hong-Yu Chen,
Xian Chen,
Jiageng Jiao,
Yu-Mei Wu
Abstract:
GWnext 2024 was a meeting held at the Kavli Institute for Astronomy and Astrophysics at Peking University on March $4^\text{th} - 8^\text{th}$, 2024. At the meeting, researchers at different career stages -- with a particular focus on early career scientists -- working on the different aspects of gravitational wave (GW) astronomy gathered to discuss the current status as well as prospects of the field. The meeting was divided into three core sessions: Astrophysics, GW Theory, and Detection. Each session consisted of introductory talks and extended discussion sessions. Moreover, there was a poster session where students could present their results. In this paper, we summarize the results presented during the meeting and present the most important outcomes.
Submitted 27 May, 2024;
originally announced June 2024.
-
Improving In-Context Learning with Prediction Feedback for Sentiment Analysis
Authors:
Hongling Xu,
Qianlong Wang,
Yice Zhang,
Min Yang,
Xi Zeng,
Bing Qin,
Ruifeng Xu
Abstract:
Large language models (LLMs) have achieved promising results in sentiment analysis through the in-context learning (ICL) paradigm. However, their ability to distinguish subtle sentiments still remains a challenge. Inspired by the human ability to adjust understanding via feedback, this paper enhances ICL by incorporating prior predictions and feedback, aiming to rectify sentiment misinterpretation of LLMs. Specifically, the proposed framework consists of three steps: (1) acquiring prior predictions of LLMs, (2) devising predictive feedback based on correctness, and (3) leveraging a feedback-driven prompt to refine sentiment understanding. Experimental results across nine sentiment analysis datasets demonstrate the superiority of our framework over conventional ICL methods, with an average F1 improvement of 5.95%.
Submitted 5 June, 2024;
originally announced June 2024.
-
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
Authors:
Ancheng Xu,
Minghuan Tan,
Lei Wang,
Min Yang,
Ruifeng Xu
Abstract:
Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on processing of numerals and units of measurement by constructing datasets with perturbations. We first anatomize the reasoning of math word problems to different sub-procedures like numeral conversions from language to numbers and measurement conversions based on units. Then we further annotate math word problems from ancient Chinese arithmetic works which are challenging in numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.
Submitted 4 June, 2024;
originally announced June 2024.
-
Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning
Authors:
Jiaxu Wang,
Ziyi Zhang,
Qiang Zhang,
Jia Li,
Jingkai Sun,
Mingyuan Sun,
Junhao He,
Renjing Xu
Abstract:
Latent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rendering. Moreover, their scene representation vectors lack fine-grained semantic information because they treat free and occupied spaces equally. Both issues can degrade the performance of downstream RL tasks. To address the above challenges, we propose a novel framework that adopts the efficient 3D Gaussian Splatting (3DGS) to learn 3D scene representation for the first time. In brief, we present the Query-based Generalizable 3DGS to bridge the 3DGS technique and scene representations with more geometrical awareness than those in NeRFs. Moreover, we present the Hierarchical Semantics Encoding to ground the fine-grained semantic features to 3D Gaussians and further distill them into the scene representation vectors. We conduct extensive experiments on two RL platforms, Maniskill2 and Robomimic, across 10 different tasks. The results show that our method outperforms the other 5 baselines by a large margin. We achieve the best success rates on 8 tasks and the second-best on the other two tasks.
Submitted 9 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection
Authors:
Ronghui Xu,
Hao Miao,
Senzhang Wang,
Philip S. Yu,
Jianxin Wang
Abstract:
With the proliferation of mobile sensing techniques, huge amounts of time series data are generated and accumulated in various domains, fueling plenty of real-world applications. In this setting, time series anomaly detection is practically important. It endeavors to identify deviant samples from the normal sample distribution in time series. Existing approaches generally assume that all the time series are available at a central location. However, we are witnessing the decentralized collection of time series due to the deployment of various edge devices. To bridge the gap between the decentralized time series data and the centralized anomaly detection algorithms, and in light of increasing privacy concerns, we propose a Parameter-efficient Federated Anomaly Detection framework named PeFAD. PeFAD for the first time employs the pre-trained language model (PLM) as the body of the client's local model, which can benefit from its cross-modality knowledge transfer capability. To reduce the communication overhead and local model adaptation cost, we propose a parameter-efficient federated training module such that clients only need to fine-tune small-scale parameters and transmit them to the server for update. PeFAD utilizes a novel anomaly-driven mask selection strategy to mitigate the impact of neglected anomalies during training. A knowledge distillation operation on a synthetic privacy-preserving dataset that is shared by all the clients is also proposed to address the data heterogeneity issue across clients. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
Submitted 4 June, 2024;
originally announced June 2024.
-
Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning
Authors:
Jiahang Cao,
Qiang Zhang,
Ziqing Wang,
Jiaxu Wang,
Hao Cheng,
Yecheng Shao,
Wen Zhao,
Gang Han,
Yijie Guo,
Renjing Xu
Abstract:
Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretically determined solely by current states and actions based on the Markov Decision Process (MDP), and (2) global correlation, where each step's features are related to long-term historical information due to the time-continuous nature of trajectories. In this paper, we propose a novel action sequence predictor, named Mamba Decision Maker (MambaDM), where Mamba is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies. In particular, we introduce a novel mixer module that proficiently extracts and integrates both global and local features of the input sequence, effectively capturing interrelationships in RL datasets. Extensive experiments demonstrate that MambaDM achieves state-of-the-art performance in Atari and OpenAI Gym datasets. Furthermore, we empirically investigate the scaling laws of MambaDM, finding that increasing model size does not bring performance improvement, but scaling the dataset amount by 2x for MambaDM can obtain up to 33.7% score improvement on Atari dataset. This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements in robust and efficient decision-making systems. Our code will be available at https://github.com/AndyCao1125/MambaDM.
Submitted 4 June, 2024;
originally announced June 2024.
-
Nonlinear Eigen-approach ADMM for Sparse Optimization on Stiefel Manifold
Authors:
Jiawei Wang,
Rencang Li,
Richard Yi Da Xu
Abstract:
With the growing interest and applications in machine learning and data science, finding an efficient method to sparsely analyze high-dimensional data and to optimize a dimension reduction model that extracts lower-dimensional features has become more and more important. Orthogonality constraints (the Stiefel manifold) are commonly encountered in these applications, and sparsity is usually enforced through the element-wise L1 norm. Many applications of optimization over the Stiefel manifold can be found in physics and machine learning. In this paper, we propose a novel way of tackling the Stiefel manifold through a nonlinear eigen-approach: we first use ADMM to split the problem into a smooth optimization over the manifold and a convex non-smooth optimization. The former is then transformed into a nonlinear eigenvalue problem with eigenvector dependency (NEPv), which is solved by self-consistent field (SCF) iteration, while the latter has a closed-form solution through proximal gradient. Compared with existing methods, our proposed algorithm takes advantage of the specific structure of the objective function and has efficient convergence results under mild assumptions.
Submitted 3 June, 2024;
originally announced June 2024.
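The closed-form step mentioned in the abstract above can be illustrated with the standard proximal operator of the element-wise L1 norm, known as soft-thresholding. This is a minimal Python sketch, not the authors' implementation: the names `soft_threshold` and `prox_l1` and the threshold `lam` are hypothetical, and it assumes the non-smooth ADMM subproblem reduces to the textbook form min_Z lam*||Z||_1 + 0.5*||Z - V||^2.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x| for a single scalar x."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0  # entries with |x| <= lam are set exactly to zero


def prox_l1(matrix, lam):
    """Apply soft-thresholding to every entry of a matrix (list of rows)."""
    return [[soft_threshold(v, lam) for v in row] for row in matrix]


if __name__ == "__main__":
    V = [[3.0, -0.5], [-2.0, 0.2]]
    print(prox_l1(V, 1.0))
```

Entries whose magnitude is at most `lam` become exactly zero, which is how the L1 term produces sparsity; larger entries are shrunk toward zero by `lam`.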
-
MAD: Multi-Alignment MEG-to-Text Decoding
Authors:
Yiqian Yang,
Hyejeong Jo,
Yiqun Duan,
Qiang Zhang,
Jinni Zhou,
Won Hee Lee,
Renjing Xu,
Hui Xiong
Abstract:
Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current work has under-investigated three points: 1) a predominant focus on EEG with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that can better generalize to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which could potentially constrain our capacity to comprehensively understand the intricate dynamics of brain activity.
This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first to introduce an end-to-end multi-alignment framework for totally unseen text generation directly from MEG signals. On the GWilliams dataset, we improve the BLEU-1 score from the baseline's 5.49 to 10.44. This improvement demonstrates the advancement of our model towards real-world applications and underscores its potential in advancing BCI research. Code is available at https://github.com/NeuSpeech/MAD-MEG2text.
Submitted 3 June, 2024;
originally announced June 2024.
-
ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation
Authors:
Dengke Han,
Meng Wu,
Runzhen Xue,
Mingyu Yan,
Xiaochun Ye,
Dongrui Fan
Abstract:
Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improve model accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention disparity from source vertices towards a common target vertex unveils an opportunity to boost the model execution performance by pruning unimportant source vertices during neighbor aggregation. In this study, we commence with a quantitative analysis of the attention disparity in HGNN models, where the importance of different source vertices varies for the same target vertex. To fully exploit this finding for inference acceleration, we propose a runtime pruning method based on a min-heap and map it to a dedicated hardware pruner to discard unimportant vertices. Given that the pruning overhead itself is non-negligible and cannot be amortized by the conventional staged execution paradigm, an operation-fusion execution flow of HGNNs is introduced to overlap the pruning overhead while harnessing inter-stage parallelism. Finally, we present the design of a novel HGNN accelerator, ADE-HGNN, tailored to support the proposed execution framework. Our experimental results demonstrate that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA GPU T4 platform and 7.98x over the advanced GPU A100, with the inference accuracy loss kept within a negligible range of 0.11%~1.47%. Furthermore, ADE-HGNN significantly reduces energy consumption to 1.97% and 5.37% of the two platforms, respectively.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
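The min-heap pruning idea in the abstract above can be sketched in software as a streaming top-k selection. This is an illustrative Python sketch only, not the paper's hardware pruner: the function `prune_neighbors`, the budget `k`, and the attention scores are hypothetical, and it assumes pruning means keeping the `k` source vertices with the largest attention for one target vertex.

```python
import heapq


def prune_neighbors(attention, k):
    """Keep the k source vertices with the largest attention scores.

    attention: dict mapping source vertex -> attention score.
    A min-heap of size k holds the current survivors; its root is the
    weakest survivor, so each new vertex needs one comparison against it.
    """
    if k <= 0:
        return []
    heap = []  # entries are (score, vertex)
    for vertex, score in attention.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, vertex))
        elif score > heap[0][0]:
            # New vertex beats the weakest survivor: evict and insert.
            heapq.heapreplace(heap, (score, vertex))
    return sorted(vertex for _, vertex in heap)


if __name__ == "__main__":
    scores = {"a": 0.5, "b": 0.05, "c": 0.3, "d": 0.15}
    print(prune_neighbors(scores, 2))
```

The heap never grows beyond `k` entries, so the selection runs in a single pass over the neighbors, which is what makes this style of pruning suitable for runtime use.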
-
Compact dwarfs made of light-quark nuggets
Authors:
Hao-Song You,
Hao Sun,
Hong-Bo Li,
Cheng-Jun Xia,
Ren-Xin Xu
Abstract:
Utilizing an equivparticle model with both linear confinement and leading-order perturbative interactions, we systematically obtain the properties of strangelets and nonstrange quark matter ($ud$QM) nuggets at various baryon ($A$) and charge ($Z$) numbers, where the detailed single-quark-energy levels are fixed by solving Dirac equations in mean-field approximation (MFA). We then examine the structures of compact dwarfs made of light strangelets or $ud$QM nuggets forming body-centered cubic lattices in a uniform electron background. Although strangelets and $ud$QM nuggets generally become more stable at larger $A$, the compact dwarfs are still stable since the fusion reactions between those objects do not take place in the presence of a Coulomb barrier, which is similar to the case of light nuclei in normal white dwarfs. If $ud$QM dwarfs or strangelet dwarfs are covered with normal matter, their masses and radii become larger but do not exceed those of ordinary white dwarfs. Finally, we investigate the radial oscillation frequencies of $ud$QM dwarfs and strangelet dwarfs, and find that their frequencies are typically higher than those of traditional white dwarfs. The stability of compact dwarfs is then analyzed by examining the radial oscillation frequencies of the fundamental mode, where compact dwarfs covered by normal matter are still stable.
Submitted 2 June, 2024;
originally announced June 2024.
-
High priority targets for transient gravitational waves from glitching pulsars
Authors:
Garvin Yim,
Lijing Shao,
Renxin Xu
Abstract:
Glitching pulsars are expected to be important sources of gravitational waves. In this paper, we explore six different models that propose the emission of transient continuous waves, lasting days to months, coincident with glitches. The maximal gravitational wave energy is calculated for each model, which is then used to determine whether associated gravitational waves could be detectable with LIGO-Virgo-KAGRA's O4 detectors. We provide an analytical approximation to calculate the signal-to-noise ratio which includes information about the source's sky position, improving on previous estimates that assume isotropic or sky- and orientation-averaged sensitivities. Applying the calculation to the entire glitching population, we find that certain models predict detectable signals in O4, whereas others do not. We also rank glitching pulsars in order of how significant a signal would be, based on archival data, and we find that for all models, the Vela pulsar (PSR J0835$-$4510) would provide the strongest signal. Moreover, PSR J0537$-$6910 is not expected to yield a detectable signal in O4, but will start becoming relevant for next-generation detectors. Our analysis also extends to the entire pulsar population, regardless of whether they have glitched or not, and we provide a list of pulsars that would present a significant signal if they were to glitch. Finally, we apply our analysis to the latest April 2024 Vela glitch and find that a signal should be detectable under certain models. The non-detection of a supposedly detectable signal would provide an efficiency factor that quantifies how much a model can contribute to gravitational wave emission, eventually leading to a differentiation of models and independent constraints on physical parameters.
Submitted 31 May, 2024;
originally announced June 2024.
-
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training
Authors:
Feiteng Fang,
Yuelin Bai,
Shiwen Ni,
Min Yang,
Xiaojun Chen,
Ruifeng Xu
Abstract:
Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can potentially hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies on robustness to retrieval noise often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we first investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these various retrieval noises on the robustness of LLMs. Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.
Submitted 31 May, 2024;
originally announced May 2024.
-
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Authors:
Rongwu Xu,
Zehan Qi,
Wei Xu
Abstract:
Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or be induced by malicious users through prompt injection attacks. Experiments reveal that preemptive answers significantly impair the model's reasoning capability across various CoT methods and a broad spectrum of datasets. To bolster the robustness of reasoning, we propose two measures aimed at mitigating this issue to some extent.
Submitted 31 May, 2024;
originally announced May 2024.
-
Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models
Authors:
Hao Cheng,
Erjia Xiao,
Jiahang Cao,
Le Yang,
Kaidi Xu,
Jindong Gu,
Renjing Xu
Abstract:
Following the advent of the Artificial Intelligence (AI) era of large models, Multimodal Large Language Models (MLLMs) with the ability to understand cross-modal interactions between vision and text have attracted wide attention. Adversarial examples with human-imperceptible perturbation are shown to possess a characteristic known as transferability, which means that a perturbation generated by one model could also mislead a different model. Augmenting the diversity of input data is one of the most significant methods for enhancing adversarial transferability. This method has been shown to significantly enlarge the threat impact under black-box conditions. Prior work also demonstrates that MLLMs can be exploited to generate adversarial examples in the white-box scenario. However, the adversarial transferability of such perturbations is quite limited, failing to achieve effective black-box attacks across different models. In this paper, we propose the Typographic-based Semantic Transfer Attack (TSTA), which is inspired by two observations: (1) MLLMs tend to process semantic-level information; (2) typographic attacks can effectively distract the visual information captured by MLLMs. In the scenarios of Harmful Word Insertion and Important Information Protection, our TSTA demonstrates superior performance.
Submitted 30 May, 2024;
originally announced May 2024.
-
Correlated Electronic Structure and Density-Wave Gap in Trilayer Nickelate La4Ni3O10
Authors:
X. Du,
Y. D. Li,
Y. T. Cao,
C. Y. Pei,
M. X. Zhang,
W. X. Zhao,
K. Y. Zhai,
R. Z. Xu,
Z. K. Liu,
Z. W. Li,
J. K. Zhao,
G. Li,
Y. L. Chen,
Y. P. Qi,
H. J. Guo,
L. X. Yang
Abstract:
The discovery of pressurized superconductivity at 80 K in La3Ni2O7 officially brings nickelates into the family of high-temperature superconductors, which gives rise to not only new insights but also mysteries in strongly correlated superconductivity. More recently, the sibling compound La4Ni3O10 was also shown to be superconducting below about 25 K under pressure, further boosting the popularity of nickelates in the Ruddlesden-Popper phase. In this study, combining high-resolution angle-resolved photoemission spectroscopy and ab initio calculations, we systematically investigate the electronic structure of La4Ni3O10 at ambient pressure. We reveal a high resemblance of La4Ni3O10 to La3Ni2O7 in the orbital-dependent fermiology and electronic structure, suggesting a similar electronic correlation in the two compounds. The temperature-dependent measurements imply an orbital-dependent energy gap related to the density-wave transition in La4Ni3O10. By comparison with the theoretical pressure-dependent electronic structure, clues about the superconducting high-pressure phase can be deduced from the ambient measurements, providing crucial information for deciphering the unconventional superconductivity in nickelates.
Submitted 30 May, 2024;
originally announced May 2024.
-
Efficient multi-prompt evaluation of LLMs
Authors:
Felipe Maia Polo,
Ronald Xu,
Lucas Weber,
Mírian Silva,
Onkar Bhardwaj,
Leshem Choshen,
Allysson Flavio Melo de Oliveira,
Yuekai Sun,
Mikhail Yurochkin
Abstract:
Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards. Many recent works empirically verify prompt sensitivity and advocate for changes in LLM evaluation. In this paper, we consider the problem of estimating the performance distribution across many prompt variants instead of finding a single prompt to evaluate with. We introduce PromptEval, a method for estimating performance across a large set of prompts borrowing strength across prompts and examples to produce accurate estimates under practical evaluation budgets. The resulting distribution can be used to obtain performance quantiles to construct various robust performance metrics (e.g., top 95% quantile or median). We prove that PromptEval consistently estimates the performance distribution and demonstrate its efficacy empirically on three prominent LLM benchmarks: MMLU, BIG-bench Hard, and LMentry. For example, PromptEval can accurately estimate performance quantiles across 100 prompt templates on MMLU with a budget equivalent to two single-prompt evaluations. Our code and data can be found at https://github.com/felipemaiapolo/prompt-eval.
Submitted 7 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
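The robust metrics mentioned in the abstract above (a median or another quantile of performance across prompt templates) can be illustrated with a small Python sketch. This is not PromptEval itself: it assumes the per-template accuracies are already known, whereas the method described above estimates that distribution under a limited budget. The function `quantile` and the accuracy values are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list, with q between 0 and 1."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    xs = sorted(values)
    pos = q * (len(xs) - 1)          # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


if __name__ == "__main__":
    # Hypothetical accuracies of one model under five prompt templates.
    per_template_accuracy = [0.50, 0.60, 0.70, 0.80, 0.90]
    print("median:", quantile(per_template_accuracy, 0.5))
    print("lower quartile:", quantile(per_template_accuracy, 0.25))
```

Reporting a low quantile alongside the median shows how much a model's score depends on the choice of template, which a single-prompt evaluation cannot reveal.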
-
Delay Performance Analysis of Delay-Deterministic Wireless Networks with Infinite and Finite Blocklength Transmission
Authors:
Hanxue Ding,
Shaoyi Xu,
Ziheng Xu,
Rongtao Xu,
Zonghui Li,
Junhui Zhao
Abstract:
In order to achieve stable and reliable industrial manufacturing, wireless networks must meet the stringent communication requirements of industrial automation, particularly the need for deterministic low-latency communication. The limited wireless resources and time-varying fading channel contribute to the random fluctuations of transmission delay, making it challenging to realize delay-deterministic wireless networks. An open challenge in this context is to model delay determinism, also known as jitter, and analyze delay performance. In this paper, we model jitter as the variance of delay and conduct a comprehensive analysis of delay performance. Specifically, we consider two transmission regimes: infinite blocklength (IBL) and finite blocklength (FBL). In the IBL regime, the distribution of the transmission delay is analyzed, and the closed-form expressions for the average delay, jitter, and delay violation probability are derived. In the FBL regime, an upper bound on the transmission delay is first approximated at a high signal-to-noise ratio. Based on this upper bound, the delay distribution, delay violation probability, average delay, and jitter are derived. Finally, simulation results are provided to validate the accuracy of the analysis and derivations. Additionally, the impact of system parameters on jitter is analyzed to gain further insights.
Submitted 27 May, 2024;
originally announced May 2024.
-
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
Authors:
Rongyu Zhang,
Aosong Cheng,
Yulin Luo,
Gaole Dai,
Huanrui Yang,
Jiaming Liu,
Ran Xu,
Li Du,
Yuan Du,
Yanbing Jiang,
Shanghang Zhang
Abstract:
Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system's adeptness at processing both shape and texture according to the famous Trichromatic Theory, we explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task. Given the distinct reaction of neurons with low/high activation to domain-specific/agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components with a non-differentiable Spatial Differentiate Dropout (SDD). Based on the decomposition, we devise a multi-gate structure comprising a Domain-Aware Gate (DAG) that utilizes domain information to adaptively combine experts that process the post-SDD sparse activations of different strengths, and an Activation Sparsity Gate (ASG) that adaptively assigns the feature selection threshold of the SDD for different experts for more precise feature decomposition. Finally, we introduce a Homeostatic-Proximal (HP) loss to bypass the error accumulation problem when continuously adapting the model. Extensive experiments on four prominent benchmarks substantiate that our methodology achieves state-of-the-art performance in both classification and segmentation CTTA tasks. Our code is now available at https://github.com/RoyZry98/MoASE-Pytorch.
Submitted 26 May, 2024;
originally announced May 2024.
-
Inference of Utilities and Time Preference in Sequential Decision-Making
Authors:
Haoyang Cao,
Zhengqi Wu,
Renyuan Xu
Abstract:
This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and the establishment of the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, facilitating the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples, including Merton's problem and an investment problem with unhedgeable risks.
Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.
Submitted 3 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting
Authors:
Jiaxu Wang,
Junhao He,
Ziyi Zhang,
Mingyuan Sun,
Jingkai Sun,
Renjing Xu
Abstract:
Event cameras offer promising advantages such as high dynamic range and low latency, making them well-suited for challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and does not carry absolute color information. To unlock their potential in 3D reconstruction, we propose the first event-based generalizable 3D reconstruction framework, called EvGGS, which reconstructs scenes as 3D Gaussians from only event input in a feedforward manner and can generalize to unseen cases without any retraining. This framework includes a depth estimation module, an intensity reconstruction module, and a Gaussian regression module. These submodules are connected in a cascading manner, and we train them collaboratively with a designed joint loss so that they reinforce one another. To facilitate related studies, we build a novel event-based 3D dataset with objects of various materials and calibrated labels of grayscale images, depth maps, camera poses, and silhouettes. Experiments show that jointly trained models significantly outperform those trained individually. Our approach performs better than all baselines in reconstruction quality and in depth/intensity prediction, with satisfactory rendering speed.
Submitted 3 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Look into the Future: Deep Contextualized Sequential Recommendation
Authors:
Lei Zheng,
Ning Li,
Yanhuan Huang,
Ruiwen Xu,
Weinan Zhang,
Yong Yu
Abstract:
Sequential recommendation focuses on mining useful patterns from the user behavior history to better estimate the user's preference on the candidate items. Previous solutions adopt recurrent networks or retrieval methods to obtain the user's profile representation so as to perform the preference estimation. In this paper, we propose a novel framework of sequential recommendation called Look into the Future (LIFT), which builds and leverages the contexts of sequential recommendation. The context in LIFT refers to a user's current profile that can be represented based on both past and future behaviors. As such, the learned context will be more effective in predicting the user's behaviors in sequential recommendation. Since real future information cannot be used to predict the current behavior, we propose a novel retrieval-based framework that uses the most similar interaction's future information as the future context of the target interaction without data leakage. Furthermore, in order to exploit the intrinsic information embedded within the context itself, we introduce an innovative pretraining methodology incorporating behavior masking. This approach is designed to facilitate the efficient acquisition of context representations. We demonstrate that finding relevant contexts from the global user pool via retrieval methods will greatly improve preference estimation performance. In our extensive experiments over real-world datasets, LIFT demonstrates significant performance improvement on click-through rate prediction tasks in sequential recommendation over strong baselines.
Submitted 23 May, 2024;
originally announced May 2024.
-
Strangelets at finite temperature
Authors:
Hao-Song You,
Huai-Min Chen,
Jian-Feng Xu,
Cheng-Jun Xia,
Ren-Xin Xu,
Guang-Xiong Peng
Abstract:
We study the properties of strangelets at finite temperature $T$, employing an equivparticle model that incorporates both linear confinement and leading-order perturbative interactions with density-dependent quark masses. The shell effects are analyzed by solving the Dirac equations for quarks within the mean-field approximation. As temperature increases, these effects weaken due to the occupation probability of single-particle levels being governed by the Fermi-Dirac statistics, a phenomenon known as shell dampening. Surprisingly, the surface tension, derived from a liquid-drop formula, does not decrease with temperature but instead rises until it peaks at $T \approx 20-40$ MeV. At this temperature, shell corrections become negligible, and the formula provides a reasonable approximation for the free energy per baryon of strangelets. However, the curvature term decreases with $T$ despite the presence of shell effects. The neutron and proton emission rates are determined microscopically by the external nucleon gas densities that are in equilibrium with strangelets. These emission rates generally increase with $T$ for stable strangelets, but decrease for those that are unstable to nucleon emission at $T$ = 0. The other properties of $β$-stable strangelets obtained with various parameter sets are presented as well. The results presented in this work are useful for understanding the products of binary compact star mergers and heavy-ion collisions.
Submitted 23 May, 2024;
originally announced May 2024.
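The shell dampening described in the abstract above follows from the Fermi-Dirac occupation of single-particle levels, f = 1 / (exp((e - mu) / T) + 1). The Python sketch below is illustrative and not the authors' code: the function name `occupation` and the level energies, chemical potential, and temperatures (all in MeV) are hypothetical values chosen only to show the trend.

```python
import math


def occupation(energy, mu, temperature):
    """Fermi-Dirac occupation probability of a level, all inputs in MeV.

    Uses the identity 1 / (exp(x) + 1) = 0.5 * (1 - tanh(x / 2)),
    which avoids overflow for levels far above the chemical potential.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    x = (energy - mu) / temperature
    return 0.5 * (1.0 - math.tanh(0.5 * x))


if __name__ == "__main__":
    mu = 300.0  # hypothetical chemical potential
    for T in (5.0, 20.0, 40.0):
        below = occupation(mu - 50.0, mu, T)
        above = occupation(mu + 50.0, mu, T)
        print(f"T = {T:5.1f} MeV: f(mu-50) = {below:.3f}, f(mu+50) = {above:.3f}")
```

At low temperature the occupation is close to a step at `mu`, so the discrete level structure matters; as the temperature rises, levels above and below `mu` become partially filled and the shell structure is smeared out.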