-
Insulator-to-Metal Transition and Isotropic Gigantic Magnetoresistance in Layered Magnetic Semiconductors
Authors:
Gokul Acharya,
Bimal Neupane,
Chia-Hsiu Hsu,
Xian P. Yang,
David Graf,
Eun Sang Choi,
Krishna Pandey,
Md Rafique Un Nabi,
Santosh Karki Chhetri,
Rabindra Basnet,
Sumaya Rahman,
Jian Wang,
Zhengxin Hu,
Bo Da,
Hugh Churchill,
Guoqing Chang,
M. Zahid Hasan,
Yuanxi Wang,
Jin Hu
Abstract:
Magnetotransport, the response of electrical conduction to external magnetic field, acts as an important tool to reveal fundamental concepts behind exotic phenomena and plays a key role in enabling spintronic applications. Magnetotransport is generally sensitive to magnetic field orientations. In contrast, efficient and isotropic modulation of electronic transport, which is useful in technology applications such as omnidirectional sensing, is rarely seen, especially for pristine crystals. Here we propose a strategy to realize extremely strong modulation of electron conduction by magnetic field which is independent of field direction. GdPS, a layered antiferromagnetic semiconductor with resistivity anisotropies, supports a field-driven insulator-to-metal transition with a paradoxically isotropic gigantic negative magnetoresistance insensitive to magnetic field orientations. This isotropic magnetoresistance originates from the combined effects of a near-zero spin-orbit coupling of Gd3+-based half-filling f-electron system and the strong on-site f-d exchange coupling in Gd atoms. Our results not only provide a novel material system with extraordinary magnetotransport that offers a missing block for antiferromagnet-based ultrafast and efficient spintronic devices, but also demonstrate the key ingredients for designing magnetic materials with desired transport properties for advanced functionalities.
Submitted 3 July, 2024;
originally announced July 2024.
-
VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values
Authors:
Zhe Hu,
Yixiao Ren,
Jing Li,
Yu Yin
Abstract:
This paper introduces VIVA, a benchmark for VIsion-grounded decision-making driven by human VAlues. While most large vision-language models (VLMs) focus on physical-level skills, our work is the first to examine their multimodal capabilities in leveraging human values to make decisions under a vision-depicted situation. VIVA contains 1,062 images depicting diverse real-world situations and the manually annotated decisions grounded in them. Given an image there, the model should select the most appropriate action to address the situation and provide the relevant human values and reason underlying the decision. Extensive experiments based on VIVA show the limitation of VLMs in using human values to make multimodal decisions. Further analyses indicate the potential benefits of exploiting action consequences and predicted human values.
Submitted 3 July, 2024;
originally announced July 2024.
-
HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes
Authors:
Zhiming Hu,
Zheming Yin,
Daniel Haeufle,
Syn Schmitt,
Andreas Bulling
Abstract:
We present HOIMotion - a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph convolutional network (GCN) and multi-layer perceptrons to extract features from body poses and egocentric 3D object bounding boxes, respectively. Our method then fuses pose and object features into a novel pose-object graph and uses a residual-decoder GCN to forecast future body motion. We extensively evaluate our method on the Aria digital twin (ADT) and MoGaze datasets and show that HOIMotion consistently outperforms state-of-the-art methods by a large margin of up to 8.7% on ADT and 7.2% on MoGaze in terms of mean per joint position error. Complementing these evaluations, we report a human study (N=20) that shows that the improvements achieved by our method result in forecasted poses being perceived as both more precise and more realistic than those of existing methods. Taken together, these results reveal the significant information content available in egocentric 3D object bounding boxes for human motion forecasting and the effectiveness of our method in exploiting this information.
Submitted 2 July, 2024;
originally announced July 2024.
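The HOIMotion abstract above reports its gains in terms of mean per joint position error (MPJPE). As a rough illustration of that metric only, and not of the authors' evaluation code, the sketch below averages the Euclidean distance between predicted and ground-truth joints over all frames and joints. The function name `mpjpe` and the toy coordinates are assumptions made for this example.

```python
import math

def mpjpe(predicted, target):
    """Mean per joint position error.

    Both arguments are lists of frames; each frame is a list of (x, y, z)
    joint positions. Returns the mean Euclidean distance over all joints
    in all frames.
    """
    if len(predicted) != len(target):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("joint counts differ")
        for p, t in zip(pred_frame, true_frame):
            total += math.dist(p, t)  # Euclidean distance for one joint
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count

# Hypothetical toy data: one frame with two joints.
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
true = [[(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]]
error = mpjpe(pred, true)  # distances are 0 and 5, so the mean is 2.5
```

Lower values are better; a relative improvement such as the percentages quoted in the abstract would compare two such averages.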
-
Enhanced Second-Harmonic Generation in Thin-Film Lithium Niobate Bragg Nanocavity
Authors:
Z. Li,
Z. Hu,
X. Ye,
Z. Mao,
J. Feng,
H. Li,
S. Liu,
B. Wang,
Y. Zheng,
X. Chen
Abstract:
Second-order nonlinearity gives rise to many distinctive physical phenomena, e.g., second-harmonic generation (SHG), which plays an important role in fundamental science and various applications. Lithium niobate, one of the most widely used nonlinear crystals, exhibits strong second-order nonlinear effects and electro-optic properties. However, its moderate refractive index and etching sidewall angle limit its capability in confining light into nanoscales, restricting its application in nanophotonics. Here, we exploit nanocavities formed by second-order circular Bragg gratings (CBG), which support resonant anapole modes to achieve highly enhanced SHG in thin film lithium niobate (TFLN). The CBG nanocavity exhibits a record-high normalized conversion efficiency of $1.21\times 10^{-2}\mathrm{cm}^2/\mathrm{GW}$ under the pump intensity of $1.9$ $\mathrm{MW}/\mathrm{cm}^2$. An SHG enhancement of $420,000$ is realized compared to TFLN. In addition to circular CBGs, we also show s- and p-polarization independent SHG in elliptical Bragg nanocavities. This work could inspire studying nonlinear optics at the nanoscale on TFLN as well as other novel photonic platforms.
Submitted 2 July, 2024;
originally announced July 2024.
-
ToCoAD: Two-Stage Contrastive Learning for Industrial Anomaly Detection
Authors:
Yun Liang,
Zhiguang Hu,
Junjie Huang,
Donglin Di,
Anyang Su,
Lei Fan
Abstract:
Current unsupervised anomaly detection approaches perform well on public datasets but struggle with specific anomaly types due to the domain gap between pre-trained feature extractors and target-specific domains. To tackle this issue, this paper presents a two-stage training strategy, called ToCoAD. In the first stage, a discriminative network is trained by using synthetic anomalies in a self-supervised learning manner. This network is then utilized in the second stage to provide a negative feature guide, aiding in the training of the feature extractor through bootstrap contrastive learning. This approach enables the model to progressively learn the distribution of anomalies specific to industrial datasets, effectively enhancing its generalizability to various types of anomalies. Extensive experiments are conducted to demonstrate the effectiveness of our proposed two-stage training strategy, and our model produces competitive performance, achieving pixel-level AUROC scores of 98.21%, 98.43% and 97.70% on MVTec AD, VisA and BTAD respectively.
Submitted 1 July, 2024;
originally announced July 2024.
-
MIRAI: Evaluating LLM Agents for Event Forecasting
Authors:
Chenchen Ye,
Ziniu Hu,
Yihe Deng,
Zijie Huang,
Mingyu Derek Ma,
Yanqiao Zhu,
Wei Wang
Abstract:
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interests have been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite such a growing interest, there is a lack of a rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously source and integrate critical information from large global databases; 2) write codes using domain-specific APIs and libraries for tool-use; and 3) jointly reason over historical knowledge from diverse formats and time to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relation analysis.
Submitted 1 July, 2024;
originally announced July 2024.
-
Rethinking LLM-based Preference Evaluation
Authors:
Zhengyu Hu,
Linxin Song,
Jieyu Zhang,
Zheyuan Xiao,
Jingang Wang,
Zhenyu Chen,
Jieyu Zhao,
Hui Xiong
Abstract:
Recently, large language model (LLM)-based preference evaluation has been widely adopted to compare pairs of model responses. However, a severe bias towards lengthy responses has been observed, raising concerns about the reliability of this evaluation method. In this work, we designed a series of controlled experiments to study the major impacting factors of the metric of LLM-based preference evaluation, i.e., win rate, and conclude that the win rate is affected by two axes of model response: desirability and information mass, where the former is length-independent and related to trustworthiness, and the latter is length-dependent and can be represented by conditional entropy. We find that length impacts the existing evaluations by influencing information mass. However, a reliable evaluation metric should not only assess content quality but also ensure that the assessment is not confounded by extraneous factors such as response length. Therefore, we propose a simple yet effective adjustment, AdapAlpaca, to the existing practice of win rate measurement. Specifically, by adjusting the lengths of reference answers to match the test model's answers within the same interval, we debias information mass relative to length, ensuring a fair model evaluation.
Submitted 1 July, 2024;
originally announced July 2024.
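The abstract above describes debiasing win rate by comparing each test answer against a reference answer whose length falls in the same interval. The sketch below is one possible reading of that idea, not the AdapAlpaca implementation: the word-count bucketing, the interval width, the function names and the `judge` callable are all assumptions introduced for illustration.

```python
def length_bucket(text, width):
    """Index of the length interval (in words) that `text` falls into."""
    return len(text.split()) // width

def length_matched_win_rate(test_answers, references_by_prompt, judge, width=200):
    """Win rate computed only against references in the same length interval.

    test_answers:         dict mapping prompt -> test model answer
    references_by_prompt: dict mapping prompt -> list of reference answers
    judge:                hypothetical callable (prompt, answer, reference) -> bool,
                          True when the test answer is preferred
    Prompts with no reference in the matching interval are skipped.
    Returns None when nothing could be compared.
    """
    wins, compared = 0, 0
    for prompt, answer in test_answers.items():
        bucket = length_bucket(answer, width)
        matched = [ref for ref in references_by_prompt.get(prompt, [])
                   if length_bucket(ref, width) == bucket]
        if not matched:
            continue
        compared += 1
        if judge(prompt, answer, matched[0]):
            wins += 1
    return wins / compared if compared else None

# Hypothetical toy run with a stand-in judge that prefers the longer string.
answers = {"q1": "a b c", "q2": "a b c d e f"}
references = {"q1": ["x y", "x y z w v u t"], "q2": ["p q"]}
toy_judge = lambda prompt, answer, reference: len(answer) >= len(reference)
rate = length_matched_win_rate(answers, references, toy_judge, width=5)
```

In the toy run only `q1` has a reference in the same interval as the test answer, so `q2` is skipped instead of being judged against a much shorter reference.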
-
Coding for Intelligence from the Perspective of Category
Authors:
Wenhan Yang,
Zixuan Hu,
Lilang Lin,
Jiaying Liu,
Ling-Yu Duan
Abstract:
Coding, which targets compressing and reconstructing data, and intelligence, often regarded at an abstract computational level as being centered around model learning and prediction, interweave recently to give birth to a series of significant progress. The recent trends demonstrate the potential homogeneity of these two fields, especially when deep-learning models aid these two categories for better probability modeling. For better understanding and describing from a unified perspective, inspired by the basic generally recognized principles in cognitive psychology, we formulate a novel problem of Coding for Intelligence from the category theory view. Based on the three axioms: existence of ideal coding, existence of practical coding, and compactness promoting generalization, we derive a general framework to understand existing methodologies, namely that, coding captures the intrinsic relationships of objects as much as possible, while ignoring information irrelevant to downstream tasks. This framework helps identify the challenges and essential elements in solving the specific derived Minimal Description Length (MDL) optimization problem from a broader range, providing opportunities to build a more intelligent system for handling multiple tasks/applications with coding ideas/tools. Centering on those elements, we systematically review recent processes of towards optimizing the MDL problem in more comprehensive ways from data, model, and task perspectives, and reveal their impacts on the potential CfI technical routes. After that, we also present new technique paths to fulfill CfI and provide potential solutions with preliminary experimental evidence. Last, further directions and remaining issues are discussed as well. The discussion shows our theory can reveal many phenomena and insights about large foundation models, which mutually corroborate with recent practices in feature learning.
Submitted 2 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Authors:
Mushui Liu,
Yuhang Ma,
Xinfeng Zhang,
Yang Zhen,
Zeng Zhao,
Zhipeng Hu,
Bai Liu,
Changjie Fan
Abstract:
Diffusion Models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts that involve multiple objects, attribute binding, and long descriptions. This paper proposes a framework called LLM4GEN, which enhances the semantic understanding ability of text-to-image diffusion models by leveraging the semantic representation of Large Language Models (LLMs). Through a specially designed Cross-Adapter Module (CAM) that combines the original text features of text-to-image models with LLM features, LLM4GEN can be easily incorporated into various diffusion models as a plug-and-play component and enhances text-to-image generation. Additionally, to facilitate the complex and dense prompts semantic understanding, we develop a LAION-refined dataset, consisting of 1 million (M) text-image pairs with improved image descriptions. We also introduce DensePrompts which contains 7,000 dense prompts to provide a comprehensive evaluation for the text-to-image generation task. With just 10% of the training data required by recent ELLA, LLM4GEN significantly improves the semantic alignment of SD1.5 and SDXL, demonstrating increases of 7.69% and 9.60% in color on T2I-CompBench, respectively. The extensive experiments on DensePrompts also demonstrate that LLM4GEN surpasses existing state-of-the-art models in terms of sample quality, image-text alignment, and human evaluation. The project website is at: https://xiaobul.github.io/LLM4GEN/
Submitted 30 June, 2024;
originally announced July 2024.
-
Stable Machine-Learning Parameterization of Subgrid Processes with Real Geography and Full-physics Emulation
Authors:
Zeyuan Hu,
Akshay Subramaniam,
Zhiming Kuang,
Jerry Lin,
Sungduk Yu,
Walter M. Hannah,
Noah D. Brenowitz,
Josh Romero,
Michael S. Pritchard
Abstract:
Modern climate projections often suffer from inadequate spatial and temporal resolution due to computational limitations, resulting in inaccurate representations of sub-resolution processes. A promising technique to address this is the Multiscale Modeling Framework (MMF), which embeds a small-domain, kilometer-resolution cloud-resolving model within each atmospheric column of a host climate model to replace traditional convection and cloud parameterizations. Machine learning (ML) offers a unique opportunity to make MMF more accessible by emulating the embedded cloud-resolving model and thereby reducing its substantial computational cost. Although many studies have demonstrated proof-of-concept success of emulating the MMF model with stable hybrid simulations, it remains a challenge to achieve operational-level success with real geography and comprehensive variable emulation, such as explicit cloud condensate coupling. In this study, we present a stable hybrid model capable of integrating for at least 5 years with near operational-level complexity, including real geography and explicit predictions of cloud condensate and wind tendencies. Our model demonstrates state-of-the-art online performance such as 5-year zonal mean biases when comparing to previous MMF emulation studies. Key factors contributing to this online performance include the use of an expressive U-Net architecture, leveraging input features that includes large-scale forcings and convective memory, and incorporating microphysics constraints. The microphysics constraints mitigate unrealistic cloud formations such as liquid clouds at freezing temperatures or excessive ice clouds in the stratosphere, which would occur in online simulations with an unconstrained ML model.
Submitted 27 June, 2024;
originally announced July 2024.
-
Twist angle driven electronic structure evolution of twisted bilayer graphene
Authors:
Jiawei Yu,
Guihao Jia,
Qian Li,
Yuyang Wang,
Kebin Xiao,
Yongkang Ju,
Hongyun Zhang,
Zhiqiang Hu,
Yunkai Guo,
Biao Lian,
Peizhe Tang,
Shuyun Zhou,
Qi-Kun Xue,
Wei Li
Abstract:
In twisted bilayer graphene (TBG) devices, local strains often coexist and entangle with the twist-angle dependent moiré superlattice, both of which can significantly affect the electronic properties of TBG. Here, using low-temperature scanning tunneling microscopy, we investigate the fine evolution of the electronic structures of a TBG device with continuous variation of twist angles from 0.32° to 1.29°, spanning the first (1.1°), second (0.5°) and third (0.3°) magic angles. We reveal the exotic behavior of the flat bands and remote bands in both the energy space and real space near the magic angles. Interestingly, we observe an anomalous spectral weight transfer between the two flat band peaks in the tunneling spectra when approaching the first magic angle, suggesting strong inter-flat-bands interactions. The position of the remote band peak can be an index for the twist angle in TBG, since it positively correlates with the twist angle but is insensitive to the strain. Moreover, influences of the twist angle gradient on symmetry breaking of the flat bands are also studied.
Submitted 28 June, 2024;
originally announced June 2024.
-
Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation
Authors:
Zhe Hu,
Hou Pong Chan,
Jing Li,
Yu Yin
Abstract:
Writing persuasive arguments is a challenging task for both humans and machines. It entails incorporating high-level beliefs from various perspectives on the topic, along with deliberate reasoning and planning to construct a coherent narrative. Current language models often generate surface tokens autoregressively, lacking explicit integration of these underlying controls, resulting in limited output diversity and coherence. In this work, we propose a persona-based multi-agent framework for argument writing. Inspired by the human debate, we first assign each agent a persona representing its high-level beliefs from a unique perspective, and then design an agent interaction process so that the agents can collaboratively debate and discuss the idea to form an overall plan for argument writing. Such debate process enables fluid and nonlinear development of ideas. We evaluate our framework on argumentative essay writing. The results show that our framework can generate more diverse and persuasive arguments through both automatic and human evaluations.
Submitted 28 June, 2024;
originally announced June 2024.
-
Semi-supervised Concept Bottleneck Models
Authors:
Lijie Hu,
Tianhao Huang,
Huanyi Xie,
Chenyang Ren,
Zhengyu Hu,
Lu Yu,
Di Wang
Abstract:
Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided by experts, which can be costly and require significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features - an issue related to annotation alignment. To address these limitations, we propose a new framework called SSCBM (Semi-supervised Concept Bottleneck Model). Our SSCBM is suitable for practical situations where annotated data is scarce. By leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level, we effectively solve these issues. We proposed a strategy to generate pseudo labels and an alignment loss. Experiments demonstrate that our SSCBM is both effective and efficient. With only 20% labeled data, we achieved 93.19% (96.39% in a fully supervised setting) concept accuracy and 75.51% (79.82% in a fully supervised setting) prediction accuracy.
Submitted 27 June, 2024;
originally announced June 2024.
-
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Authors:
Wenhao Shi,
Zhiqiang Hu,
Yi Bin,
Junhua Liu,
Yang Yang,
See-Kiong Ng,
Lidong Bing,
Roy Ka-Wei Lee
Abstract:
Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions. We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V on MathVista's minitest split. Furthermore, Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark. Our research highlights the importance of dataset diversity and synthesis in advancing MLLMs' mathematical reasoning abilities. The code and data are available at: https://github.com/HZQ950419/Math-LLaVA
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Gapless dynamic magnetic ground state in the charge-gapped trimer iridate Ba$_4$NbIr$_3$O$_{12}$
Authors:
Abhisek Bandyopadhyay,
S. Lee,
D. T. Adroja,
M. R. Lees,
G. B. G. Stenning,
P. Aich,
Luca Tortora,
C. Meneghini,
G. Cibin,
Adam Berlie,
R. A. Saha,
D. Takegami,
A. Melendez-Sans,
G. Poelchen,
M. Yoshimura,
K. D. Tsuei,
Z. Hu,
Ting-Shan Chan,
S. Chattopadhyay,
G. S. Thakur,
Kwang-Yong Choi
Abstract:
We present an experimental investigation of the magnetic ground state in Ba$_4$NbIr$_3$O$_{12}$, a fractional valent trimer iridate. X-ray absorption and photoemission spectroscopy show that the Ir valence lies between 3+ and 4+ while Nb is pentavalent. Combined dc/ac magnetization, specific heat, and muon spin rotation/relaxation ($μ$SR) measurements reveal no magnetic phase transition down to 0.05~K. Despite a significant Weiss temperature ($Θ_{\mathrm{W}} \sim -15$ to $-25$~K) indicating antiferromagnetic correlations, a quantum spin-liquid (QSL) phase emerges and persists down to 0.1~K. This state likely arises from geometric frustration in the edge-sharing equilateral triangle Ir network. Our $μ$SR analysis reveals a two-component depolarization, arising from the coexistence of rapidly (90\%) and slowly (10\%) fluctuating Ir moments. Powder x-ray diffraction and Ir-L$_3$edge x-ray absorption fine structure spectroscopy identify ~8-10\% Nb/Ir site-exchange, reducing frustration within part of the Ir network, and likely leading to the faster muon spin relaxation, while the structurally ordered Ir ions remain highly geometrically frustrated, giving rise to the rapidly spin-fluctuating QSL ground state. At low temperatures, the magnetic specific heat varies as $γT + αT^2$, indicating gapless spinon excitations, and possible Dirac QSL features with linear spinon dispersion, respectively.
Submitted 24 June, 2024;
originally announced June 2024.
-
AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases
Authors:
Gyanna Gao,
Hao-Yu Liao,
Zhenhong Hu
Abstract:
Numerous studies have demonstrated the manifold benefits of tennis, such as increasing overall physical and mental health. Unfortunately, many children and youth from low-income families are unable to engage in this sport mainly due to financial constraints such as private lesson expenses as well as logistical concerns to and back from such lessons and clinics. While several tennis self-training systems exist, they are often tailored for professionals and are prohibitively expensive. The present study aims to classify tennis players' skill levels and classify tennis strokes into phases characterized by motion attributes for a future development of an AI-based tennis self-training model for affordable and convenient applications running on devices used in daily life such as an iPhone or an Apple Watch for tennis skill improvement. We collected motion data, including Motion Yaw, Roll and Pitch from inertial measurement units (IMUs) worn by participating junior tennis players. For this pilot study, data from twelve participants were processed using Support Vector Machine (SVM) algorithms. The SVM models demonstrated an overall accuracy of 77% in classifying players as beginners or intermediates, with low rates of false positives and false negatives, effectively distinguishing skill levels. Additionally, the tennis swings were successfully classified into five phases based on the collected motion data. These findings indicate that SVM-based classification can be a reliable foundation for developing an equitable and accessible AI-driven tennis training system.
Submitted 23 June, 2024;
originally announced June 2024.
-
Anomaly Detection Utilizing a Riemann Metric for Robust Myoelectric Pattern Recognition
Authors:
ZongYe Hu,
Ge Gao,
Xiang Chen,
Xu Zhang
Abstract:
Traditional myoelectric pattern recognition (MPR) systems excel within controlled laboratory environments but are disrupted when confronted with anomalous or novel motions not encountered during the training phase. A prevalent way to alleviate such interference is to use distance metrics on extracted features, compared against the training set, to distinguish target motions from novel ones. An innovative method for anomaly motion detection was proposed based on the simplified log-Euclidean distance (SLED) on symmetric positive definite manifolds. The SLED enhances the discrimination between target and novel motions. Moreover, it generates a more flexible shaping of motion boundaries to segregate target and novel motions, thereby effectively detecting the novel ones. The proposed method was evaluated using surface-electromyographic (sEMG) armband data recorded while performing 6 target and 8 novel hand motions. Based on linear discriminant analysis (LDA) and convolution prototype network (CPN) feature extractors, the proposed method achieved accuracies of 89.7% and 93.9% in novel motion detection, respectively, while maintaining a target motion classification accuracy of 90%, outperforming existing methods with statistical significance (p<0.05). This study provided a valuable solution for improving the robustness of MPR systems against anomaly motion interference.
Submitted 12 June, 2024;
originally announced June 2024.
-
Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition
Authors:
Jianguo Pan,
Zhengxin Hu,
Lingdun Zhang,
Xia Cai
Abstract:
Sensor-based human activity recognition is important in daily scenarios such as smart healthcare and smart homes owing to its non-intrusiveness, privacy, and low cost. However, the out-of-domain generalization problem caused by differences between individuals and operating environments can significantly degrade accuracy in cross-person behavior recognition, because the training and test data follow inconsistent distributions. To address these problems, this paper proposes a new method, the Multi-channel Time Series Decomposition Network (MTSDNet). First, MTSDNet decomposes the original signal into a combination of multiple polynomials and trigonometric functions through a trainable parameterized temporal decomposition, learning a low-rank representation of the original signal that improves the out-of-domain generalization ability of the model. Then, the components obtained by the decomposition are classified layer by layer, and layer attention is used to aggregate the components into the final classification result. Extensive evaluation on the DSADS, OPPORTUNITY, PAMAP2, UCIHAR and UniMib public datasets shows the advantages of our method in prediction accuracy and stability compared with other competing strategies, including state-of-the-art ones. Visualizations further reveal MTSDNet's interpretability and layer-by-layer characteristics.
Submitted 28 March, 2024;
originally announced June 2024.
-
Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization
Authors:
Yuhang Ma,
Wenting Xu,
Jiji Tang,
Qinfeng **,
Rongsheng Zhang,
Zeng Zhao,
Changjie Fan,
Zhipeng Hu
Abstract:
Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have encountered challenges in preserving characters with high-fidelity consistency due to inadequate feature extraction and concept confusion of reference characters. Therefore, we propose Character-Adapter, a plug-and-play framework designed to generate images that preserve the details of reference characters, ensuring high-fidelity consistency. Character-Adapter employs prompt-guided segmentation to ensure fine-grained regional features of reference characters and dynamic region-level adapters to mitigate concept confusion. Extensive experiments are conducted to validate the effectiveness of Character-Adapter. Both quantitative and qualitative results demonstrate that Character-Adapter achieves the state-of-the-art performance of consistent character generation, with an improvement of 24.8% compared with other methods. Our code will be released at https://github.com/Character-Adapter/Character-Adapter
Submitted 3 July, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Smart Feature is What You Need
Authors:
Zhaoxin Hu,
Keyan Ren
Abstract:
Lack of shape guidance and label jitter, both caused by the information deficiency of weak labels, are the main problems in 3D weakly-supervised object detection. Current weakly-supervised models often use heuristic or assumption-based methods to infer information from weak labels without taking advantage of the inherent clues of weakly-supervised and fully-supervised methods, so it is difficult to find a method that combines data utilization efficiency and model accuracy. In an attempt to address these issues, we propose a novel plug-and-in point cloud feature representation network called Multi-scale Mixed Attention (MMA). MMA utilizes adjacency attention within neighborhoods and disparity attention at different density scales to build a feature representation network. The smart feature representation obtained from MMA carries shape tendency and object existence area inference, which can constrain the region of the detection boxes, thereby alleviating the problems caused by the information deficiency of weak labels. Extensive experiments show that in indoor weak label scenarios, the fully-supervised network can perform close to that of the weakly-supervised network merely through the improvement of point features by MMA. At the same time, MMA can turn waste into treasure, converting the label jitter problem that originally interfered with weakly-supervised detection into a source of data enhancement and strengthening the performance of existing weakly-supervised detection methods. Our code is available at https://github.com/hzx-9894/MMA.
Submitted 22 June, 2024;
originally announced June 2024.
-
MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization
Authors:
Zhaozhe Hu,
Jia-Li Yin,
Bin Chen,
Luojun Lin,
Bo-Hao Chen,
Ximeng Liu
Abstract:
Self-ensemble adversarial training methods improve model robustness by ensembling models at different training epochs, such as model weight averaging (WA). However, previous research has shown that self-ensemble defense methods in adversarial training (AT) still suffer from robust overfitting, which severely affects the generalization performance. Empirically, in the late phases of training, AT overfits more severely, to the extent that the individual models used for weight averaging also overfit and produce anomalous weight values, so the self-ensemble model continues to undergo robust overfitting because the weight anomalies are not removed. To solve this problem, we aim to tackle the influence of outliers in the weight space in this work and propose an easy-to-operate and effective Median-Ensemble Adversarial Training (MEAT) method, which addresses the robust overfitting phenomenon in self-ensemble defense at its source by searching for the median of the historical model weights. Experimental results show that MEAT achieves the best robustness against the powerful AutoAttack and can effectively alleviate robust overfitting. We further demonstrate that most defense methods can improve robust generalization and robustness by combining with MEAT.
Submitted 20 June, 2024;
originally announced June 2024.
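The contrast between weight averaging and a median ensemble drawn in the abstract above can be illustrated with a minimal sketch. It assumes each checkpoint is a flat list of floats and takes the coordinate-wise median, which is only one simple reading of "searching for the median of the historical model weights". The function names and the toy checkpoints are hypothetical, and this is not the authors' implementation.

```python
from statistics import mean, median


def average_ensemble(checkpoints):
    """Coordinate-wise mean of historical weight vectors (weight averaging)."""
    return [mean(column) for column in zip(*checkpoints)]


def median_ensemble(checkpoints):
    """Coordinate-wise median of historical weight vectors (assumed reading of MEAT)."""
    return [median(column) for column in zip(*checkpoints)]


# Three hypothetical checkpoints; the last one carries an anomalous second weight.
history = [
    [0.10, 0.50, -0.20],
    [0.12, 0.52, -0.18],
    [0.11, 9.00, -0.19],
]

print(average_ensemble(history))  # the anomalous value pulls the mean upward
print(median_ensemble(history))   # the median ignores the single outlier
```

In this toy case the outlier shifts the averaged second weight well away from the other checkpoints, while the median stays at a value one of the checkpoints actually held.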
-
The association of domain-specific physical activity and sedentary activity with stroke: A prospective cohort study
Authors:
Xinyi He,
Shidi Wang,
Yi Li,
Jiucun Wang,
Guangrui Yang,
Jun Chen,
Zixin Hu
Abstract:
Background The incidence of stroke places a heavy burden on both society and individuals. Activity is closely related to cardiovascular health. This study aimed to investigate the relationship between the varying domains of PA, like occupation-related Physical Activity (OPA), transportation-related Physical Activity (TPA), leisure-time Physical Activity (LTPA), and Sedentary Activity (SA) with stroke. Methods Our analysis included 30,400 participants aged 20+ years from 2007 to 2018 National Health and Nutrition Examination Survey (NHANES). Stroke was identified based on the participant's self-reported diagnoses from previous medical consultations, and PA and SA were self-reported. Multivariable logistic and restricted cubic spline models were used to assess the associations. Results Participants achieving PA guidelines (performing PA more than 150 min/week) were 35.7% less likely to have a stroke based on both the total PA (odds ratio [OR] 0.643, 95% confidence interval [CI] 0.523-0.790) and LTPA (OR 0.643, 95% CI 0.514-0.805), while OPA or TPA did not demonstrate lower stroke risk. Furthermore, participants with less than 7.5 h/day SA levels were 21.6% (OR 0.784, 95% CI 0.665-0.925) less likely to have a stroke. The intensities of total PA and LTPA exhibited nonlinear U-shaped associations with stroke risk. In contrast, those of OPA and TPA showed negative linear associations, while SA intensities were positively linearly correlated with stroke risk. Conclusions LTPA, but not OPA or TPA, was associated with a lower risk of stroke at any amount, suggesting that significant cardiovascular health would benefit from increased PA. Additionally, the positive association between SA and stroke indicated that prolonged sitting was detrimental to cardiovascular health. Overall, increased PA within a reasonable range reduces the risk of stroke, while increased SA elevates it.
Submitted 19 June, 2024;
originally announced June 2024.
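The percentages quoted in the abstract above follow directly from the reported odds ratios, as the short check below shows. It reads "less likely" as a reduction in odds, which is an interpretive assumption (an odds ratio is not a ratio of probabilities). The helper name is hypothetical.

```python
def percent_lower_odds(odds_ratio):
    """Percentage reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0


# Odds ratios as reported in the abstract.
reported = [("total PA and LTPA", 0.643), ("SA below 7.5 h/day", 0.784)]

for label, odds_ratio in reported:
    print(f"{label}: {percent_lower_odds(odds_ratio):.1f}% lower odds")
```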
-
BeHonest: Benchmarking Honesty of Large Language Models
Authors:
Steffi Chern,
Zhulin Hu,
Yuqing Yang,
Ethan Chern,
Yuan Guo,
Jiahe **,
Binjie Wang,
Pengfei Liu
Abstract:
Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, eroding user trust, and causing real-world harm, present severe risks that intensify as these models approach superintelligence levels. Enhancing honesty in LLMs addresses critical deficiencies and helps uncover latent capabilities that are not readily expressed. This underscores the urgent need for reliable methods and benchmarks to effectively ensure and evaluate the honesty of LLMs.
In this paper, we introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in LLMs comprehensively. BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses. Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes. Our findings indicate that there is still significant room for improvement in the honesty of LLMs. We also encourage the AI community to prioritize honesty alignment in LLMs. Our benchmark and code can be found at: https://github.com/GAIR-NLP/BeHonest.
Submitted 1 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Tackling the Curse of Dimensionality in Fractional and Tempered Fractional PDEs with Physics-Informed Neural Networks
Authors:
Zheyuan Hu,
Kenji Kawaguchi,
Zhongqiang Zhang,
George Em Karniadakis
Abstract:
Fractional and tempered fractional partial differential equations (PDEs) are effective models of long-range interactions, anomalous diffusion, and non-local effects. Traditional numerical methods for these problems are mesh-based, thus struggling with the curse of dimensionality (CoD). Physics-informed neural networks (PINNs) offer a promising solution due to their universal approximation, generalization ability, and mesh-free training. In principle, Monte Carlo fractional PINN (MC-fPINN) estimates fractional derivatives using Monte Carlo methods and thus could lift CoD. However, this may cause significant variance and errors, hence affecting convergence; in addition, MC-fPINN is sensitive to hyperparameters. In general, numerical methods and specifically PINNs for tempered fractional PDEs are under-developed. Herein, we extend MC-fPINN to tempered fractional PDEs to address these issues, resulting in the Monte Carlo tempered fractional PINN (MC-tfPINN). To reduce possible high variance and errors from Monte Carlo sampling, we replace the one-dimensional (1D) Monte Carlo with 1D Gaussian quadrature, applicable to both MC-fPINN and MC-tfPINN. We validate our methods on various forward and inverse problems of fractional and tempered fractional PDEs, scaling up to 100,000 dimensions. Our improved MC-fPINN/MC-tfPINN using quadrature consistently outperforms the original versions in accuracy and convergence speed in very high dimensions.
Submitted 17 June, 2024;
originally announced June 2024.
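The motivation for replacing a one-dimensional Monte Carlo estimate with Gaussian quadrature, as described in the abstract above, can be seen on a toy integral. The sketch below is only an illustration under stated assumptions: the abstract does not name the quadrature rule, so a 3-point Gauss-Legendre rule is assumed, and the integrand (exp on [0, 1]) and all function names are hypothetical rather than taken from the paper.

```python
import math
import random

# 3-point Gauss-Legendre rule on [-1, 1]: nodes and matching weights.
_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def gauss_legendre_3(f, a, b):
    """Integrate f over [a, b] with the 3-point Gauss-Legendre rule."""
    half = 0.5 * (b - a)
    mid = 0.5 * (a + b)
    return half * sum(w * f(mid + half * x) for x, w in zip(_NODES, _WEIGHTS))


def monte_carlo(f, a, b, n_samples, rng):
    """Integrate f over [a, b] by averaging n_samples uniform draws."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples


exact = math.e - 1.0  # integral of exp(x) over [0, 1]
rng = random.Random(0)  # fixed seed so the run is repeatable
quad_error = abs(gauss_legendre_3(math.exp, 0.0, 1.0) - exact)
mc_error = abs(monte_carlo(math.exp, 0.0, 1.0, 3, rng) - exact)
print("quadrature error:", quad_error)
print("Monte Carlo error:", mc_error)
```

Both estimators use three function evaluations here. The quadrature rule is deterministic and exact for polynomials up to degree five, whereas the Monte Carlo estimate varies with the random draws, which is the source of the variance the abstract refers to.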
-
Score-fPINN: Fractional Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck-Levy Equations
Authors:
Zheyuan Hu,
Zhongqiang Zhang,
George Em Karniadakis,
Kenji Kawaguchi
Abstract:
We introduce an innovative approach for solving high-dimensional Fokker-Planck-Lévy (FPL) equations in modeling non-Brownian processes across disciplines such as physics, finance, and ecology. We utilize a fractional score function and physics-informed neural networks (PINNs) to lift the curse of dimensionality (CoD) and alleviate numerical overflow from exponentially decaying solutions with dimensions. The introduction of a fractional score function allows us to transform the FPL equation into a second-order partial differential equation without a fractional Laplacian, which can thus be readily solved with standard PINNs. We propose two methods to obtain a fractional score function: fractional score matching (FSM) and score-fPINN for fitting the fractional score function. While FSM is more cost-effective, it relies on known conditional distributions. On the other hand, score-fPINN is independent of specific stochastic differential equations (SDEs) but requires evaluating the PINN model's derivatives, which may be more costly. We conduct our experiments on various SDEs and demonstrate numerical stability and effectiveness of our method in dealing with high-dimensional problems, marking a significant advancement in addressing the CoD in FPL equations.
Submitted 17 June, 2024;
originally announced June 2024.
-
Detecting Planetary Oblateness in the Era of JWST: A Case Study of Kepler-167e
Authors:
Quanyi Liu,
Wei Zhu,
Yifan Zhou,
Zhecheng Hu,
Zitao Lin,
Fei Dai,
Kento Masuda,
Sharon X. Wang
Abstract:
Planets may be rotationally flattened, and their oblateness thus provides useful information on their formation and evolution. Here we develop a new algorithm that can compute the transit light curve due to an oblate planet very efficiently and use it to study the detectability of planet oblateness (and spin obliquity) with the James Webb Space Telescope (JWST). Using the Jupiter analog, Kepler-167e, as an example, we show that observations of a single transit with JWST are able to detect a Saturn-like oblateness ($f=0.1$) with high confidence, or set a stringent upper limit on the oblateness parameter, as long as the planetary spin is slightly misaligned ($\gtrsim 20^\circ$) with respect to its orbital direction. Based on known obliquity measurements and theoretical arguments, it is reasonable to believe that this level of misalignment may be common. We estimate the sensitivity limit of JWST in oblateness detections and highlight the importance of better characterizations of cold planets in planning future JWST transit observations. The potential to detect rings, moons, and atmospheric species of the cold giants with JWST is also discussed.
Submitted 17 June, 2024;
originally announced June 2024.
-
LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model
Authors:
Mingyang Gao,
Suyang Zhou,
Wei Gu,
Zhi Wu,
Zijian Hu,
Hong Zhu,
Haiquan Liu
Abstract:
Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent years, pre-trained language models (PLMs) have made significant advancements, demonstrating superior performance in various fields. This paper proposes a load forecasting method based on PLMs, which offers not only accurate predictive ability but also general and flexible applicability. Additionally, a data modeling method is proposed to effectively transform load sequence data into natural language for PLM training. Furthermore, we introduce a data enhancement strategy that eliminates the impact of PLM hallucinations on forecasting results. The effectiveness of the proposed method has been validated on two real-world datasets. Compared with existing methods, our approach shows state-of-the-art performance across all validation metrics.
Submitted 17 June, 2024;
originally announced June 2024.
-
On the limitations of H alpha luminosity as a star formation tracer in spatially resolved observations
Authors:
Zipeng Hu,
Benjamin D. Wibking,
Mark R. Krumholz,
Christoph Federrath
Abstract:
This study examines the limitations of H$α$ luminosity as a tracer of star formation rates (SFR) in spatially resolved observations. We carry out high-resolution simulations of a Milky Way-like galaxy including both supernova and photoionization feedback, and from these we generate synthetic H$α$ emission maps that we compare to maps of the true distribution of young stellar objects (YSOs) on scales from whole-galaxy to individual molecular clouds ($\lesssim 100$ pc). Our results reveal significant spatial mismatches between H$α$ and true YSO maps on sub-100 pc scales, primarily due to ionizing photon leakage, with a secondary contribution from young stars drifting away from their parent molecular clouds. On small scales these effects contribute significantly to the observed anti-correlation between gas and star formation, such that there is noticeably less anti-correlation if we replace an H$α$-based star formation map with a YSO-based one; this in turn implies that previous studies have underestimated the time it takes for young stars to disperse their parent molecular clouds. However, these effects are limited in dense regions with hydrogen columns $N_\mathrm{H} > 3 \times 10^{21}$ cm$^{-2}$, where the H$α$- and YSO-based SFR maps show better agreement. Based on this finding we propose a calibration model that can precisely measure the SFR of large molecular clouds (mean radius > 100 pc) with a combination of H$α$ and CO observations, which provides a foundation for future study of star formation processes in extragalactic molecular clouds.
Submitted 16 June, 2024;
originally announced June 2024.
-
TorchOpera: A Compound AI System for LLM Safety
Authors:
Shanshan Han,
Yuhang Yao,
Zijian Hu,
Dimitris Stripelis,
Zhaozhuo Xu,
Chaoyang He
Abstract:
We introduce TorchOpera, a compound AI system for enhancing the safety and quality of prompts and responses for Large Language Models. TorchOpera ensures that all user prompts are safe, contextually grounded, and effectively processed, while enhancing LLM responses to be relevant and high quality. TorchOpera utilizes the vector database for contextual grounding, rule-based wrappers for flexible modifications, and specialized mechanisms for detecting and adjusting unsafe or incorrect content. We also provide a view of the compound AI system to reduce the computational cost. Extensive experiments show that TorchOpera ensures the safety, reliability, and applicability of LLMs in real-world settings while maintaining the efficiency of LLM responses.
Submitted 16 June, 2024;
originally announced June 2024.
-
Intermittent Encryption Strategies for Anti-Eavesdropping Estimation
Authors:
Zhongyao Hu,
Bo Chen,
Pindi Weng,
Jianzheng Wang,
Li Yu
Abstract:
In this paper, an anti-eavesdropping estimation problem is investigated. A linear encryption scheme is utilized, which first linearly transforms the innovation via an encryption matrix and then encrypts some components of the transformed innovation. To reduce the computation and energy resources consumed by the linear encryption scheme, both stochastic and deterministic intermittent strategies, which perform the linear encryption scheme only at some time instants, are developed. When the system is stable, it is shown that the mean squared error (MSE) of the eavesdropper converges under any stochastic or deterministic intermittent strategy. Also, an analytical encryption matrix that maximizes the steady-state MSE is designed. When the system is unstable, the eavesdropper's MSE can be unbounded with arbitrary positive encryption probabilities and decision functions if encryption matrices are chosen appropriately. Then, the relationship between the aforementioned encryption parameters and the eavesdropper's MSE is analyzed. Moreover, a single intermittent strategy, which encrypts only one message, is discussed. This strategy can be unavailable for stable systems, but can make the eavesdropper's MSE unbounded in unstable systems if the encrypted message satisfies a linear matrix inequality (LMI) condition. The effectiveness of the proposed methods is verified in simulation.
Submitted 15 June, 2024;
originally announced June 2024.
-
Tandem Photovoltaics from 2D Transition Metal Dichalcogenides on Silicon
Authors:
Zekun Hu,
Sudong Wang,
Jason Lynch,
Adam Alfieri,
Deep Jariwala
Abstract:
The demand for high-efficiency photovoltaic systems necessitates innovations that transcend the efficiency limitations of single-junction solar cells. This study investigates a tandem photovoltaic architecture comprising a top-cell with a transition metal dichalcogenide (TMDC) superlattice absorber and a bottom-cell of crystalline silicon (c-Si), focusing on optimizing the light absorption and electrical performance of the combined structure. Through the transfer matrix method and electrical simulations, we optimized the geometry of the superlattice, determining that a six-layer MoSe2 configuration with a 40 nm SiO2 antireflective layer maximizes photon absorption while mitigating additional weight and preserving the cell's structural integrity. The results show that the optimized TMDC superlattice significantly improves the power conversion efficiency (PCE) of the tandem design to 28.96%, an increase of 5.68% over the original single-junction c-Si solar cell's efficiency. This advancement illustrates the potential of TMDC materials in next-generation solar cells and presents a promising avenue for the development of highly efficient tandem photovoltaic systems via van der Waals integration of the top cell on c-Si.
Submitted 14 June, 2024;
originally announced June 2024.
-
Spin waves in Dirac semimetal Ca$_{0.6}$Sr$_{0.4}$MnSb$_2$ investigated with neutrons by the diffraction method
Authors:
Xiao Hu,
Yan Wu,
Matthias D. Frontzek,
Zhixiang Hu,
Cedomir Petrovic,
John M. Tranquada,
Igor A. Zaliznyak
Abstract:
We report neutron diffraction measurements of Ca$_{0.6}$Sr$_{0.4}$MnSb$_2$, a low-carrier-density Dirac semimetal in which the antiferromagnetic Mn layers are interleaved with Sb layers that host Dirac fermions. We have discovered that we can detect a good quality inelastic spin wave signal from a small (m ~ 0.28 g) single crystal sample by the diffraction method, without energy analysis, using a neutron diffractometer with a position-sensitive area detector; the spin-waves appear as diffuse scattering that is shaped by energy-momentum conservation. By fitting this characteristic magnetic scattering to a spin-wave model, we refine all parameters of the model spin Hamiltonian, including the inter-plane interaction, through use of a three-dimensional measurement in reciprocal space. We also measure the temperature dependence of the spin waves, including the softening of the spin gap on approaching the Neel temperature, $T_N$. Not only do our results provide important new insights into an interplay of magnetism and Dirac electrons, they also establish a new, high-throughput approach to characterizing magnetic excitations on a modern diffractometer without direct energy analysis. Our work opens exciting new opportunities for the follow-up parametric and compositional studies on small, ~0.1 g crystals.
Submitted 14 June, 2024;
originally announced June 2024.
-
Automated Molecular Concept Generation and Labeling with Large Language Models
Authors:
Shichang Zhang,
Botao Xia,
Zimin Zhang,
Qianli Wu,
Fang Sun,
Ziniu Hu,
Yizhou Sun
Abstract:
Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are not as common compared to black-box models like Graph Neural Networks (GNNs), primarily due to their requirement for predefined concepts and manual label for each instance, which demand domain knowledge and can be labor-intensive. This paper introduces a novel framework for Automated Molecular Concept (AutoMolCo) generation and labeling. AutoMolCo leverages the knowledge in Large Language Models (LLMs) to automatically generate predictive molecular concepts and label them for each molecule. Such procedures are repeated through iterative interactions with LLMs to refine concepts, enabling simple linear models on the refined concepts to outperform GNNs and LLM in-context learning on several benchmarks. The whole AutoMolCo framework is automated without any human knowledge inputs in either concept generation, labeling, or refinement, thereby surpassing the limitations of extant CMs while maintaining their explainability and allowing easy intervention. Through systematic experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets, we demonstrate that the AutoMolCo-induced explainable CMs are beneficial and promising for molecular science research.
Submitted 13 June, 2024;
originally announced June 2024.
-
Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
Authors:
Zongyue Qin,
Yunsheng Bai,
Atefeh Sohrabizadeh,
Zijian Ding,
Ziniu Hu,
Yizhou Sun,
Jason Cong
Abstract:
In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design with low-level hardware description languages that eventually synthesize DSAs on circuits. However, creating a high-quality HLS design still demands significant domain knowledge, particularly in microarchitecture decisions expressed as \textit{pragmas}. Thus, it is desirable to automate such decisions with the help of machine learning for predicting the quality of HLS designs, requiring a deeper understanding of the program that consists of original code and pragmas. Naturally, these programs can be considered as sequence data. In addition, these programs can be compiled and converted into a control data flow graph (CDFG). But existing works either fail to leverage both modalities or combine the two in shallow or coarse ways. We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way. To alleviate the scarcity of labeled designs, a pre-training method is proposed based on a suite of compiler's data flow analysis tasks. Experimental results show that ProgSG reduces the RMSE of design performance predictions by up to $22\%$, and identifies designs with an average of $1.10\times$ and $1.26\times$ (up to $8.17\times$ and $13.31\times$) performance improvement in design space exploration (DSE) task compared to HARP and AutoDSE, respectively.
Submitted 27 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Pandora: Towards General World Model with Natural Language Actions and Video States
Authors:
Jiannan Xiang,
Guangyi Liu,
Yi Gu,
Qiyue Gao,
Yuting Ning,
Yuheng Zha,
Zeyu Feng,
Tianhua Tao,
Shibo Hao,
Yemin Shi,
Zhengzhong Liu,
Eric P. Xing,
Zhiting Hu
Abstract:
World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper makes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training-from-scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We illustrate extensive outputs by Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate great potential for building stronger general world models with larger-scale training.
Submitted 12 June, 2024;
originally announced June 2024.
-
Quantum Algorithms and Applications for Open Quantum Systems
Authors:
Luis H. Delgado-Granados,
Timothy J. Krogmeier,
LeeAnn M. Sager-Smith,
Irma Avdic,
Zixuan Hu,
Manas Sajjan,
Maryam Abbasi,
Scott E. Smart,
Prineha Narang,
Sabre Kais,
Anthony W. Schlimgen,
Kade Head-Marsden,
David A. Mazziotti
Abstract:
Accurate models for open quantum systems -- quantum states that have non-trivial interactions with their environment -- may aid in the advancement of a diverse array of fields, including quantum computation, informatics, and the prediction of static and dynamic molecular properties. In recent years, quantum algorithms have been leveraged for the computation of open quantum systems as the predicted quantum advantage of quantum devices over classical ones may allow previously inaccessible applications. Accomplishing this goal will require input and expertise from different research perspectives, as well as the training of a diverse quantum workforce, making a compilation of current quantum methods for treating open quantum systems both useful and timely. In this Review, we first provide a succinct summary of the fundamental theory of open quantum systems and then delve into a discussion on recent quantum algorithms. We conclude with a discussion of pertinent applications, demonstrating the applicability of this field to realistic chemical, biological, and material systems.
Submitted 7 June, 2024;
originally announced June 2024.
-
$T_c$ and the elastocaloric effect of Sr$_2$RuO$_4$ under $\langle 110 \rangle$ uniaxial stress: no indications of transition splitting
Authors:
Fabian Jerzembeck,
You-Sheng Li,
Grgur Palle,
Zhenhai Hu,
Mehdi Biderang,
Naoki Kikugawa,
Dmitry A. Sokolov,
Sayak Ghosh,
Brad J. Ramshaw,
Thomas Scaffidi,
Michael Nicklas,
Jörg Schmalian,
Andrew P. Mackenzie,
Clifford W. Hicks
Abstract:
There is considerable evidence that the superconductivity of Sr$_2$RuO$_4$ has two components. Among this evidence is a jump in the shear elastic modulus $c_{66}$ at the critical temperature $T_c$, observed in ultrasound measurements. Such a jump is forbidden for homogeneous single-component order parameters, and implies that $T_c$ should develop as a cusp under the application of shear strain with $\langle 110 \rangle$ principal axes. This shear strain should split the onset temperatures of the two components, if they coexist, or select one component if they do not. Here, we report measurements of $T_c$ and the elastocaloric effect of Sr$_2$RuO$_4$ under uniaxial stress applied along the $[110]$ lattice direction. Within experimental resolution, we resolve neither a cusp in the stress dependence of $T_c$, nor any second transition in the elastocaloric effect data. We show that reconciling these null results with the observed jumps in $c_{66}$ requires extraordinarily fine tuning to a triple point of the Ginzburg-Landau parameter space. In addition, our results are inconsistent with homogeneous time reversal symmetry breaking at a temperature $T_2 \leq T_c$ as identified in muon spin relaxation experiments.
Submitted 7 June, 2024;
originally announced June 2024.
-
Optical biomarker of metabolism for breast tumor diagnosis: Insights from subcellular dynamics
Authors:
Zichen Yin,
Shuwei Zhang,
Bin He,
Houpu Yang,
Zhengyu Chen,
Zhangwei Hu,
Yejiong Shi,
Ruizhi Xue,
Panqi Yang,
Yuzhe Ying,
Chengming Wang,
Shu Wang,
** Xue
Abstract:
Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontrollable imaging factors. Here, we demonstrate active phase modulation-assisted dynamic full-field optical coherence tomography (APMD-FFOCT) that decouples and quantifies the metabolic dynamics by adding a reference movement for all interferential scatterers. This novel technique enables imaging and dynamic analysis of subcellular structures along with their changes during the apoptotic process in tumor tissues. Furthermore, the nucleus-to-cytoplasm dynamic intensity ratio could serve as an optical biomarker for breast tumor grading, enhancing intraoperative diagnosis.
Submitted 6 June, 2024;
originally announced June 2024.
-
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Authors:
Min Cai,
Yuchen Zhang,
Shichang Zhang,
Fan Yin,
Difan Zou,
Yisong Yue,
Ziniu Hu
Abstract:
We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed as a suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment with respect to the model's hidden states, directly steering the auto-regressive generation process towards desired behaviors. To enhance efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control for various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning. Notably, Self-Control_{prefix} enables plug-and-play control and jointly controls multiple attributes, improving model outputs without altering model parameters or increasing inference-time costs.
Submitted 18 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions
Authors:
Yihan Wu,
Ruibo Chen,
Zhengmian Hu,
Yanshuo Chen,
Junfeng Guo,
Hongyang Zhang,
Heng Huang
Abstract:
Language model (LM) watermarking techniques inject a statistical signal into LM-generated content by substituting the random sampling process with pseudo-random sampling, using watermark keys as the random seed. Among these statistical watermarking approaches, distortion-free watermarks are particularly crucial because they embed watermarks into LM-generated content without compromising generation quality. However, one notable limitation of pseudo-random sampling compared to true-random sampling is that, under the same watermark keys (i.e., key collision), the results of pseudo-random sampling exhibit correlations. This limitation could potentially undermine the distortion-free property. Our studies reveal that key collisions are inevitable due to the limited availability of watermark keys, and existing distortion-free watermarks exhibit a significant distribution bias toward the original LM distribution in the presence of key collisions. Moreover, achieving a perfect distortion-free watermark is impossible as no statistical signal can be embedded under key collisions. To reduce the distribution bias caused by key collisions, we introduce a new family of distortion-free watermarks--beta-watermark. Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
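The key-collision effect described in this abstract can be illustrated with a minimal sketch. It assumes a toy three-token distribution and uses Python's standard `random.Random` seeded with the watermark key; the function name `sample_token` and the values of `probs` are hypothetical and are not taken from the paper.

```python
import random


def sample_token(probs, key):
    """Sample a token index from `probs`, using `key` as the pseudo-random seed."""
    rng = random.Random(key)  # the watermark key plays the role of the seed
    u = rng.random()
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return token
    return len(probs) - 1  # guard against floating-point round-off


# Hypothetical next-token distribution over a three-token vocabulary.
probs = [0.5, 0.3, 0.2]

# Key collision: the same key is reused, so every draw is identical.
same_key = [sample_token(probs, key=42) for _ in range(5)]

# Distinct keys: each draw uses its own seed.
distinct_keys = [sample_token(probs, key=k) for k in range(5)]

print(same_key)
print(distinct_keys)
```

Under a collision the five draws are perfectly correlated, which is the situation the abstract identifies as a threat to the distortion-free property.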
Submitted 2 June, 2024;
originally announced June 2024.
-
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Authors:
Mengge Xue,
Zhenyu Hu,
Liqun Liu,
Kuo Liao,
Shuang Li,
Honglin Han,
Meng Zhao,
Chengguo Yin
Abstract:
Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM's performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal that selection bias persists in the SFT phase, primarily due to the LLM's inadequate Multiple Choice Symbol Binding (MCSB) ability. This limitation implies that the model struggles to associate the answer options with their corresponding symbols (e.g., A/B/C/D) effectively. To enhance the model's MCSB capability, we first incorporate option contents into the loss function and subsequently adjust the weights of the option symbols and contents, guiding the model to understand the option content of the current symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF). PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and proposes a point-wise loss to feed these negative samples back to LLMs. Our experimental results demonstrate that PIF significantly reduces the model's selection bias by improving its MCSB capability. Remarkably, PIF also substantially improves accuracy on MCQs.
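One possible reading of the negative-instance construction described above is sketched below. The abstract does not specify the sampling details, so the function `build_negative_instances`, its parameters, and the example question are all assumptions made for illustration, not the authors' implementation.

```python
import random


def build_negative_instances(options, answer_symbol, num_samples, seed=0):
    """Pair incorrect option contents with candidate symbols to form negatives.

    `options` maps a symbol (e.g. "A") to its content; `answer_symbol` is the
    symbol of the correct option. Every returned (symbol, content) pair carries
    an incorrect content, so it is a wrong answer regardless of the symbol.
    """
    symbols = list(options)
    wrong_contents = [c for s, c in options.items() if s != answer_symbol]
    candidates = [(s, c) for s in symbols for c in wrong_contents]
    rng = random.Random(seed)
    return rng.sample(candidates, min(num_samples, len(candidates)))


# Hypothetical question: "Which planet is closest to the Sun?"
options = {"A": "Venus", "B": "Mercury", "C": "Mars", "D": "Earth"}
negatives = build_negative_instances(options, answer_symbol="B", num_samples=4)
print(negatives)
```

Each sampled pair could then be scored by a point-wise loss as a negative example, which is the role the abstract assigns to these instances.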
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy spectra variation among the near and far detectors gives $\sin^2 2θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent of the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2θ_{13} = 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
-
Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor
Authors:
Lei Liu,
Zhihao Hu,
Zhenghao Chen
Abstract:
Point cloud compression has garnered significant interest in computer vision. However, existing algorithms primarily cater to human vision, while most point cloud data is utilized for machine vision tasks. To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks. Our framework learns a scalable bit-stream, using only subsets for different machine vision tasks to save bit-rate, while employing the entire bit-stream for human vision tasks. Building on mainstream octree-based frameworks like VoxelContext-Net, OctAttention, and G-PCC, we introduce a new octree depth-level predictor. This predictor adaptively determines the optimal depth level for each octree constructed from a point cloud, controlling the bit-rate for machine vision tasks. For simpler tasks (e.g., classification) or objects/scenarios, we use fewer depth levels with fewer bits, saving bit-rate. Conversely, for more complex tasks (e.g., segmentation) or objects/scenarios, we use deeper depth levels with more bits to enhance performance. Experimental results on various datasets (e.g., ModelNet10, ModelNet40, ShapeNet, ScanNet, and KITTI) show that our point cloud compression approach improves performance for machine vision tasks without compromising human vision quality.
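The trade-off between octree depth and bit-rate mentioned above can be sketched with a toy voxelization. This is only an illustration of why fewer depth levels need fewer bits: it assumes points normalized to the unit cube and counts occupied cells per depth, and it does not implement the learned depth-level predictor. The helper `occupied_voxels` and the sample points are hypothetical.

```python
def occupied_voxels(points, depth):
    """Return the set of occupied octree cells at `depth` for points in [0, 1]^3."""
    cells = 1 << depth  # 2**depth cells along each axis
    voxels = set()
    for point in points:
        # Clamp so that a coordinate of exactly 1.0 falls into the last cell.
        voxels.add(tuple(min(int(c * cells), cells - 1) for c in point))
    return voxels


# Hypothetical point cloud, already normalized to the unit cube.
points = [
    (0.10, 0.10, 0.10),
    (0.12, 0.11, 0.09),
    (0.80, 0.75, 0.20),
    (0.81, 0.90, 0.22),
    (0.50, 0.50, 0.95),
]

# The number of occupied cells (a rough proxy for bit-rate) grows with depth.
counts = {depth: len(occupied_voxels(points, depth)) for depth in range(6)}
print(counts)
```

A shallow depth merges nearby points into one cell, which may suffice for a coarse task such as classification, while deeper levels keep more geometric detail at a higher cost.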
Submitted 2 June, 2024;
originally announced June 2024.
-
Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion: An application in 6R robot trajectory planning
Authors:
Bao Liu,
Tianbao Liu,
Zhongshuo Hu,
Fei Ye,
Lei Gao
Abstract:
The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion (MF-DMOLSO) to address these limitations. MF-DMOLSO comprises three key components: initialization, swarm position update, and external archive update. The initialization unit employs chaotic mapping for uniform population distribution. The position update unit enhances behavior patterns and step size formulas for cub lions, incorporating crowding degree sorting, Pareto non-dominated sorting, and Levy flight to improve convergence speed and global search capabilities. Reference points guide convergence in higher-dimensional spaces, maintaining population diversity. An adaptive cold-hot start strategy generates a population responsive to environmental changes. The external archive update unit re-evaluates solutions based on non-domination and diversity to form the new population. Evaluations on benchmark functions showed MF-DMOLSO surpassed multi-objective particle swarm optimization, non-dominated sorting genetic algorithm II, and multi-objective lion swarm optimization, exceeding 90% accuracy for two-objective and 97% for three-objective problems. Compared to non-dominated sorting genetic algorithm III, MF-DMOLSO showed a 60% improvement. Applied to 6R robot trajectory planning, MF-DMOLSO optimized running time and maximum acceleration to 8.3s and 0.3pi rad/s^2, achieving a set coverage rate of 70.97% compared to 2% by multi-objective particle swarm optimization, thus improving efficiency and reducing mechanical dither.
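Pareto non-dominated filtering, one of the ingredients named in this abstract, can be sketched generically as follows. This is the textbook definition for minimization problems, not the MF-DMOLSO implementation; the candidate objective values are made up for illustration.

```python
def dominates(a, b):
    """Return True if objective vector `a` Pareto-dominates `b` (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical (objective 1, objective 2) pairs, both to be minimized.
candidates = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
front = non_dominated(candidates)
print(front)
```

In a two-objective setting such as running time versus maximum acceleration, the surviving points form the trade-off front from which an external archive would be built.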
Submitted 7 June, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
-
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs
Authors:
Zichao Hu,
Junyi Jessy Li,
Arjun Guha,
Joydeep Biswas
Abstract:
Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the performance gap with proprietary LLMs? While Self-Instruct is a promising solution by generating a diverse set of training data, it cannot verify the correctness of these programs. In contrast, a robot simulator with a well-defined world can identify execution errors but limits the diversity of programs that it can verify. In this work, we introduce Robo-Instruct, which brings the best of both worlds -- it promotes the diversity of Self-Instruct while providing the correctness of simulator-based checking. Robo-Instruct introduces RoboSim to synthesize a consistent world state on the fly by inferring properties relevant to the program being checked, and simulating actions accordingly. Furthermore, the instructions and programs generated by Self-Instruct may be subtly inconsistent -- such as the program missing a step implied by the instruction. Robo-Instruct further addresses this with InstAlign, an instruction-program alignment procedure that revises the task instruction to reflect the actual results of the generated program. Given a few seed task descriptions and the robot APIs, Robo-Instruct is capable of generating a training dataset using only a small open-weight model. This dataset can then be used to fine-tune small open-weight language models, enabling them to match or even exceed the performance of several proprietary LLMs, such as GPT-3.5-Turbo and Gemini-Pro.
Submitted 30 May, 2024;
originally announced May 2024.
-
Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
Authors:
Kuo Liao,
Shuang Li,
Meng Zhao,
Liqun Liu,
Mengge Xue,
Zhenyu Hu,
Honglin Han,
Chengguo Yin
Abstract:
Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitation, we propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR) to amplify the performance of LLMs in NLU tasks. By incorporating label-sensitive pairs into reinforcement learning, our method aims to adeptly capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding. Experiments conducted on five diverse foundation models across eight tasks showcase promising results. In comparison to Supervised Fine-tuning models (SFT), RLLR demonstrates an average performance improvement of 1.54%. Compared with RLHF models, the improvement averages at 0.69%. These results reveal the effectiveness of our method for LLMs in NLU tasks. Code and data available at: https://github.com/MagiaSN/ACL2024_RLLR.
Submitted 30 May, 2024;
originally announced May 2024.
-
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Authors:
Thong Thanh Nguyen,
Zhiyuan Hu,
Xiaobao Wu,
Cong-Duy T Nguyen,
See-Kiong Ng,
Anh Tuan Luu
Abstract:
Seeking answers effectively for long videos is essential to build video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computations. However, this fails to reason over the whole video sequence, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into a multi-modal Transformer to efficiently integrate global semantics of the video, which mitigates the video information loss caused by frame and region selection modules. Our SSL includes a gating unit to enable controllability over the flow of global semantics into visual representations. To further enhance the controllability, we introduce a cross-modal compositional congruence (C^3) objective to encourage global semantics to align with the question. To rigorously evaluate long-form videoQA capacity, we construct two new benchmarks, Ego-QA and MAD-QA, featuring videos of considerably long length, i.e. 17.5 minutes and 1.9 hours, respectively. Extensive experiments demonstrate the superiority of our framework on these new as well as existing datasets.
Submitted 30 May, 2024;
originally announced May 2024.
-
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Authors:
Yihe Deng,
Pan Lu,
Fan Yin,
Ziniu Hu,
Sheng Shen,
James Zou,
Kai-Wei Chang,
Wei Wang
Abstract:
Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have been effective in single-modal settings to alleviate the need for labeled data by leveraging the model's own generations. However, effective self-training remains a challenge regarding the unique visual perception and reasoning capability of LVLMs. To address this, we introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension. First, the model self-constructs a preference dataset for image descriptions using unlabeled images. Preferred responses are generated through a step-by-step prompt, while dis-preferred responses are generated from either corrupted images or misleading prompts. To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data and append its self-generated image descriptions to the prompts. We validate the effectiveness of STIC across seven different benchmarks, demonstrating substantial performance gains of 4.0% on average while using 70% less supervised fine-tuning data than the current method. Further studies investigate various components of STIC and highlight its potential to leverage vast quantities of unlabeled images for self-training. Code and data are made publicly available.
Submitted 30 May, 2024;
originally announced May 2024.
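The preference-data step described in the STIC abstract above can be illustrated with a minimal sketch. This is not the authors' released code: `build_preference_pairs`, `PreferencePair`, the `describe` and `corrupt` callables, and the 50/50 choice between the two negative sources are all hypothetical stand-ins chosen for illustration. In a real setup `describe` would call the LVLM and `corrupt` would degrade the image.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PreferencePair:
    image_id: str
    prompt: str
    preferred: str      # description from the step-by-step prompt
    dispreferred: str   # description from a corrupted image or misleading prompt


def build_preference_pairs(
    image_ids: Sequence[str],
    describe: Callable[[str, str], str],   # hypothetical: (image, prompt) -> description
    corrupt: Callable[[str], str],         # hypothetical: image -> corrupted image
    good_prompt: str,
    bad_prompts: Sequence[str],
    seed: int = 0,
) -> List[PreferencePair]:
    """Build one (preferred, dispreferred) pair per unlabeled image."""
    rng = random.Random(seed)
    pairs = []
    for image_id in image_ids:
        preferred = describe(image_id, good_prompt)
        # Assumption: pick the negative source uniformly at random.
        if rng.random() < 0.5:
            dispreferred = describe(corrupt(image_id), good_prompt)
        else:
            dispreferred = describe(image_id, rng.choice(list(bad_prompts)))
        pairs.append(PreferencePair(image_id, good_prompt, preferred, dispreferred))
    return pairs
```

The resulting pairs would then feed a preference-optimization objective; the abstract does not say which one, so that step is left out here.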
-
Learning Interpretable Scheduling Algorithms for Data Processing Clusters
Authors:
Zhibo Hu,
Chen Wang,
Helen Paik,
Yanfeng Shu,
Liming Zhu
Abstract:
Workloads in data processing clusters are often represented as DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like Decima) have been attempted to optimise DAG job scheduling and demonstrate clear performance gains in comparison to traditional algorithms. However, reinforcement learning (RL) approaches face their own problems in real-world deployment. In particular, their black-box decision-making processes and generalizability to unseen workloads may add a non-trivial burden to cluster administrators. Moreover, adapting RL models to unseen workloads often requires a significant amount of training data, which leaves edge cases running in a sub-optimal mode. To fill the gap, we propose a new method to distill a simple scheduling policy based on observations of the behaviours of a complex deep learning model. The simple model not only provides interpretability of scheduling decisions, but is also easily adapted to edge cases through tuning. We show that our method achieves high fidelity to the decisions made by deep learning models and outperforms these models when additional heuristics are taken into account.
Submitted 29 May, 2024;
originally announced May 2024.
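The distillation idea in the scheduling abstract above (fit a simple policy to the observed decisions of a complex model, then measure fidelity) can be sketched as follows. This is an illustrative toy, not the paper's method: the feature names, the restriction to single-feature min/max priority rules, and the helpers `fidelity` and `distill_rule` are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Job = Dict[str, float]                      # hypothetical per-job features
Observation = Tuple[List[Job], int]         # (candidate jobs, index the teacher chose)
Rule = Callable[[List[Job]], int]


def fidelity(rule: Rule, observations: Sequence[Observation]) -> float:
    """Fraction of observed teacher decisions that the simple rule reproduces."""
    if not observations:
        raise ValueError("need at least one observation")
    matches = sum(1 for candidates, chosen in observations if rule(candidates) == chosen)
    return matches / len(observations)


def distill_rule(
    observations: Sequence[Observation], features: Sequence[str]
) -> Tuple[float, str, Rule]:
    """Search single-feature priority rules and keep the one with highest fidelity."""
    best = None
    for feat in features:
        for name, pick in (("min", min), ("max", max)):
            # Bind feat/pick as defaults so each rule keeps its own values.
            def rule(candidates: List[Job], feat=feat, pick=pick) -> int:
                return pick(range(len(candidates)), key=lambda i: candidates[i][feat])

            score = fidelity(rule, observations)
            if best is None or score > best[0]:
                best = (score, f"{name}({feat})", rule)
    if best is None:
        raise ValueError("need at least one feature")
    return best
```

A rule such as `min(remaining_work)` is readable by a cluster administrator and can be adjusted by hand for edge cases, which is the kind of interpretability the abstract argues for. A richer student model, for example a small decision tree, would follow the same observe-fit-score pattern.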
-
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
Authors:
Zhe Hu,
Tuo Liang,
Jing Li,
Yiren Lu,
Yunlai Zhou,
Yiran Qiao,
Jing Ma,
Yu Yin
Abstract:
Recent advancements in large multimodal language models have demonstrated remarkable proficiency across a wide range of tasks. Yet, these models still struggle with understanding the nuances of human humor through juxtaposition, particularly when it involves nonlinear narratives that underpin many jokes and humor cues. This paper investigates this challenge by focusing on comics with contradictory narratives, where each comic consists of two panels that create a humorous contradiction. We introduce the YesBut benchmark, which comprises tasks of varying difficulty aimed at assessing AI's capabilities in recognizing and interpreting these comics, ranging from literal content comprehension to deep narrative reasoning. Through extensive experimentation and analysis of recent commercial or open-sourced large (vision) language models, we assess their capability to comprehend the complex interplay of the narrative humor inherent in these comics. Our results show that even state-of-the-art models still lag behind human performance on this task. Our findings offer insights into the current limitations and potential improvements for AI in understanding human creative expressions.
Submitted 29 May, 2024;
originally announced May 2024.